Problem with REGEX scraping

drittaccount · October 31, 2018, 9:39pm

Hey folks,

unfortunately I have problems with scraping.
This is the part of the source:

 `<div class="cta_text">USERNAME jetzt kennenlernen!</div>`

I want to scrape the Username, but my Target

REGEX=[(?<=<div class="cta_text">)(.+?)(?=jetzt kennenlernen!</div>)] does not return USERNAME

Instead it returns just a <

Can you please give me a hint what I’m doing wrong?

Thanks,
drittaccount

ulrich · October 31, 2018, 11:19pm

When I test this on https://regex101.com/ I get the same result.

PS: My point is that this is not an issue with Kantu’s SourceExtract command, but a general regex question. I hope someone better at regex than me can answer it.

drittaccount · October 31, 2018, 11:35pm

Hey Ulrich,

I tried it again with regex101.com, but it just flagged the / inside the regex as an “unescaped delimiter (which) must be escaped with a backslash (\)”.
It’s not important, so I deleted it. Works on regex101.com, but not on Kantu.

Regards,
drittaccount

ulrich · November 1, 2018, 7:10pm

Hmm… I guess the square brackets [… ] are the problem. Without them, it works fine:

{
"Command": "sourceExtract",
"Target": "regex=(?<=<div class=\"cta_text\">)(.+?)(?=jetzt kennenlernen!<\/div>)",
"Value": "ww"
},
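
To illustrate why the brackets matter: in regex syntax, [ ... ] is a character class, so the whole bracketed expression matches a single character from that set, and the first such character in an HTML page is usually the opening <. Here is a minimal sketch in Python rather than Kantu itself. It assumes Python's `re` module treats this pattern the same way Kantu's regex engine does, and the `source` string is a made-up page built from the snippet above.

```python
import re

# Hypothetical page source; only the <div> line comes from the original post.
source = '<!DOCTYPE html>\n<div class="cta_text">USERNAME jetzt kennenlernen!</div>'

# With square brackets, the entire expression is one character class.
bracketed = r'[(?<=<div class="cta_text">)(.+?)(?=jetzt kennenlernen!</div>)]'

# Without brackets, the lookbehind and lookahead work as intended.
plain = r'(?<=<div class="cta_text">)(.+?)(?=jetzt kennenlernen!</div>)'

# Variant with the separating space moved into the lookahead.
tight = r'(?<=<div class="cta_text">)(.+?)(?= jetzt kennenlernen!</div>)'

first_bracketed = re.search(bracketed, source).group(0)
first_plain = re.search(plain, source).group(0)
first_tight = re.search(tight, source).group(0)

print(repr(first_bracketed))  # a single character from the class
print(repr(first_plain))      # the username plus the space before "jetzt"
print(repr(first_tight))      # the username only
```

Note that the pattern without brackets still captures the space in front of "jetzt". Putting that space inside the lookahead, as in the `tight` variant, should leave only the username.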

Josseph0416 · February 28, 2019, 11:15am

Hello, I’m trying to get the text inside span tags, but when I print it, it returns the whole line, like this:

28 news

And I want to get only the number 28.

My command is
{
"Command": "sourceExtract",
"Target": "regex=<span.class=\"cat1\">(.+?)<\/span>",
"Value": "numAds"
}

How can I get only the number 28?

Thank you in advance.