sourceExtract HTML mode

I have been looking at the sourceExtract command.
From the documentation and my testing, it basically looks for the source string between two plain text strings. What I do not get is why it does not look for the string between pieces of the actual source instead of between plain text strings.

Or is this possible and I missed something?

Note I used a picture of the code because the Post editor said I was posting with too many links…

Yeah, it should work exactly like you want it to work.

For example, sourceSearch | Tea $*</li> => finds the text between Tea $ and </li>

Thank you for your answer, but why does a search like this <html* turn up no results?

I mean like this <html*</html

Good question => investigating…

The issue was that the * notation did not support line breaks. This is fixed with V5.6.5.

sourceExtract | <html*</html> works now.

* can match \n on Chrome/Edge now, but not on Firefox. Firefox workaround: use a regular expression explicitly, something like regex=/<html(.|\n)*</html>/g
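
To illustrate why the * wildcard failed across line breaks, here is a minimal Python sketch. It is not the tool's actual implementation: `wildcard_to_regex` and `page_source` are hypothetical names made up for this example, and the lazy matching is an assumption. It only shows the general regex behaviour behind the issue: `.` does not match `\n` by default, while `(.|\n)` does.

```python
import re


def wildcard_to_regex(pattern: str, match_newlines: bool) -> re.Pattern:
    """Translate a simple '*' wildcard pattern into a compiled regex.

    Hypothetical helper for illustration only.
    """
    # Escape each literal piece so characters like '$' are matched literally.
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    # '.' skips '\n' by default; '(?:.|\n)' mirrors the workaround from the thread.
    # Lazy ('*?') so the match stops at the first closing string (an assumption).
    anything = r"(?:.|\n)*?" if match_newlines else r".*?"
    return re.compile(anything.join(literal_parts))


# Made-up page source with line breaks between the tags.
page_source = "<html>\n<body>\n<li>Tea $4.50</li>\n</body>\n</html>"

single_line = wildcard_to_regex("Tea $*</li>", match_newlines=False)
multi_line_off = wildcard_to_regex("<html*</html>", match_newlines=False)
multi_line_on = wildcard_to_regex("<html*</html>", match_newlines=True)

print(single_line.search(page_source).group(0))        # Tea $4.50</li>
print(multi_line_off.search(page_source))              # None
print(multi_line_on.search(page_source) is not None)   # True
```

The first search works because both strings sit on the same line. The second finds nothing because the wildcard cannot cross the line breaks, which matches the behaviour reported above. The third succeeds once line breaks are allowed.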