Hi, I am trying to extract a paragraph with its line breaks intact. storeText doesn't work because it returns a flattened version of the text. How do I resolve this?
The macro below reads the complete HTML of the page (document.body.innerHTML), strips the scripts, styles, and tags, and returns the remaining text. Does that solve the issue?
{
  "Name": "ParseHTML",
  "CreationDate": "2025-4-6",
  "Commands": [
    {
      "Command": "open",
      "Target": "https://forum.ui.vision/",
      "Value": "",
      "Description": ""
    },
    {
      "Command": "executeScript",
      "Target": "var str = document.body.innerHTML; // Get page source\n\n//Next: Clean up HTML source before further processing \n\n//First remove scripts and style tags with their content\nstr = str.replace(/<script\\b[^<]*(?:(?!<\\/script>)<[^<]*)*<\\/script>/gi, '');\nstr = str.replace(/<style\\b[^<]*(?:(?!<\\/style>)<[^<]*)*<\\/style>/gi, '');\n \n//Then remove all remaining tags but keep their content\nstr = str.replace(/<[^>]+>/g, '');\n \n//Clean up whitespace\nstr = str.replace(/\\s+/g, ' ').trim();\n \nreturn str;",
      "Value": "html",
      "Description": "Extract the page text (HTML tags removed)"
    },
    {
      "Command": "echo",
      "Target": "Page text extracted (long): ${html}",
      "Value": "brown",
      "Description": ""
    }
  ]
}
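Note that the macro above collapses every run of whitespace (the \s+ replacement) into a single space, and that step is exactly what flattens a paragraph. A minimal sketch of a variant that keeps line breaks, assuming that <br> and closing block tags such as </p>, </div>, and </li> should become newlines; the function name htmlToTextKeepBreaks is hypothetical:

```javascript
// Hypothetical helper: strip HTML tags but keep line breaks.
// Assumes <br> and closing block-level tags mark the line boundaries.
function htmlToTextKeepBreaks(html) {
  let str = html;
  // Remove <script> and <style> blocks with their content (same regexes as the macro)
  str = str.replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi, '');
  str = str.replace(/<style\b[^<]*(?:(?!<\/style>)<[^<]*)*<\/style>/gi, '');
  // Turn <br> and closing block tags into newlines before the tags are stripped
  str = str.replace(/<br\s*\/?>/gi, '\n');
  str = str.replace(/<\/(p|div|li|h[1-6]|tr)>/gi, '\n');
  // Remove all remaining tags but keep their content
  str = str.replace(/<[^>]+>/g, '');
  // Collapse spaces and tabs only, so the newlines survive
  str = str.replace(/[ \t]+/g, ' ');
  // Trim each line and reduce long runs of blank lines to one blank line
  str = str
    .split('\n')
    .map(line => line.trim())
    .join('\n')
    .replace(/\n{3,}/g, '\n\n');
  return str.trim();
}

// Example: '<p>Line one<br>Line two</p><p>Line three</p>'
// becomes "Line one\nLine two\nLine three"
```

The same steps could replace the \s+ line inside the executeScript Target, applied either to document.body.innerHTML or to the innerHTML of a single element. Newlines already present in the raw HTML source are kept as well, so the output may contain an occasional extra blank line.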
Hi, thank you so much for your reply. I only want to extract one part of the web page, not the whole page. Here's the XPath: /html/body/div[1]/div/div[1]/div/div[3]/div/div/div[1]/div[1]/div[2]/div/div/div[2]/div/div/div/div[2]/div[1]/div/div/div/div[7]/div/span
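For one specific element, another option is to skip the regex cleanup entirely: locate the element with the standard DOM XPath API (document.evaluate) and read its innerText. innerText reflects the rendered layout, so <br> elements and block boundaries come back as newline characters. A minimal sketch, assuming the code runs in the page context (for example in the Target of an executeScript command, as in the macro above, ending with a return statement); the helper name textByXPath is hypothetical, and 9 is the numeric value of XPathResult.FIRST_ORDERED_NODE_TYPE:

```javascript
// Hypothetical helper: return the rendered text of the first node matching
// an XPath, keeping line breaks. Intended to run in a browser page context.
function textByXPath(doc, xpath) {
  const FIRST_ORDERED_NODE_TYPE = 9; // XPathResult.FIRST_ORDERED_NODE_TYPE
  const result = doc.evaluate(xpath, doc, null, FIRST_ORDERED_NODE_TYPE, null);
  const node = result.singleNodeValue;
  if (!node) {
    return null; // nothing matched the XPath
  }
  // innerText follows the rendered layout, so <br> and block boundaries
  // come back as "\n"; fall back to textContent for non-element nodes
  return node.innerText !== undefined ? node.innerText : node.textContent;
}

// Usage inside the page, with the XPath from the post:
// return textByXPath(document, "/html/body/div[1]/div/div[1]/div/div[3]/div/div/div[1]/div[1]/div[2]/div/div/div[2]/div/div/div/div[2]/div[1]/div/div/div/div[7]/div/span");
```

If the helper returns null, the absolute XPath no longer matches the page. Long absolute paths like this one tend to break when the site's layout changes, so a shorter XPath anchored on a stable id or class may be more robust.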