How to extract HTML

cyril · April 5, 2025, 4:31pm

Hi, I am trying to extract a paragraph with its line breaks intact. storeText doesn't work because it returns a flattened version of the text. How do I resolve this?

admin · April 6, 2025, 8:59am

The macro below reads the complete HTML of a page with executeScript, then strips the tags and collapses whitespace. Does that solve the issue?

{
  "Name": "ParseHTML",
  "CreationDate": "2025-4-6",
  "Commands": [
    {
      "Command": "open",
      "Target": "https://forum.ui.vision/",
      "Value": "",
      "Description": ""
    },
    {
      "Command": "executeScript",
      "Target": "var str = document.body.innerHTML; // Get page source\n\n//Next: Clean up HTML source before further processing  \n\n//First remove scripts and style tags with their content\nstr = str.replace(/<script\\b[^<]*(?:(?!<\\/script>)<[^<]*)*<\\/script>/gi, '');\nstr = str.replace(/<style\\b[^<]*(?:(?!<\\/style>)<[^<]*)*<\\/style>/gi, '');\n   \n//Then remove all remaining tags but keep their content\nstr = str.replace(/<[^>]+>/g, '');\n   \n//Clean up whitespace\nstr = str.replace(/\\s+/g, ' ').trim();\n   \nreturn str;",
      "Value": "html",
      "Description": "Extract entire HTML code of website"
    },
    {
      "Command": "echo",
      "Target": "Entire HTML extracted (long): ${html}",
      "Value": "brown",
      "Description": ""
    }
  ]
}
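
Note that the last cleanup step in the macro above (`\s+` replaced by a single space) also removes line breaks. If the line breaks matter, a variant along the following lines might help. This is only a rough sketch: the function name `htmlToTextKeepBreaks` is made up for illustration, and the list of tags treated as line-ending is an assumption that may need adjusting for the page in question.

```javascript
// Hypothetical helper (not part of the macro above): convert an HTML string
// to plain text while keeping line breaks.
function htmlToTextKeepBreaks(html) {
  var str = html;

  // Drop script and style elements together with their content
  str = str.replace(/<script\b[\s\S]*?<\/script>/gi, '');
  str = str.replace(/<style\b[\s\S]*?<\/style>/gi, '');

  // Whitespace in the HTML source is not a visible line break, so flatten it first
  str = str.replace(/\s+/g, ' ');

  // Assumption: <br> and these closing block-level tags each end a line
  str = str.replace(/<br\s*\/?>/gi, '\n');
  str = str.replace(/<\/(p|div|li|h[1-6]|tr)>/gi, '\n');

  // Remove all remaining tags but keep their content
  str = str.replace(/<[^>]+>/g, '');

  // Trim spaces around each newline and squeeze runs of blank lines
  str = str.replace(/ *\n */g, '\n').replace(/\n{2,}/g, '\n');

  return str.trim();
}
```

Inside an executeScript command, the body of this function could be used in place of the original cleanup code, for example by ending the script with `return htmlToTextKeepBreaks(document.body.innerHTML);`. Text inside `<pre>` blocks would lose its original spacing with this approach.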

cyril · April 7, 2025, 6:34am

Hi, thank you so much for your reply. I want to extract just one part of the web page, not the entire page. Here's the XPath: /html/body/div[1]/div/div[1]/div/div[3]/div/div/div[1]/div[1]/div[2]/div/div/div[2]/div/div/div/div[2]/div[1]/div/div/div/div[7]/div/span