Converting iMacros Script with EXTRACT=HTM

Hi there!

Complete newbie here :sos:

I’m trying to convert this simple iMacros script:

SET !DATASOURCE source-urls.csv
SET !LOOP 1
SET !DATASOURCE_LINE {{!LOOP}}
TAB T=1
URL GOTO={{!COL1}}
TAG POS=1 TYPE=H1 ATTR=* EXTRACT=TXT
TAG POS=1 TYPE=DIV ATTR=CLASS:product-single__description<SP>rte EXTRACT=HTM
TAG POS=1 TYPE=IMG ATTR=CLASS:photoswipe__image<SP>lazyload EXTRACT=HTM
ADD !EXTRACT {{!URLCURRENT}}
SAVEAS TYPE=EXTRACT FOLDER=* FILE=bloomtown.csv

The first URL to go to is https://bloomtown.co.uk/products/body-bath-oil-the-hedgerow-blackberry-honeysuckle
I then want to scrape the product title, description, and image as HTML, not as text.

I’m really struggling to find anything similar to EXTRACT=HTM on the page "Open-Source iMacros Alternative(s) 2023" (ui.vision).

Great question! Ui.Vision has no direct equivalent of EXTRACT=HTM, but you can recreate it with the executeScript command and the JavaScript method document.evaluate. Use it to look up the web element and return its outerHTML.

document.evaluate takes an XPath as input. This is the same XPath that Ui.Vision records for, e.g., a CLICK command (without the xpath= prefix). So I simply recorded a click to get the right XPath.

Here is a demo macro:

{
  "Name": "extract-html-of-tag",
  "CreationDate": "2023-8-10",
  "Commands": [
    {
      "Command": "open",
      "Target": "https://ui.vision/rpa/docs/selenium-ide/storeattribute",
      "Value": "",
      "Description": ""
    },
    {
      "Command": "click",
      "Target": "xpath=//*[@id=\"content\"]/div[2]/div[2]/p/span",
      "Value": "",
      "Description": "This click is not needed for the extraction. I only recorded it to get the XPath locator."
    },
    {
      "Command": "executeScript",
      "Target": "return document.evaluate(\"//*[@id='content']/div[2]/div[2]/p/span\", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue.outerHTML;",
      "Value": "a",
      "Description": ""
    },
    {
      "Command": "echo",
      "Target": "a=|${a}|",
      "Value": "green",
      "Description": ""
    }
  ]
}
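
The JavaScript packed into the executeScript Target is hard to read as a one-liner, so here is a rough unrolled sketch of the same lookup. The function name outerHtmlByXPath and the doc parameter are my own additions for illustration (passing the document in makes it possible to try the logic outside a browser). It also adds one thing the one-liner lacks: it returns an empty string instead of throwing when the XPath matches nothing.

```javascript
// Unrolled sketch of the one-liner used in the executeScript Target.
// outerHtmlByXPath and the doc parameter are illustrative names, not part of Ui.Vision.
function outerHtmlByXPath(doc, xpath) {
  // Numeric value of XPathResult.FIRST_ORDERED_NODE_TYPE in the DOM specification
  const FIRST_ORDERED_NODE_TYPE = 9;

  // Same call as in the macro: evaluate the XPath against the whole document
  const result = doc.evaluate(xpath, doc, null, FIRST_ORDERED_NODE_TYPE, null);

  // singleNodeValue is null when the XPath matches no element
  const node = result.singleNodeValue;

  // outerHTML is the element's own tag plus everything inside it
  return node ? node.outerHTML : "";
}
```

In a browser console you could call it as outerHtmlByXPath(document, "//h1"). Inside a macro you would still write it inline with document, as in the demo.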

And here is the same macro, but with your example website:

{
  "Name": "GetOuterHTML",
  "CreationDate": "2023-8-10",
  "Commands": [
    {
      "Command": "open",
      "Target": "https://bloomtown.co.uk/products/body-bath-oil-the-hedgerow-blackberry-honeysuckle",
      "Value": "",
      "Description": ""
    },
    {
      "Command": "executeScript",
      "Target": "return document.evaluate(\"//p[4]\", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue.outerHTML;",
      "Value": "a",
      "Description": ""
    },
    {
      "Command": "echo",
      "Target": "Website text WITH html tags=|${a}|",
      "Value": "blue",
      "Description": ""
    }
  ]
}
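
The macro above only grabs one element, while the iMacros script extracts three. As a rough sketch, a small Node.js script could generate a macro with one executeScript step per field. Everything named here is an assumption on my side: the helper names buildOuterHtmlScript and buildMacro, the macro name, and especially the three XPaths, which I derived from the TAG lines of the iMacros script and have not checked against the live page. Note that it returns the HTML of the title too, whereas the iMacros script used EXTRACT=TXT for the H1.

```javascript
// Build the JavaScript for one executeScript Target.
// Null-safe variant of the one-liner: returns '' if the XPath matches nothing.
function buildOuterHtmlScript(xpath) {
  // JSON.stringify yields a correctly quoted and escaped string literal
  const xp = JSON.stringify(xpath);
  return "var n = document.evaluate(" + xp + ", document, null, " +
    "XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue; " +
    "return n ? n.outerHTML : '';";
}

// Build a macro object: open the URL, then extract and echo each field.
function buildMacro(url, fields) {
  const commands = [{ Command: "open", Target: url, Value: "", Description: "" }];
  for (const [name, xpath] of Object.entries(fields)) {
    commands.push({
      Command: "executeScript",
      Target: buildOuterHtmlScript(xpath),
      Value: name, // variable that receives the HTML
      Description: ""
    });
    commands.push({
      Command: "echo",
      Target: name + "=|${" + name + "}|",
      Value: "blue",
      Description: ""
    });
  }
  // Macro name is a placeholder
  return { Name: "GetOuterHTML-all-fields", Commands: commands };
}

// Assumed XPaths, guessed from the TAG lines in the iMacros script
const fields = {
  title: "//h1",
  description: "//div[contains(@class,'product-single__description')]",
  image: "//img[contains(@class,'photoswipe__image')]"
};

const macro = buildMacro(
  "https://bloomtown.co.uk/products/body-bath-oil-the-hedgerow-blackberry-honeysuckle",
  fields
);
console.log(JSON.stringify(macro, null, 2));
```

The printed JSON has the same shape as the macros above. If an XPath turns out to be wrong, record a click on the element in Ui.Vision and swap in the recorded locator.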

Thanks so much for taking the time to answer. I wish there was an easier way :slight_smile: