Convert web scraping iMacros macro to Ui.Vision code

Tech support just completed this customer macro creation project. The task was to convert an existing iMacros macro to Ui.Vision. In the customer’s own words:

Attached is the IMacro macro I have used for many years. It was inefficient but it worked. I am hoping that with UI.Vision, I can replicate the Imacros macro and then make it more efficient as well. But we can start with that.

The purpose of this macro is to

1) go to this website:

Home - Louisiana State Legislature

2) Select “Bills” in the top menu

3) Select any session (but let’s pick 2023 Regular session)

4) Go to the bottom and for Search by Instrument Range, pick HB and then click search

Note: With IMacros, I would set the webpage on the HB, SB or other classification before starting the macro. So it was already set up to begin scraping the page. But UI Vision doesn’t seem to let me do that so I have to start at the main legislative page and find my way to the list of bills

5) I have set the page length to display 100 bills at a time.

6) Scrape each bill and append each bill info record to a csvfile. There should be one record per bill in the file.

7) Once the bottom of the page is reached, press the arrow at the bottom to go to the next page. This process should continue until there are no more pages. Depending on how many pages I expected, I would set the loop to that number of pages before starting the run loop in Imacros.

8) Once I finish all the pages for the HB bills, I would then return and Search for all SB bills and so forth. All are added to the same csv file. Once I have finished and processed the csvfile in another application, I would delete that csvfile before starting the process all over again.

In the Imacros macro, I have 100 sets of instructions to accommodate 100 bills on a page. I am sure with UI Vision, this could be accomplished with a do while loop.

iMacros web scraping video:

iMacros macro:

VERSION BUILD=8021970
TAB T=1
TAB CLOSEALLOTHERS
SET !EXTRACT_TEST_POPUP NO
SET !REPLAYSPEED FAST


TAG POS=1 TYPE=A ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl0_HyperLink1 EXTRACT=HREF
TAG POS=1 TYPE=A ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl0_HyperLink1 EXTRACT=TXT
TAG POS=1 TYPE=A ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl0_LinkAuthor EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl0_LabelKWordAndSTitle EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl0_LabelStatus EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl0_LabelConsidered EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=mytable.csv

TAG POS=1 TYPE=A ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl1_HyperLink1 EXTRACT=HREF
TAG POS=1 TYPE=A ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl1_HyperLink1 EXTRACT=TXT
TAG POS=1 TYPE=A ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl1_LinkAuthor EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl1_LabelKWordAndSTitle EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl1_LabelStatus EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl1_LabelConsidered EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=mytable.csv


TAG POS=1 TYPE=A ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl2_HyperLink1 EXTRACT=HREF
TAG POS=1 TYPE=A ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl2_HyperLink1 EXTRACT=TXT
TAG POS=1 TYPE=A ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl2_LinkAuthor EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl2_LabelKWordAndSTitle EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl2_LabelStatus EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl2_LabelConsidered EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=mytable.csv

TAG POS=1 TYPE=A ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl3_HyperLink1 EXTRACT=HREF
TAG POS=1 TYPE=A ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl3_HyperLink1 EXTRACT=TXT
TAG POS=1 TYPE=A ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl3_LinkAuthor EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl3_LabelKWordAndSTitle EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl3_LabelStatus EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl3_LabelConsidered EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=mytable.csv

(Many more lines like this... )

TAG POS=1 TYPE=A ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl97_HyperLink1 EXTRACT=HREF
TAG POS=1 TYPE=A ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl97_HyperLink1 EXTRACT=TXT
TAG POS=1 TYPE=A ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl97_LinkAuthor EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl97_LabelKWordAndSTitle EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl97_LabelStatus EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl97_LabelConsidered EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=mytable.csv

TAG POS=1 TYPE=A ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl98_HyperLink1 EXTRACT=HREF
TAG POS=1 TYPE=A ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl98_HyperLink1 EXTRACT=TXT
TAG POS=1 TYPE=A ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl98_LinkAuthor EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl98_LabelKWordAndSTitle EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl98_LabelStatus EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl98_LabelConsidered EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=mytable.csv

TAG POS=1 TYPE=A ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl99_HyperLink1 EXTRACT=HREF
TAG POS=1 TYPE=A ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl99_HyperLink1 EXTRACT=TXT
TAG POS=1 TYPE=A ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl99_LinkAuthor EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl99_LabelKWordAndSTitle EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl99_LabelStatus EXTRACT=TXT
TAG POS=1 TYPE=SPAN ATTR=ID:ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl99_LabelConsidered EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=* FILE=mytable.csv


TAG javascript:__doPostBack('ctl00$ctl00$PageBody$PageContent$DataPager1$ctl02$ctl00','');

WAIT SECONDS = 5

Here is the converted macro. Essentially, we

  • replaced every TAG … EXTRACT=TXT with storeText,
  • replaced every TAG … EXTRACT=HREF with storeAttribute | id=…@href, as described on our iMacros conversion page and web scraping page,
  • replaced the element numbers in the storeText/storeAttribute locators with a variable, so instead of _ctrl0, _ctrl1, … we use _ctrl${i}. This way we can loop over the rows by increasing the value of “i” on each pass,
  • used do … repeatIf to replace the iMacros “Loop” button.
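The mapping in the list above is mechanical enough to sketch in a few lines of Python. This is only an illustration of the rule, not a tool we ship: the function name convert_tag_line and the two regular expressions are made up for this example, and it only understands the TAG … ATTR=ID:… EXTRACT=TXT|HREF form used in the customer's macro.

```python
import re

# Hypothetical helper, for illustration only. It handles just the
# "TAG POS=.. TYPE=.. ATTR=ID:<id> EXTRACT=TXT|HREF" lines seen in this macro.
TAG_RE = re.compile(
    r"^TAG\s+POS=\d+\s+TYPE=\w+\s+ATTR=ID:(?P<id>\S+)\s+EXTRACT=(?P<kind>TXT|HREF)$"
)
CTRL_RE = re.compile(r"_ctrl\d+_")


def convert_tag_line(line: str) -> dict:
    """Turn one iMacros extraction line into a Ui.Vision command dict."""
    match = TAG_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unsupported iMacros line: {line!r}")

    # Replace the fixed row number (_ctrl0_, _ctrl1_, ...) with the loop variable.
    element_id = CTRL_RE.sub("_ctrl${i}_", match.group("id"), count=1)

    if match.group("kind") == "HREF":
        # EXTRACT=HREF -> storeAttribute with an @href suffix on the locator
        command, target = "storeAttribute", f"id={element_id}@href"
    else:
        # EXTRACT=TXT -> storeText
        command, target = "storeText", f"id={element_id}"

    return {"Command": command, "Target": target, "Value": "!csvline", "Description": ""}


if __name__ == "__main__":
    sample = (
        "TAG POS=1 TYPE=A ATTR=ID:ctl00_ctl00_PageBody_PageContent_"
        "ListViewSearchResults_ctrl97_HyperLink1 EXTRACT=HREF"
    )
    print(convert_tag_line(sample))
```

Because every row collapses to the same six commands once the row number becomes ${i}, only the first block of the iMacros macro needs converting. The converted macro below also differs in one spot from what this sketch would produce: the “considered” field uses an xpath locator ending in /a instead of a plain id.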

In the customer’s words:

That worked perfectly! I am now back to where I was with the Imacros!!! But better, especially in terms of speed. Much more elegant macro.

I have used that Imacro system for 15 to 20 years! This is an improvement.

Ui.Vision Web Scraping Video:

Macro source code:

{
  "Name": "iMacros2UI.Vision",
  "CreationDate": "2024-2-15",
  "Commands": [
    {
      "Command": "store",
      "Target": "0",
      "Value": "!timeout_wait",
      "Description": "Fast scraping: Just skip if element is missing. No need to wait for it to appear. Page is already fully loaded."
    },
    {
      "Command": "store",
      "Target": "fast",
      "Value": "!replayspeed",
      "Description": ""
    },
    {
      "Command": "store",
      "Target": "1",
      "Value": "page",
      "Description": "Page counter"
    },
    {
      "Command": "do",
      "Target": "",
      "Value": "",
      "Description": ""
    },
    {
      "Command": "store",
      "Target": "0",
      "Value": "i",
      "Description": "data set counter 0....99"
    },
    {
      "Command": "do",
      "Target": "",
      "Value": "",
      "Description": ""
    },
    {
      "Command": "storeAttribute",
      "Target": "id=ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl${i}_HyperLink1@href",
      "Value": "!csvline",
      "Description": "link"
    },
    {
      "Command": "storeText",
      "Target": "id=ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl${i}_HyperLink1",
      "Value": "!csvline",
      "Description": "instrument"
    },
    {
      "Command": "storeText",
      "Target": "id=ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl${i}_LinkAuthor",
      "Value": "!csvline",
      "Description": "author"
    },
    {
      "Command": "storeText",
      "Target": "id=ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl${i}_LabelKWordAndSTitle",
      "Value": "!csvline",
      "Description": "main text"
    },
    {
      "Command": "click",
      "Target": "id=ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl${i}_LabelKWordAndSTitle",
      "Value": "",
      "Description": "This CLICK is NOT needed. We use it just for the video to scroll the page and highlight the scraped content. Remove for production use. Then the scraping runs faster."
    },
    {
      "Command": "storeText",
      "Target": "id=ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl${i}_LabelStatus",
      "Value": "!csvline",
      "Description": "status"
    },
    {
      "Command": "storeText",
      "Target": "xpath=//*[@id=\"ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl${i}_LabelConsidered\"]/a",
      "Value": "!csvline",
      "Description": "considered"
    },
    {
      "Command": "echo",
      "Target": "${page}|${i}|${!csvline}",
      "Value": "blue",
      "Description": "Show scraped data in log"
    },
    {
      "Command": "csvSave",
      "Target": "result1.csv",
      "Value": "",
      "Description": "Save scraped data to CSV"
    },
    {
      "Command": "executeScript_Sandbox",
      "Target": "return Number (${i}) + 1;",
      "Value": "i",
      "Description": "Add 1 to i"
    },
    {
      "Command": "repeatIf",
      "Target": "${i} < 100",
      "Value": "",
      "Description": ""
    },
    {
      "Command": "executeScript_Sandbox",
      "Target": "return Number (${page}) + 1;",
      "Value": "page",
      "Description": "Add 1 to page"
    },
    {
      "Command": "executeScript",
      "Target": "__doPostBack('ctl00$ctl00$PageBody$PageContent$DataPager1$ctl02$ctl00','')",
      "Value": "",
      "Description": "NEXT button is tricky. Use XClick or simply this JS code"
    },
    {
      "Command": "echo",
      "Target": "Starting page ${page}",
      "Value": "green",
      "Description": ""
    },
    {
      "Command": "pause",
      "Target": "10000",
      "Value": "",
      "Description": "Wait for new page to load"
    },
    {
      "Command": "store",
      "Target": "true",
      "Value": "!statusOK",
      "Description": "Reset !statusOK"
    },
    {
      "Command": "storeText",
      "Target": "id=ctl00_ctl00_PageBody_PageContent_ListViewSearchResults_ctrl0_HyperLink1",
      "Value": "test",
      "Description": "TEST web scraping to check if a new page of data is loaded. If not, we are done!"
    },
    {
      "Command": "repeatIf",
      "Target": "${test}  != \"#LNF\"",
      "Value": "",
      "Description": "Make sure we are on a page with results. If no anchor is found, storeText returns #LNF"
    },
    {
      "Command": "echo",
      "Target": "All done!",
      "Value": "green",
      "Description": ""
    }
  ]
}
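For readers who find the two nested do … repeatIf loops hard to follow in JSON, here is the same control flow as a rough Python sketch. It is a model of the macro's logic, not something Ui.Vision runs: the pages argument is a hypothetical stand-in for the website (a list of pages, each a list of rows), scrape_all is a made-up name, and skipping rows that do not exist on a short last page is an assumption based on the “just skip if element is missing” comment in the macro.

```python
LNF = "#LNF"  # the value the macro compares ${test} against when no anchor is found


def scrape_all(pages, rows_per_page=100):
    """Model of the macro's loops. `pages` is a non-empty list of row lists."""
    saved = []  # stands in for the lines appended by csvSave
    page = 1    # page counter, as in the macro

    while True:  # outer do ... repeatIf ${test} != "#LNF"
        current = pages[page - 1]
        i = 0    # data set counter 0....99

        while True:  # inner do ... repeatIf ${i} < 100
            if i < len(current):
                # Assumption: a row that is missing on the page is simply skipped.
                saved.append((page, i, current[i]))
            i += 1
            if not i < rows_per_page:
                break

        page += 1  # "click" NEXT, then probe the first row of the new page
        has_rows = page <= len(pages) and len(pages[page - 1]) > 0
        test = pages[page - 1][0] if has_rows else LNF
        if not test != LNF:
            break

    return saved


if __name__ == "__main__":
    demo = [["HB1", "HB2", "HB3"], ["HB4"]]
    for record in scrape_all(demo, rows_per_page=3):
        print(record)
```

The point to notice is that the inner loop always counts to the full page length, while the outer loop stops only when the probe for the first row of the next page comes back empty. That is why the page count no longer has to be set by hand before the run, as it was with the iMacros “Loop” button.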