Table Parsing: How to extract data from a certain row that matches a keyword

Table parsing (web scraping) is a common web automation task, but often you don’t need to extract the entire table. Instead, you need to read a value (or click a link) in one column, such as column 1, of the row that contains a certain keyword or text in another column, such as column 3.

In other words, the value in column 3 is used to identify the correct row, and from this row, you need to extract the data from column 1.
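To make the idea concrete outside the browser, here is a minimal JavaScript sketch of the same lookup on an in-memory table. The helper name `valueNextToKeyword`, its 1-based column parameters and the sample rows are hypothetical and only illustrate the principle: scan the key column, and on a match read another column of the same row.

```javascript
// Hypothetical helper: find the first row whose key column equals `keyword`
// and return the value of another column in that row.
// Columns are 1-based to match XPath's td[1], td[2], td[3] numbering.
function valueNextToKeyword(rows, keyword, keyCol, valueCol) {
  for (let i = 0; i < rows.length; i++) {
    const cells = rows[i];
    // Trim because scraped cell text often carries surrounding whitespace.
    if ((cells[keyCol - 1] ?? "").trim() === keyword) {
      // dataRow counts data rows only (1-based); it is not the HTML tr index.
      return { dataRow: i + 1, value: cells[valueCol - 1] };
    }
  }
  return null; // keyword not present in the key column
}

// Hypothetical sample data: column 1 = link, column 3 = system name.
const rows = [
  ["link A", "x", "64-bit Windows x64"],
  ["link B", "x", "64-bit Linux x86-64"],
  ["link C", "x", "64-bit Linux arm64"],
];

// returns { dataRow: 3, value: "link C" }
console.log(valueNextToKeyword(rows, "64-bit Linux arm64", 3, 1));
```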

For example, let’s assume the task is to find a specific keyword in the “System” column and then click on the download link located to the left of it.


In Ui.Vision you can use conditional statements directly in the macro, which makes this web scraping task rather straightforward:

  1. Loop over the table rows and extract the text of the keyword column with the storeText command. The locator is usually an XPath containing the row and column numbers, which you can get by recording a click on a table cell. To make the XPath dynamic, replace the number that selects the row with a variable. For example, change

xpath=//tr[5]/td[3] (recorded by clicking a table cell)

to

xpath=//tr[${row}]/td[3]

  2. Next, check the extracted value with an If/Then or Do…RepeatIf statement. If it matches the keyword(s), use the row index of the match (stored in the ${row} variable) in a click or storeText command for the first column.
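The two steps can be mirrored in plain JavaScript to show how the loop behaves. In this sketch, a hypothetical `page` object maps XPath strings to cell text and stands in for storeText; it is not a Ui.Vision API. As in the macro below, the loop starts at row 2 (row 1 is assumed to be the header), runs its body at least once like do…repeatIf, and converts the row variable with `parseInt` before adding 1, presumably because a value stored as text would otherwise be concatenated: in JavaScript, "1" + 1 is "11".

```javascript
// Sketch of the do...repeatIf loop. `page` is a hypothetical stand-in for
// the browser: it maps an XPath to the text storeText would return.
function findRowByKeyword(page, keyword, keyCol) {
  let row = "1"; // like `store 1 row`: treat the variable as text
  let text;
  do {
    // parseInt first: "1" + 1 would give "11" (string concatenation).
    row = String(parseInt(row, 10) + 1);
    const xpath = `xpath=//tr[${row}]/td[${keyCol}]`;
    if (!(xpath in page)) {
      // storeText would fail here: no such row, so the keyword was not found.
      throw new Error(`Element not found: ${xpath}`);
    }
    text = page[xpath];
  } while (text !== keyword); // repeatIf: repeat while there is no match
  return Number(row);
}

// Hypothetical page: tr[1] is the header row, data starts at tr[2].
const page = {
  "xpath=//tr[2]/td[3]": "64-bit Windows x64",
  "xpath=//tr[3]/td[3]": "64-bit Linux x86-64",
  "xpath=//tr[4]/td[3]": "64-bit Linux arm64",
};

const row = findRowByKeyword(page, "64-bit Linux arm64", 3);
console.log(`click xpath=//tr[${row}]/td/a`); // the link in the matching row
```

Stopping with an error when the row does not exist matches what happens in the macro: once ${row} runs past the last table row, storeText finds no element and the macro halts.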

The video and macro code below explain this process in more detail.

```json
{
  "Name": "Extract X-th line of table",
  "CreationDate": "2023-5-10",
  "Commands": [
    {
      "Command": "open",
      "Target": "https://www.7-zip.org/download.html",
      "Value": "",
      "Description": ""
    },
    {
      "Command": "store",
      "Target": "1",
      "Value": "row",
      "Description": ""
    },
    {
      "Command": "do",
      "Target": "",
      "Value": "",
      "Description": ""
    },
    {
      "Command": "executeScript_Sandbox",
      "Target": "return parseInt(${row})+1;",
      "Value": "row",
      "Description": "Add 1 to the row variable. The first data row has HTML index 2 (index 1 is the header row), which is why we start with 2"
    },
    {
      "Command": "storeText",
      "Target": "xpath=//tr[${row}]/td[3]",
      "Value": "text",
      "Description": "Extract text of 3rd column"
    },
    {
      "Command": "echo",
      "Target": "Row ${row}: text=${text}",
      "Value": "blue",
      "Description": ""
    },
    {
      "Command": "repeatIf",
      "Target": "${text} != \"64-bit Linux arm64\"",
      "Value": "",
      "Description": "Check if we found the right row"
    },
    {
      "Command": "echo",
      "Target": "Match found in row ${row}, text=\"${text}\"",
      "Value": "green",
      "Description": ""
    },
    {
      "Command": "click",
      "Target": "xpath=//tr[${row}]/td/a",
      "Value": "",
      "Description": "Click the download link in the same row"
    }
  ]
}
```
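If you prefer to do the search in one step, the same logic could also run as page JavaScript, for example through Ui.Vision's executeScript command, which (unlike executeScript_Sandbox) runs in the context of the website. This is a sketch of an alternative, not the method used in the macro above. It relies only on the standard DOM properties `table.rows`, `row.cells` and `textContent`; the function name `findLinkInRow`, the default key column 3 and the mock table are hypothetical. It takes the table as a parameter so it can be tried outside a browser.

```javascript
// Sketch: return the first link in the row whose key cell text equals
// `keyword`. Works with a real HTMLTableElement (table.rows, row.cells,
// cell.textContent) or with any object of the same shape.
function findLinkInRow(table, keyword, keyCol = 3) {
  for (const row of Array.from(table.rows)) {
    const cell = row.cells[keyCol - 1]; // cells is 0-based: td[3] -> index 2
    if (cell && cell.textContent.trim() === keyword) {
      return row.querySelector("a"); // first link in the same row, or null
    }
  }
  return null; // keyword not found in the key column
}

// Hypothetical mock of the DOM shape used above, for trying it in Node.
const mockRow = (texts, href) => ({
  cells: texts.map((t) => ({ textContent: ` ${t} ` })),
  querySelector: (sel) => (sel === "a" && href ? { href } : null),
});
const table = {
  rows: [
    mockRow(["Link", "Type", "System"], null), // header row
    mockRow(["Download", "exe", "64-bit Windows x64"], "win.exe"),
    mockRow(["Download", "tar.xz", "64-bit Linux arm64"], "arm64.tar.xz"),
  ],
};

console.log(findLinkInRow(table, "64-bit Linux arm64").href);
```

In a browser you would pass a real table element instead, e.g. `findLinkInRow(document.querySelector("table"), "64-bit Linux arm64")?.click()`; which table element to select depends on the page, since many pages contain more than one.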
