Extract all data from a page: need suggestions

newuserkantu · May 31, 2019, 11:41pm

Hi

From this site I want to save these columns:

IP address
Port
Protocol
Country

What type of code do I need to use?

I need to save all rows to a CSV file so that it contains all the information.

I need a way to detect every row and save it to the CSV.

Thanks for any suggestions.

commensal · June 1, 2019, 10:36am

{
  "Name": "testcsv-1",
  "CreationDate": "2019-6-1",
  "Commands": [
    {
      "Command": "open",
      "Target": "http://free-proxy.cz/en/proxylist/country/DE/all/ping/all",
      "Value": ""
    },
    {
      "Command": "store",
      "Target": "FAST",
      "Value": "!REPLAYSPEED"
    },
    {
      "Command": "store",
      "Target": "true",
      "Value": "!ErrorIgnore"
    },
    {
      "Command": "store",
      "Target": "1",
      "Value": "i"
    },
    {
      "Command": "label",
      "Target": "loopbegin",
      "Value": "<<<<<<<<<<<<<"
    },
    {
      "Command": "storeText",
      "Target": "xpath=//*[@id=\"proxy_list\"]/tbody/tr[${i}]/td[1]",
      "Value": "!csvLine"
    },
    {
      "Command": "storeText",
      "Target": "xpath=//*[@id=\"proxy_list\"]/tbody/tr[${i}]/td[2]",
      "Value": "!csvLine"
    },
    {
      "Command": "csvSave",
      "Target": "proxies",
      "Value": ""
    },
    {
      "Command": "storeEval",
      "Target": "${i}+1",
      "Value": "i"
    },
    {
      "Command": "gotoIf",
      "Target": "${i} < 10",
      "Value": "loopbegin"
    }
  ]
}
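The loop above stops after row 9, because `gotoIf` only jumps back while `${i} < 10`. To cover every row, as the original question asks, one option is to count the table rows before the loop starts. This is a sketch that assumes the `storeXpathCount` command is available in your Kantu version; `rowCount` is a hypothetical variable name. Put the first command into the "Commands" array just before the `label` command, and use the second one in place of the existing `gotoIf`.

```json
{
  "Command": "storeXpathCount",
  "Target": "//*[@id=\"proxy_list\"]/tbody/tr",
  "Value": "rowCount"
},
{
  "Command": "gotoIf",
  "Target": "${i} <= ${rowCount}",
  "Value": "loopbegin"
}
```

The count includes every `tr` in the table body, so any empty (ad) rows are still visited; with `!ErrorIgnore` set to true they should only cost a timeout rather than stop the macro.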

commensal · June 1, 2019, 10:38am

It does get stuck on a timeout at empty rows (I suppose those are ads that my adblocker blocked), but after that it continues without any issues.
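If the wait on those empty rows is too slow, a possible tweak, assuming your Kantu version supports the `!TIMEOUT_WAIT` internal variable (how many seconds a command waits for its element to appear), is to lower it at the top of the macro next to the other `store` commands. The value `2` is only an illustrative choice; too low a value can make commands fail on slow-loading pages.

```json
{
  "Command": "store",
  "Target": "2",
  "Value": "!TIMEOUT_WAIT"
}
```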

commensal · June 1, 2019, 10:40am

[screenshot]

newuserkantu · June 1, 2019, 1:55pm

Hello, thank you, you are very good; I could not have managed it alone.

Can you recommend code to extract all four columns I listed?

IP address
Port
Protocol
Country

Can you teach me how to find the right XPath for extracting data from web pages?

How did you find such an accurate XPath?

Thanks for your help.

TheWhippinpost · June 1, 2019, 5:38pm

Right-click on the data you want to extract and select “Inspect”. Then, in the developer tools panel that opens, right-click the highlighted element and choose “Copy > Copy XPath”.

commensal · June 1, 2019, 7:10pm

The method above works, but the easiest way is to use the “Select” button in Kantu.
[screenshot]

Add two more commands after the two storeText commands.
Note that tr[${i}] is the row index that advances with each pass of the loop, and td[#] is the column number.

{
  "Command": "storeText",
  "Target": "xpath=//*[@id=\"proxy_list\"]/tbody/tr[${i}]/td[3]",
  "Value": "!csvLine"
},
{
  "Command": "storeText",
  "Target": "xpath=//*[@id=\"proxy_list\"]/tbody/tr[${i}]/td[4]",
  "Value": "!csvLine"
},
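Put together, the loop body for all four requested columns could look like the sketch below. It assumes the columns on that page are, in order, IP address (td[1]), Port (td[2]), Protocol (td[3]) and Country (td[4]), which matches the column numbers used above. Each `storeText` into `!csvLine` adds one value as a new column of the current CSV line, and `csvSave` writes that line to the `proxies` file before the counter is incremented.

```json
{
  "Command": "storeText",
  "Target": "xpath=//*[@id=\"proxy_list\"]/tbody/tr[${i}]/td[1]",
  "Value": "!csvLine"
},
{
  "Command": "storeText",
  "Target": "xpath=//*[@id=\"proxy_list\"]/tbody/tr[${i}]/td[2]",
  "Value": "!csvLine"
},
{
  "Command": "storeText",
  "Target": "xpath=//*[@id=\"proxy_list\"]/tbody/tr[${i}]/td[3]",
  "Value": "!csvLine"
},
{
  "Command": "storeText",
  "Target": "xpath=//*[@id=\"proxy_list\"]/tbody/tr[${i}]/td[4]",
  "Value": "!csvLine"
},
{
  "Command": "csvSave",
  "Target": "proxies",
  "Value": ""
}
```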

commensal · June 1, 2019, 7:16pm

BTW, unrelated: how are those proxies?

newuserkantu · June 1, 2019, 7:25pm

Hello

You found a great XPath. I noticed that the same element can have several different XPaths, but some are complex to understand and use, especially in a loop.

I tried the code you posted and it works fine in Firefox without freezing; it's excellent.

I use a Firefox extension (the TruePath add-on) to find XPaths, but I often don't understand them because they contain so many elements.

Thanks

commensal · June 1, 2019, 7:29pm

My rule of thumb: first I use Kantu's built-in method (the Select button). If I'm not satisfied, I use Chrome's built-in feature as TheWhippinpost described. If I'm still not satisfied, I look for unique ids, classes, etc. to write my own preferred XPath.
For this website, though, Kantu worked great.
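As an illustration of writing your own XPath from ids or classes, here are two alternative targets for the first cell of row `${i}`; use one or the other, not both. The first anchors on the table's unique `id`. The second anchors on a class name with `contains()`, which still matches when the element carries several classes (for example `class="table table-bordered"`), whereas an exact `@class='...'` test breaks as soon as a class is added or reordered. The class name `table-bordered` is only an example here, not something taken from free-proxy.cz; note that `contains()` would also match a longer class such as `table-bordered-x`.

```json
{
  "Command": "storeText",
  "Target": "xpath=//table[@id=\"proxy_list\"]/tbody/tr[${i}]/td[1]",
  "Value": "!csvLine"
},
{
  "Command": "storeText",
  "Target": "xpath=//table[contains(@class, \"table-bordered\")]/tbody/tr[${i}]/td[1]",
  "Value": "!csvLine"
}
```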

newuserkantu · June 2, 2019, 2:09pm

Thanks for the Select button suggestion.

It works like a charm.

Kaisellll · December 8, 2021, 2:53am

I am having an issue: the loop is not starting. Can you please tell me where I am making a mistake?

Here is the code:
},
{
"Command": "store",
"Target": "FAST",
"Value": "!REPLAYSPEED",
"Description": ""
},
{
"Command": "store",
"Target": "true",
"Value": "!ErrorIgnore",
"Description": ""
},
{
"Command": "store",
"Target": "1",
"Value": "i",
"Description": ""
},
{
"Command": "label",
"Target": "loopbegin",
"Value": "<<<<<<<<<<<<<",
"Description": ""
},
{
"Command": "storeText",
"Target": "xpath=//*[@class='table table-bordered']/tbody/tr[${i}]/td[1]",
"Value": "!csvLine",
"Description": ""
},
{
"Command": "storeText",
"Target": "xpath=//*[@class='table table-bordered']/tbody/tr[${i}]/td[2]",
"Value": "!csvLine",
"Description": ""
},
{
"Command": "csvSave",
"Target": "test1",
"Value": "",
"Description": ""
},
{
"Command": "executeScript",
"Target": "return ${i}+1",
"Value": "i",
"Description": ""
},
{
"Command": "gotoLabel",
"Target": "${i} < 20",
"Value": "loopbegin",
"Description": ""
},
{