Extract all data from a page: need suggestions

Hi

From this site I want to save these columns:

IP address
Port
Protocol
Country

http://free-proxy.cz/en/proxylist/country/DE/all/ping/all

What type of code do I need to use?

I need to save every row to a CSV file so that the file contains all of this information.

I need a way to detect every row and save it to the CSV.

Thanks for any suggestions.

{
  "Name": "testcsv-1",
  "CreationDate": "2019-6-1",
  "Commands": [
{
  "Command": "open",
  "Target": "http://free-proxy.cz/en/proxylist/country/DE/all/ping/all",
  "Value": ""
},
{
  "Command": "store",
  "Target": "FAST",
  "Value": "!REPLAYSPEED"
},
{
  "Command": "store",
  "Target": "true",
  "Value": "!ErrorIgnore"
},
{
  "Command": "store",
  "Target": "1",
  "Value": "i"
},
{
  "Command": "label",
  "Target": "loopbegin",
  "Value": "<<<<<<<<<<<<<"
},
{
  "Command": "storeText",
  "Target": "xpath=//*[@id=\"proxy_list\"]/tbody/tr[${i}]/td[1]",
  "Value": "!csvLine"
},
{
  "Command": "storeText",
  "Target": "xpath=//*[@id=\"proxy_list\"]/tbody/tr[${i}]/td[2]",
  "Value": "!csvLine"
},
{
  "Command": "csvSave",
  "Target": "proxies",
  "Value": ""
},
{
  "Command": "storeEval",
  "Target": "${i}+1",
  "Value": "i"
},
{
  "Command": "gotoIf",
  "Target": "${i} < 10",
  "Value": "loopbegin"
}
  ]
}

It stalls until the timeout on empty rows (I suppose those are ad rows that my adblocker removed), but after that it continues without any issues.
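If the pauses on empty rows are a nuisance, one option is to shorten how long Kantu waits for a missing element. The command below is only a sketch: it assumes that the internal variable !TIMEOUT_WAIT (in seconds) controls that wait in your Kantu version, and the value 2 is an arbitrary choice. Add it as another entry in "Commands", before the loopbegin label and next to the other store commands:

```json
{
  "Command": "store",
  "Target": "2",
  "Value": "!TIMEOUT_WAIT"
}
```

Because the macro already sets !ErrorIgnore to true, it continues past the missing cells either way; a shorter wait should only make the skipping faster.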


Hello, thank you, you are very good. I could not manage it on my own.

Can you recommend code to extract all four columns I listed?

IP address
Port
Protocol
Country

Can you teach me how to find the right XPath for extracting data from web pages?

How did you find such an accurate XPath?

Thanks for helping me.

Right-click on the data you want to extract and select “Inspect”. In the developer tools window that opens, right-click the highlighted element and click “Copy > Copy XPath”.


The method above works, but the easiest way is to use the “Select” button in Kantu.

Add two more commands after the two storeText commands.
Note that tr[${i}] is the row index, which the loop advances, and td[#] is the column number.

{
  "Command": "storeText",
  "Target": "xpath=//*[@id=\"proxy_list\"]/tbody/tr[${i}]/td[3]",
  "Value": "!csvLine"
},
{
  "Command": "storeText",
  "Target": "xpath=//*[@id=\"proxy_list\"]/tbody/tr[${i}]/td[4]",
  "Value": "!csvLine"
},
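Putting the pieces together, the whole macro might look like the sketch below. It assumes that columns 1 to 4 of the table are IP address, port, protocol and country, in the order listed in the question, and it keeps the loop limit of the original macro (rows 1 to 9). The macro name testcsv-4cols is made up for illustration.

```json
{
  "Name": "testcsv-4cols",
  "CreationDate": "2019-6-1",
  "Commands": [
    {
      "Command": "open",
      "Target": "http://free-proxy.cz/en/proxylist/country/DE/all/ping/all",
      "Value": ""
    },
    {
      "Command": "store",
      "Target": "true",
      "Value": "!ErrorIgnore"
    },
    {
      "Command": "store",
      "Target": "1",
      "Value": "i"
    },
    {
      "Command": "label",
      "Target": "loopbegin",
      "Value": ""
    },
    {
      "Command": "storeText",
      "Target": "xpath=//*[@id=\"proxy_list\"]/tbody/tr[${i}]/td[1]",
      "Value": "!csvLine"
    },
    {
      "Command": "storeText",
      "Target": "xpath=//*[@id=\"proxy_list\"]/tbody/tr[${i}]/td[2]",
      "Value": "!csvLine"
    },
    {
      "Command": "storeText",
      "Target": "xpath=//*[@id=\"proxy_list\"]/tbody/tr[${i}]/td[3]",
      "Value": "!csvLine"
    },
    {
      "Command": "storeText",
      "Target": "xpath=//*[@id=\"proxy_list\"]/tbody/tr[${i}]/td[4]",
      "Value": "!csvLine"
    },
    {
      "Command": "csvSave",
      "Target": "proxies",
      "Value": ""
    },
    {
      "Command": "storeEval",
      "Target": "${i}+1",
      "Value": "i"
    },
    {
      "Command": "gotoIf",
      "Target": "${i} < 10",
      "Value": "loopbegin"
    }
  ]
}
```

Each storeText that writes to !csvLine adds one more column to the current line, and csvSave then writes that line to the CSV, so the order of the four storeText commands determines the column order in the file. To read more rows, raise the limit in the gotoIf condition.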

BTW, unrelated: how are those proxies?

Hello

You found a great XPath. I noticed that the same element can have several different XPaths, but some are complex to understand and use, especially in a loop.

I tried the code you posted and it works fine in Firefox without freezing; it’s excellent.

I use a Firefox extension to find XPaths (the TruePath add-on), but I often do not understand them because they contain so many elements.

Thanks

My rule of thumb: first, use Kantu’s built-in method (the “Select” button). If I’m not satisfied with the result, I use Chrome’s built-in feature as TheWhippinpost described. If I’m still not satisfied, I look for unique ids, classes, etc. and write my own preferred XPath.
But for this website Kantu worked great.


Thanks for the Select button suggestion.

It works like a charm.