Dynamic Table Scraping

Hello everyone,
I am trying to get data from a dynamic table that refreshes every five seconds. I used the XPath to get the text and store it in a CSV file, but this gives me only partial records, not the full set. I want to extract the whole table at once. I have attached a screenshot of the table and my code. I would really appreciate any feedback and suggestions.

{
  "Command": "select",
  "Target": "id=dashboardUserViewList",
  "Value": "label=Online Agents (Nick1)",
  "Targets": [
    "id=dashboardUserViewList",
    "xpath=//*[@id=\"dashboardUserViewList\"]",
    "xpath=//select[@id='dashboardUserViewList']",
    "xpath=//select",
    "css=#dashboardUserViewList"
  ]
},
{
  "Command": "click",
  "Target": "id=dashboardUserViewList",
  "Value": "",
  "Targets": [
    "id=dashboardUserViewList",
    "xpath=//*[@id=\"dashboardUserViewList\"]",
    "xpath=//select[@id='dashboardUserViewList']",
    "xpath=//select",
    "css=#dashboardUserViewList"
  ]
},
{
  "Command": "pause",
  "Target": "5000",
  "Value": ""
},
{
  "Command": "comment",
  "Target": "store // !replayspeed",
  "Value": "SLOW"
},
{
  "Command": "store",
  "Target": "${!runtime}",
  "Value": "start"
},
{
  "Command": "pause",
  "Target": "3000",
  "Value": "start"
},
{
  "Command": "storeText",
  "Target": "/html/body/div[2]/form/div[5]/div[2]/div/div[5]/div",
  "Value": "data"
},
{
  "Command": "comment",
  "Target": "pause // 3000",
  "Value": ""
},
{
  "Command": "store",
  "Target": "${!runtime}",
  "Value": "end"
},
{
  "Command": "store",
  "Target": "${end} - ${start}",
  "Value": "elapsed"
},
{
  "Command": "echo",
  "Target": "${elapsed}",
  "Value": ""
},
{
  "Command": "store",
  "Target": "!replayspeed",
  "Value": "FAST"
},
{
  "Command": "store",
  "Target": "${data}",
  "Value": "!csvLine"
},
{
  "Command": "csvSave",
  "Target": "data.csv",
  "Value": ""
},
{
  "Command": "echo",
  "Target": "${data}",
  "Value": ""
},

If the page refreshes every 5 seconds, I do not think there is a solution; every macro fails in this case.

I cannot find a stable solution.

Try to block the auto refresh (for example, by disabling JavaScript); otherwise you will only see errors and problems.

@newuserkantu Thank you for your suggestion. Do you think there is some short code (I am making this up, but something like document.disable("JS")) that I could run with executeScript to disable and re-enable JavaScript before and after pulling the data from the table?

I think this is hard work, because the page refresh is very difficult to handle and can freeze UI.Vision or cause errors like "Error #101: Kantu is not connected to a browser tab".

You can try, but in my opinion it will take a lot of time, and the macro will sometimes still fail because of the auto refresh.

You can try to save a screenshot of the page and then use an OCR service.

The page's self refresh can certainly cause various errors, so this is not an easy job, and you risk getting errors continuously and never having a macro that does its job well.

You will have to try many approaches, and it will take a long time to find the one with the fewest errors.

One idea is to save the HTML page to your computer and then have the macro work on the saved page to extract the data.

I have sometimes worked on HTML pages saved locally on my computer, and UI.Vision works perfectly with them. This way you would not have the problem of the online page refreshing itself.

You still have to think about the best solution, because if you extract the data incorrectly you will not be able to use it, and you risk having everything wrong.


@userkantu Thank you so much for your response. I did try saving the page. I am trying to pull data from a real-time dynamic dashboard/table that refreshes every 5 seconds. After I saved the page and opened it, there was no data visible; it was just an empty table with no rows.

I do not know any working solution, sorry.


@userkantu No problem. Thank you so much.


@userkantu I have a quick question about using OCR to solve this problem. I would really appreciate your suggestion on how to use OCR with scrolling/page down. Right now, when I take the OCR of the table by id of the table, I get the rows that are visible on the page only. Reading further elements would require scrolling until there are no records. And when I press the scroll button/page down, I am going down by just one row at once. So, I would have to take screenshot each time I do scroll down/page down and handle duplicate records. I was wondering if you could suggest a better way of doing this in ui vision. I am thinking of using loops until the end of the list but am not sure of what condition would go into the loop. Thank you for your time and help.
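One possible loop condition for the scroll-and-OCR idea above is "stop once several scrolls in a row produce no row that has not been seen before". The sketch below is in Python only to show the logic; it is not UI.Vision code, and `read_visible_rows` and `scroll_down` are hypothetical callbacks standing in for the OCR step and the page-down step. It assumes every record has unique text, so two genuinely identical rows would be collapsed into one, and it does nothing about rows that change when the dashboard refreshes mid-scroll.

```python
from typing import Callable, List


def collect_all_rows(
    read_visible_rows: Callable[[], List[str]],  # hypothetical: OCR of the visible rows
    scroll_down: Callable[[], None],             # hypothetical: one scroll / page-down
    max_idle_scrolls: int = 3,
    max_scrolls: int = 1000,
) -> List[str]:
    """Scroll through a list and keep each distinct row once, in first-seen order."""
    seen = set()
    ordered: List[str] = []
    idle = 0  # consecutive scrolls that produced no new row

    for _ in range(max_scrolls):  # hard upper bound so the loop always ends
        found_new = False
        for row in read_visible_rows():
            key = " ".join(row.split())  # normalise whitespace noise from OCR
            if key and key not in seen:
                seen.add(key)
                ordered.append(key)
                found_new = True

        idle = 0 if found_new else idle + 1
        if idle >= max_idle_scrolls:
            break  # nothing new for a while: assume the end of the list
        scroll_down()

    return ordered
```

Because scrolling moves by one row at a time, consecutive screenshots overlap heavily; the `seen` set is what discards the repeats, so no row-by-row comparison between screenshots is needed.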

This is hard work. I suggest two solutions to try (see if they work; I do not know the page and cannot test it):

  1. Install NoScript in the browser. It blocks all scripts. Check whether your page is still visible and the auto refresh stops; if so, you can work on the page to store the values.

  2. Try to save the page and work on the offline copy (check whether it works and whether the saved page is complete).

You can take a screenshot with UI.Vision and then upload it to OCR.space to get the text.

In my opinion this is hard work, and it will take a lot of time to find the best way to automate it.

@userkantu Thank you so much for your suggestion. Look forward to trying your suggestions.

@userkantu Thanks a lot for your suggestions. The NoScript option didn't work: activating NoScript stops the real-time dashboard from updating, so the dashboard is not visible. Thanks again for your time and suggestion.

@newuserkantu I am sorry to go back to previous comments, but I just wanted to clarify something. You said 'every macro fails in this case.' Is that because the real-time refresh triggers too many JavaScript calls, which breaks the stability of the macro? Do you think using Selenium WebDriver with requests might be a better approach?
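For the Selenium idea, a minimal sketch might read the element's whole `outerHTML` in a single call and then parse that static string, so the parsing itself cannot be interrupted by a refresh. This is untested against the dashboard and rests on several assumptions: the rows are real `<tr>`/`<td>` markup (the XPath in the macro ends in a `div`, so that may not hold), Chrome and the `selenium` package are installed, and no login or wait-for-data step is needed. `snapshot_table_html` and `parse_table_rows` are names made up for this example.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None   # cells of the <tr> being read
        self._cell: Optional[List[str]] = None  # text pieces of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the pieces and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def parse_table_rows(html: str) -> List[List[str]]:
    """Parse a static HTML snapshot into a list of rows of cell text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def snapshot_table_html(url: str, xpath: str) -> str:
    """Return the element's outerHTML in one call (assumes selenium + Chrome)."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return driver.find_element(By.XPATH, xpath).get_attribute("outerHTML")
    finally:
        driver.quit()
```

The parser uses only the standard library, so it can be tried on any HTML string without a browser; only `snapshot_table_html` needs Selenium.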

I think this is a hard case to automate; you need to try every approach to find the best solution.

Have you tried saving the page locally?

UI.Vision can work with a page saved locally on the hard disk.

An alternative is to take a screenshot and use an OCR service.

Every solution is hard and will take a lot of time.

If the work does not bring a good profit or earn you money, you may not want to spend much time on this automation. You risk having to study it for a long time, and it may not even be worth it.

@userkantu Thanks again for your kind response. I did try saving the page locally, but the real-time dashboard was not visible and there was no data. Thanks again for your help.

Hi, just to clarify: Do you want to scrape this page just once, or really scrape it every 5 seconds (= 12 times a minute, 720 times in one hour)?

Hey @ulrich. Sorry about the late response. I want to scrape it ideally continuously but at least once every 5 minutes.

@Jivan_Kharel I'm not sure if this would work, but here are my thoughts. I would split this operation into two phases:

1.) Create a macro that saves the page HTML every 5 seconds (use onDownload and executeScript_Sandbox to put a time stamp in the file name). Hopefully UI.Vision can save the HTML within 5 seconds. Run this macro continually every 5 seconds, or at whichever interval you want.

2.) Simultaneously have a second macro running that moves these saved files to some archive directory and then scrapes the needed data from them. This could take longer than 5 seconds, but that's OK because the data is now static and it can take however long it takes.

But if macro 1 is pumping out new files every 5 seconds and macro 2 takes longer than 5 seconds per file, you're eventually going to develop a backlog. So figure out a way to run n instances of macro 2, enough that they can collectively process files at least as fast as macro 1 outputs them. I don't know how you'd allocate the files among the different instances of macro 2; maybe develop some kind of script that allocates files on a round-robin basis to the macro 2 instances you're running.

Or maybe set it up like this: have one instance of macro 1 just saving the data to a file. Then have one instance of macro 2 just moving the data to the appropriate folder (or folders, for different instances of macro 3) using a script in PowerShell, Python, etc. Then have a macro 3 just do the processing.

I don’t know how you’d go about programming all of this but these are my thoughts about how I would try to do it conceptually. I don’t even know if it’s feasible but that’s where I’d start
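As a rough illustration of the allocation part only, here is a small Python sketch of time-stamped file names, round-robin assignment, and the number of instances needed to avoid a backlog. Everything in it is hypothetical: the `dashboard_` prefix, the worker names, and the function names are made up for the example, and it only decides which file goes to which instance; it does not move files or start any macro.

```python
import math
from datetime import datetime
from itertools import cycle
from typing import Dict, Iterable, List


def snapshot_filename(taken_at: datetime, prefix: str = "dashboard") -> str:
    """Build a file name whose alphabetical order matches chronological order."""
    return f"{prefix}_{taken_at.strftime('%Y%m%dT%H%M%S')}.html"


def workers_needed(processing_seconds: float, interval_seconds: float) -> int:
    """Smallest number of parallel scrapers that keeps up with the save rate."""
    return max(1, math.ceil(processing_seconds / interval_seconds))


def allocate_round_robin(files: Iterable[str], workers: List[str]) -> Dict[str, List[str]]:
    """Hand out files to workers in turn, oldest file first."""
    if not workers:
        raise ValueError("at least one worker is required")
    assignment: Dict[str, List[str]] = {worker: [] for worker in workers}
    for name, worker in zip(sorted(files), cycle(workers)):
        assignment[worker].append(name)
    return assignment
```

For example, if scraping one saved file takes about 12 seconds and a new file appears every 5 seconds, `workers_needed(12, 5)` gives 3, so three instances of the scraping macro would be the minimum under these assumptions.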

Read above: the user already tried saving the page locally, and the saved copy did not show the values.