First Ui.Vision AI version

admin · November 18, 2024, 11:36am

This new version integrates the power of large language models (LLMs). We start in version 9.3.6 with two AI commands: aiPrompt and aiScreenXY. Both use the Anthropic Claude API. Below you will find four demo macros to get started with AI and prompting:

AI Demo Macros

Prompt_CompareImages

{
  "Name": "Prompt_CompareImages",
  "CreationDate": "2024-11-18",
  "Commands": [
    {
      "Command": "aiPrompt",
      "Target": "canvas_wyoming_dpi_96.png#canvas_wyoming_dpi_96.png#Are both images the same?\nAnswer only with true or false. Answer in lowercase only.",
      "Value": "result",
      "Description": ""
    },
    {
      "Command": "echo",
      "Target": "Test1: Are the images the same? ${result}",
      "Value": "green",
      "Description": ""
    },
    {
      "Command": "verify",
      "Target": "result",
      "Value": "true",
      "Description": "Should be true, as both images are the same"
    },
    {
      "Command": "aiPrompt",
      "Target": "canvas_wyoming_dpi_96.png#canvas_wyoming_verify_dpi_96.png#\nAre both images the same? Answer only with true or false. NO OTHER TEXT.",
      "Value": "result",
      "Description": ""
    },
    {
      "Command": "echo",
      "Target": "Test2: Are the images the same? ${result}",
      "Value": "green",
      "Description": ""
    },
    {
      "Command": "verify",
      "Target": "result",
      "Value": "false",
      "Description": "Should be false, as the images are NOT the same"
    }
  ]
}
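
The verify steps in this macro only pass if the model replies with exactly `true` or `false`. Models sometimes add a capital letter, a trailing period or a line break. Here is a minimal sketch of cleaning the reply before comparing it. The helper name `normalizeBoolAnswer` is hypothetical and not part of Ui.Vision; the logic could be adapted for a script step between aiPrompt and verify.

```javascript
// Hypothetical helper (not a Ui.Vision function): reduce an LLM reply to
// "true", "false", or "unknown" so a later comparison sees a clean value.
function normalizeBoolAnswer(reply) {
  const text = String(reply)
    .trim()
    .toLowerCase()
    .replace(/[.!\s]+$/, ""); // drop trailing punctuation and whitespace

  if (text === "true" || text === "false") {
    return text;
  }
  // Anything else (for example a full sentence) is reported as unknown
  // instead of being guessed.
  return "unknown";
}
```

Returning `unknown` rather than guessing keeps an unexpected reply visible as a failed check.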

Prompt_ParseHTML

{
  "Name": "Prompt_ParseHTML",
  "CreationDate": "2024-11-18",
  "Commands": [
    {
      "Command": "open",
      "Target": "https://forum.ui.vision/",
      "Value": "",
      "Description": ""
    },
    {
      "Command": "executeScript",
      "Target": "var str = document.body.innerHTML; // Get page source\n\n//Next: Clean up HTML source before further processing  \n\n//First remove scripts and style tags with their content\nstr = str.replace(/<script\\b[^<]*(?:(?!<\\/script>)<[^<]*)*<\\/script>/gi, '');\nstr = str.replace(/<style\\b[^<]*(?:(?!<\\/style>)<[^<]*)*<\\/style>/gi, '');\n   \n//Then remove all remaining tags but keep their content\nstr = str.replace(/<[^>]+>/g, '');\n   \n//Clean up whitespace\nstr = str.replace(/\\s+/g, ' ').trim();\n   \nreturn str;",
      "Value": "html",
      "Description": "Extract entire HTML code of website"
    },
    {
      "Command": "echo",
      "Target": "Entire HTML extracted (long): ${html}",
      "Value": "brown",
      "Description": ""
    },
    {
      "Command": "aiPrompt",
      "Target": "What are the titles of the first 5 forum posts? ${html}",
      "Value": "s",
      "Description": ""
    },
    {
      "Command": "echo",
      "Target": "Title=${s}",
      "Value": "green",
      "Description": ""
    }
  ]
}
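
Note that the executeScript step in this macro sends plain page text to the model, not raw HTML. The same clean-up can be written as a standalone function so it can be tested outside the browser. This is a sketch: the regular expressions are taken from the macro, while the `maxChars` cap (and its default value) is an added assumption to keep the prompt, and therefore the token cost, bounded.

```javascript
// Same clean-up steps as the macro's executeScript, as a plain function.
// maxChars is an illustrative addition, not part of the original macro.
function htmlToText(html, maxChars = 20000) {
  let str = String(html);

  // Remove script and style elements together with their content
  str = str.replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi, "");
  str = str.replace(/<style\b[^<]*(?:(?!<\/style>)<[^<]*)*<\/style>/gi, "");

  // Remove all remaining tags but keep their text content
  str = str.replace(/<[^>]+>/g, "");

  // Collapse whitespace runs into single spaces
  str = str.replace(/\s+/g, " ").trim();

  // Cap the length so a very large page does not inflate the prompt
  return str.slice(0, maxChars);
}
```

In the macro itself, the input would be `document.body.innerHTML` as before.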

ScreenXY_Browser

{
  "Name": "ScreenXY_Browser",
  "CreationDate": "2024-11-18",
  "Commands": [
    {
      "Command": "open",
      "Target": "https://forum.ui.vision/",
      "Value": "",
      "Description": ""
    },
    {
      "Command": "XDesktopAutomation",
      "Target": "false",
      "Value": "",
      "Description": ""
    },
    {
      "Command": "aiScreenXY",
      "Target": "Find the search icon (magnifying glass).",
      "Value": "s",
      "Description": ""
    },
    {
      "Command": "echo",
      "Target": "Adjusted X,Y coordinates: ${!ai1},${!ai2}, Original Result=${s}",
      "Value": "blue",
      "Description": ""
    },
    {
      "Command": "XClick",
      "Target": "${!ai1},${!ai2}",
      "Value": "",
      "Description": "Click search icon"
    },
    {
      "Command": "XType",
      "Target": "browser automation${KEY_ENTER}",
      "Value": "",
      "Description": ""
    },
    {
      "Command": "aiScreenXY",
      "Target": "Find the first search result (blue text)",
      "Value": "s",
      "Description": ""
    },
    {
      "Command": "echo",
      "Target": "Adjusted X,Y coordinates: ${!ai1},${!ai2}, Original API result=${s}",
      "Value": "blue",
      "Description": ""
    },
    {
      "Command": "XClick",
      "Target": "${!ai1},${!ai2}",
      "Value": "",
      "Description": "Click first search result link"
    }
  ]
}

ScreenXY_Desktop


{
  "Name": "ScreenXY_Desktop",
  "CreationDate": "2024-11-18",
  "Commands": [
    {
      "Command": "XDesktopAutomation",
      "Target": "true",
      "Value": "",
      "Description": ""
    },
    {
      "Command": "aiScreenXY",
      "Target": "Look for the Ui.Vision IDE. In it, find the Clear button",
      "Value": "s",
      "Description": ""
    },
    {
      "Command": "echo",
      "Target": "Adjusted X,Y coordinates: ${!ai1},${!ai2}, Original Result=${s}",
      "Value": "blue",
      "Description": ""
    },
    {
      "Command": "XClick",
      "Target": "${!ai1},${!ai2}",
      "Value": "",
      "Description": "Click on Clear button"
    },
    {
      "Command": "echo",
      "Target": "Clear button pressed",
      "Value": "green",
      "Description": ""
    }
  ]
}

Mark.Elder · November 19, 2024, 12:59am

FYI, the key is free, but the AI functionality is not free for business use.

admin · November 19, 2024, 9:44am

Thanks for this hint. There used to be a free $5 API credit after signing up & verifying your phone number. But it seems this offer is gone. We will update our documentation accordingly.

As a cost benchmark, the Anthropic API cost of running the above four demo macros is 0.02 US$:

Anthropic API cost estimate
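
To turn this benchmark into a rough budget: the four demo macros contain six AI commands in total (2 + 1 + 2 + 1), so 0.02 US$ works out to roughly a third of a cent per AI command. The sketch below assumes cost scales linearly with the number of AI commands. That is a simplification, since the real cost depends on prompt length and image size. The function name is illustrative.

```javascript
const DEMO_COST_USD = 0.02; // benchmark figure for the four demo macros
const DEMO_AI_CALLS = 6;    // aiPrompt/aiScreenXY commands across the four demos

// Rough linear estimate: average cost per AI command times the number of commands.
function estimateCostUsd(aiCallsPerRun, runs) {
  const costPerCall = DEMO_COST_USD / DEMO_AI_CALLS;
  return costPerCall * aiCallsPerRun * runs;
}

// Example: a macro with 3 AI commands, run 1000 times
console.log(estimateCostUsd(3, 1000).toFixed(2) + " US$");
```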

Is the benefit worth the cost? That depends strongly on your RPA use case. Our medium-term goal is to use local LLMs like Llama or Mistral, just as we already include local and free computer vision and OCR options. But as of today, these local LLMs are not yet good enough for computer use automation.

As a rule of thumb, we recommend using local automation whenever possible. It is faster and free. However, the new AI commands allow you to automate tasks that were not (easily) automatable before. For example, you can use them to overcome a particularly tricky part of the automation (e.g. finding and clicking a confirmation dialog that changes position, text, shape and color), and then continue with local automation.
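
The "local first, AI only for the tricky part" idea can be sketched as a simple fallback. This is an illustration of the control flow only. Both locator functions are hypothetical stand-ins (for example, a local image search and an aiScreenXY call), not Ui.Vision APIs.

```javascript
// Hypothetical sketch: each locator is a function returning {x, y} or null.
// The AI locator is only called when the local one finds nothing,
// so the paid API call is skipped whenever local automation succeeds.
function locateWithFallback(localLocator, aiLocator) {
  const local = localLocator();
  if (local) {
    return { ...local, source: "local" };
  }

  const ai = aiLocator();
  if (ai) {
    return { ...ai, source: "ai" };
  }

  return null; // neither method found the element
}
```

The `source` field records which method found the element, which helps when checking how often the paid path is actually used.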

uiuser · November 22, 2024, 11:09pm

For the comparison, do the two images have to be screenshots of web page text or elements, or can they be any two images?

admin · November 27, 2024, 5:25pm

aiPrompt can take any image or file that is stored in the Screenshots, CSV or Visual tabs as input.

aiPrompt input sources

uiuser · November 29, 2024, 1:27am

So it can be any image, even one not connected to the open web page?
Since I don't have credits, I can't try it.

ulrich · November 29, 2024, 9:49am

Yes, it can be any image. The aiPrompt command works the same as asking Claude directly, with up to two attachments (the images).
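
For readers who want to see what "asking Claude directly" with two images looks like, here is a sketch of a request body in the public Anthropic Messages API format. It is not Ui.Vision's internal code. The function name, the `max_tokens` value and the PNG assumption are illustrative, and the model name is left as a parameter.

```javascript
// Sketch of a Messages API request body with up to two image attachments.
// Assumptions: images are PNG files already encoded as base64 strings.
function buildClaudeRequest(model, prompt, base64Images) {
  if (base64Images.length > 2) {
    throw new Error("aiPrompt accepts at most two images");
  }

  // Each image becomes an "image" content block with a base64 source
  const imageBlocks = base64Images.map((data) => ({
    type: "image",
    source: { type: "base64", media_type: "image/png", data },
  }));

  return {
    model,
    max_tokens: 256, // illustrative limit; short answers such as true/false need little
    messages: [
      {
        role: "user",
        content: [...imageBlocks, { type: "text", text: prompt }],
      },
    ],
  };
}
```

A body like this would be sent as JSON to `https://api.anthropic.com/v1/messages` with the `x-api-key` and `anthropic-version` headers.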

autratec · March 8, 2025, 5:09am

Hi,

Thank you for your efforts in connecting Ui.Vision with LLMs. Please continue the great work!

Regarding the recently released AI feature, can it already integrate with other LLMs, such as those from Meta, Qwen, or DeepSeek, through a locally deployed API service?

admin · March 8, 2025, 8:54am

Hi @autratec - we are open to suggestions from our users. What kind of use case do you have in mind?

autratec · March 8, 2025, 12:42pm

Here are some thoughts on integrating RPA with LLMs:

Package RPA scripts as an API service, enabling an LLM to invoke them through function calls.
Establish a channel between the LLM and Ui.Vision, allowing actions to be created and executed in real time.
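
The first idea could look roughly like this. It is a purely hypothetical sketch, not an existing Ui.Vision feature. The tool names, the registry and the injected `runMacro` launcher are all made up for illustration. The macro names are the demo macros from the first post.

```javascript
// Hypothetical registry: tool names an LLM may call, mapped to macro names.
const MACRO_TOOLS = {
  compare_images: {
    macro: "Prompt_CompareImages",
    description: "Compare two stored images and report whether they match.",
  },
  click_clear_button: {
    macro: "ScreenXY_Desktop",
    description: "Find and click the Clear button in the Ui.Vision IDE.",
  },
};

// Tool definitions in the name/description/input_schema shape commonly
// used for LLM function calling. The demo macros take no parameters.
function toolDefinitions() {
  return Object.entries(MACRO_TOOLS).map(([name, tool]) => ({
    name,
    description: tool.description,
    input_schema: { type: "object", properties: {}, required: [] },
  }));
}

// Resolve a tool call from the LLM to a macro and hand it to a launcher.
// runMacro is injected by the caller; how a macro is started is left open.
function dispatchToolCall(call, runMacro) {
  if (!Object.prototype.hasOwnProperty.call(MACRO_TOOLS, call.name)) {
    throw new Error(`Unknown tool: ${call.name}`);
  }
  return runMacro(MACRO_TOOLS[call.name].macro, call.input || {});
}
```

Restricting the LLM to a fixed registry keeps it from starting arbitrary macros, which matters once real business transactions are involved.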

To extend API integration, I propose connecting with local LLM API services, such as Qwen 32b. This would eliminate concerns about token usage.
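
As an illustration of what such a connection might involve: many local LLM servers expose an OpenAI-compatible chat endpoint, so a request could be built as sketched below. The base URL (Ollama's default port) and the model tag are example values and an assumption about the local setup. They are not Ui.Vision settings.

```javascript
// Sketch: build a chat request for a local, OpenAI-compatible LLM server.
// Assumptions: the server listens on localhost and needs no API key.
function buildLocalChatRequest(prompt, options = {}) {
  const baseUrl = options.baseUrl || "http://localhost:11434/v1"; // example default
  const model = options.model || "qwen2.5:32b"; // example model tag

  return {
    url: `${baseUrl}/chat/completions`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
      temperature: 0, // low-variance answers suit automation checks
    }),
  };
}
```

The returned object can be passed to `fetch(req.url, req)`. Because the server runs locally, there is no per-token charge, only the hardware cost.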

In the long run, my vision is to have RPA and LLMs collaborate to complete business transactions in real time, creating and executing tasks dynamically rather than relying on predefined workflows.

autratec · March 8, 2025, 12:53pm

I hope we will move from predefined low-code RPA, like UiPath and Power Automate, to an LLM/chatbot-driven agent mode. We would train the AI to use internal enterprise applications, and it would then carry out tasks based on our instructions and prompts.