First Ui.Vision AI version

This new version integrates the power of large language models (LLMs). We start in version 9.3.6 with two AI commands: aiPrompt and aiScreenXY. Both use the Anthropic Claude API. Below are four demo macros to get you started with AI and prompting:

AI Demo Macros

Prompt_CompareImages

{
  "Name": "Prompt_CompareImages",
  "CreationDate": "2024-11-18",
  "Commands": [
    {
      "Command": "aiPrompt",
      "Target": "canvas_wyoming_dpi_96.png#canvas_wyoming_dpi_96.png#Are both images the same?\nAnswer only with true or false. Answer in lowercase only.",
      "Value": "result",
      "Description": ""
    },
    {
      "Command": "echo",
      "Target": "Test1: Are the images the same? ${result}",
      "Value": "green",
      "Description": ""
    },
    {
      "Command": "verify",
      "Target": "result",
      "Value": "true",
      "Description": "Should be true, as both images are the same"
    },
    {
      "Command": "aiPrompt",
      "Target": "canvas_wyoming_dpi_96.png#canvas_wyoming_verify_dpi_96.png#\nAre both images the same? Answer only with true or false. NO OTHER TEXT.",
      "Value": "result",
      "Description": ""
    },
    {
      "Command": "echo",
      "Target": "Test2: Are the images the same? ${result}",
      "Value": "green",
      "Description": ""
    },
    {
      "Command": "verify",
      "Target": "result",
      "Value": "false",
      "Description": "Should be false, as the images are NOT the same"
    }
  ]
}
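The prompts above ask the model to answer with a bare lowercase true or false, because the verify step compares the reply as an exact string. As a minimal sketch of an extra safety net, the reply could be normalized before it is compared. The function name normalizeBooleanAnswer is hypothetical and not part of Ui.Vision; how you call it from a macro (for example from an executeScript step) is left open here.

```javascript
// Hypothetical helper (not a Ui.Vision API): reduce a free-form LLM reply
// to "true", "false", or "unknown".
function normalizeBooleanAnswer(reply) {
  // Lowercase and drop everything except letters, so "True." or " FALSE\n" still match.
  const cleaned = String(reply).toLowerCase().replace(/[^a-z]/g, "");
  if (cleaned === "true" || cleaned === "false") {
    return cleaned;
  }
  // Anything else (for example a full sentence) is flagged rather than guessed.
  return "unknown";
}

console.log(normalizeBooleanAnswer(" True.\n"));
console.log(normalizeBooleanAnswer("The images look identical"));
```

Returning "unknown" instead of guessing keeps a chatty reply from silently passing or failing the verify step.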

Prompt_ParseHTML

{
  "Name": "Prompt_ParseHTML",
  "CreationDate": "2024-11-18",
  "Commands": [
    {
      "Command": "open",
      "Target": "https://forum.ui.vision/",
      "Value": "",
      "Description": ""
    },
    {
      "Command": "executeScript",
      "Target": "var str = document.body.innerHTML; // Get page source\n\n//Next: Clean up HTML source before further processing  \n\n//First remove scripts and style tags with their content\nstr = str.replace(/<script\\b[^<]*(?:(?!<\\/script>)<[^<]*)*<\\/script>/gi, '');\nstr = str.replace(/<style\\b[^<]*(?:(?!<\\/style>)<[^<]*)*<\\/style>/gi, '');\n   \n//Then remove all remaining tags but keep their content\nstr = str.replace(/<[^>]+>/g, '');\n   \n//Clean up whitespace\nstr = str.replace(/\\s+/g, ' ').trim();\n   \nreturn str;",
      "Value": "html",
      "Description": "Extract entire HTML code of website"
    },
    {
      "Command": "echo",
      "Target": "Entire HTML extracted (long): ${html}",
      "Value": "brown",
      "Description": ""
    },
    {
      "Command": "aiPrompt",
      "Target": "What are the titles of the first 5 forum posts? ${html}",
      "Value": "s",
      "Description": ""
    },
    {
      "Command": "echo",
      "Target": "Title=${s}",
      "Value": "green",
      "Description": ""
    }
  ]
}
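The executeScript step in this macro is easier to read outside of the JSON escaping. The sketch below applies the same four regular expressions to a string, wrapped in a standalone function so it can be tried in Node or a browser console. The function name htmlToText and the sample markup are made up for illustration.

```javascript
// Standalone version of the cleanup done in the executeScript step above.
// The function name and the sample input are illustrative only.
function htmlToText(html) {
  let str = String(html);
  // Remove <script> and <style> elements together with their content.
  str = str.replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi, "");
  str = str.replace(/<style\b[^<]*(?:(?!<\/style>)<[^<]*)*<\/style>/gi, "");
  // Remove all remaining tags but keep their text content.
  str = str.replace(/<[^>]+>/g, "");
  // Collapse runs of whitespace into single spaces.
  return str.replace(/\s+/g, " ").trim();
}

const sample =
  "<div><script>var a = 1 < 2;</script><style>p { color: red; }</style>" +
  "<p>Hello   <b>world</b></p></div>";
console.log(htmlToText(sample)); // prints "Hello world"
```

Stripping scripts, styles and tags before calling aiPrompt keeps the prompt shorter, which should also reduce the number of tokens sent to the API. Note that text from adjacent elements is joined without a separator, so `<p>a</p><p>b</p>` becomes "ab".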

ScreenXY_Browser

{
  "Name": "ScreenXY_Browser",
  "CreationDate": "2024-11-18",
  "Commands": [
    {
      "Command": "open",
      "Target": "https://forum.ui.vision/",
      "Value": "",
      "Description": ""
    },
    {
      "Command": "XDesktopAutomation",
      "Target": "false",
      "Value": "",
      "Description": ""
    },
    {
      "Command": "aiScreenXY",
      "Target": "Find the search icon (magnifying glass).",
      "Value": "s",
      "Description": ""
    },
    {
      "Command": "echo",
      "Target": "Adjusted X,Y coordinates: ${!ai1},${!ai2}, Original Result=${s}",
      "Value": "blue",
      "Description": ""
    },
    {
      "Command": "XClick",
      "Target": "${!ai1},${!ai2}",
      "Value": "",
      "Description": "Click search icon"
    },
    {
      "Command": "XType",
      "Target": "browser automation${KEY_ENTER}",
      "Value": "",
      "Description": ""
    },
    {
      "Command": "aiScreenXY",
      "Target": "Find the first search result (blue text)",
      "Value": "s",
      "Description": ""
    },
    {
      "Command": "echo",
      "Target": "Adjusted X,Y coordinates: ${!ai1},${!ai2}, Original API result=${s}",
      "Value": "blue",
      "Description": ""
    },
    {
      "Command": "XClick",
      "Target": "${!ai1},${!ai2}",
      "Value": "",
      "Description": "Click first search result link"
    }
  ]
}

ScreenXY_Desktop


{
  "Name": "ScreenXY_Desktop",
  "CreationDate": "2024-11-18",
  "Commands": [
    {
      "Command": "XDesktopAutomation",
      "Target": "true",
      "Value": "",
      "Description": ""
    },
    {
      "Command": "aiScreenXY",
      "Target": "Look for the Ui.Vision IDE. In it, find the Clear button",
      "Value": "s",
      "Description": ""
    },
    {
      "Command": "echo",
      "Target": "Adjusted X,Y coordinates: ${!ai1},${!ai2}, Original Result=${s}",
      "Value": "blue",
      "Description": ""
    },
    {
      "Command": "XClick",
      "Target": "${!ai1},${!ai2}",
      "Value": "",
      "Description": "Click on Clear button"
    },
    {
      "Command": "echo",
      "Target": "Clear button pressed",
      "Value": "green",
      "Description": ""
    }
  ]
}

FYI the key is free, but for business use, the AI functionality is not:

Thanks for this hint. There used to be a free $5 API credit after signing up & verifying your phone number. But it seems this offer is gone. We will update our documentation accordingly.

As a cost benchmark, the Anthropic API cost of running the above four demo macros is 0.02 US$:

Anthropic API cost estimate
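To put the benchmark into perspective, here is a rough back-of-the-envelope sketch. It assumes that cost scales linearly with the number of runs and that your own macros cost about as much as the four demos, neither of which is guaranteed: real cost depends on prompt length and screenshot size. The function name and the example numbers are made up for illustration.

```javascript
// Assumption: one run of all four demo macros costs about 2 US cents
// (the benchmark quoted above). Cents are used to avoid floating-point drift.
const CENTS_PER_DEMO_RUN = 2;

// Hypothetical estimate: total API cost in US$ for a given number of runs per day.
function estimateCostUsd(runsPerDay, days) {
  const totalCents = CENTS_PER_DEMO_RUN * runsPerDay * days;
  return totalCents / 100;
}

console.log(estimateCostUsd(100, 30)); // prints 60
```

Under these assumptions, running the four demos 100 times a day for 30 days would come to about US$60.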

Is the benefit worth the cost? That depends strongly on your RPA use case. Our medium-term goal is to use local LLMs like Llama or Mistral - just like we already include local and free computer vision and OCR options. But as of today, these local LLMs are not yet good enough for computer use automation.

As a rule of thumb, we recommend using local automation whenever possible. It is faster and free. However, the new AI commands allow you to automate tasks that were not (easily) automatable before. For example, you can use them to get past one tricky part of an automation (e.g. finding and clicking a confirmation dialog that changes position, text, shape and color), and then continue with local automation.


For the comparison, do the two images have to be screenshots of web page text or elements, or can they be any two images?

aiPrompt can take any image or file that is stored in the Screenshots, CSV or Visual tabs as input.

aiPrompt input sources

So it can be any images, even ones not connected to the opened web page? I can't try it myself, since I don't have credits.

Yes, it can be any images. The aiPrompt command works the same as asking Claude directly, with up to two attachments (the images).
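Judging from the demo macros, the aiPrompt Target field lists the attachment file names first, followed by the prompt, all separated by "#". With no attachments it is just the prompt. The small helper below sketches that layout. The function name buildAiPromptTarget is hypothetical and not part of Ui.Vision, and how a prompt that itself contains "#" is handled has not been checked here.

```javascript
// Hypothetical helper: assemble an aiPrompt Target string in the
// "file1#file2#prompt" layout seen in the Prompt_CompareImages demo.
function buildAiPromptTarget(prompt, attachments = []) {
  // The reply above mentions up to two attachments, so reject more.
  if (attachments.length > 2) {
    throw new Error("aiPrompt takes at most two attachments");
  }
  return [...attachments, prompt].join("#");
}

console.log(
  buildAiPromptTarget("Are both images the same?", ["photo_a.png", "photo_b.png"])
);
// prints "photo_a.png#photo_b.png#Are both images the same?"
```

The file names here are placeholders. As noted above, the images need to be stored in the Screenshots, CSV or Visual tabs.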