Thoughts on LLM and UI Vision Integration Testing

I’ve spent the last couple of days testing the integration between UI Vision and LLMs. Here are my observations and thoughts:

  1. AiComputerUse consumes a significant number of tokens. Integrating with open-source LLMs run locally, such as Meta’s Llama or Qwen, might be a more efficient approach (a rough sketch of what that could look like follows after this list).
  2. I’m unsure how to leverage AiComputerUse to manage internal enterprise applications, like internal forms and CRM systems. However, if AiComputerUse can automate tasks based on prompts, it could reduce RPA development work and shift the focus to data and prompt preparation. This could be the future of RPA development.
  3. Existing prompted instructions, such as AIScreenXY, help handle dynamic web responses and minimize change requests due to business design changes. I’m excited to explore integrating these features with popular LLMs like OpenAI, Azure OpenAI, Gemini, and DeepSeek. It’s thrilling to test LLMs with RPA and manage mouse movements, clicks, and content filling.
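To make point 1 concrete: this is only a sketch, not a UI Vision feature, assuming a local open-source model is served through an OpenAI-compatible endpoint (for example Ollama on localhost:11434); the model name and prompt are placeholders:

```python
# Sketch only: querying a locally hosted open-source LLM through an
# OpenAI-compatible endpoint (assumed: Ollama serving on localhost:11434).
# The model name and prompt are placeholders, not UI Vision internals.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local server, so no cloud tokens are consumed
    api_key="ollama",                      # dummy key; local servers typically ignore it
)

response = client.chat.completions.create(
    model="qwen2.5:7b",  # any locally pulled model, e.g. a Llama or Qwen variant
    messages=[
        {"role": "user", "content": "Extract the invoice number from this page text: ..."}
    ],
)
print(response.choices[0].message.content)
```

The same pattern would apply to any provider that exposes an OpenAI-compatible chat endpoint, which is why local models could cut token costs without changing the surrounding automation logic.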

Please continue the excellent work and provide more integration options with different LLMs.

Thanks!


Thanks for testing our LLM integration. Some thoughts:

  1. AiComputerUse uses the Claude Computer Use interface. To my knowledge, Meta, Qwen, DeepSeek, Mistral, etc. have no comparable feature yet. By contrast, AIScreenXY and especially aiPrompt could be connected to our other LLM providers (a rough sketch of such provider routing follows at the end of this reply). We will do this when there is enough demand for it and/or as part of our free tech support for Enterprise Customers.

  2. AiComputerUse is very powerful. Its main drawbacks for RPA automation at the moment are (1) reliability/repeatability (it is in Beta, and sometimes the LLM does not do what you expect it to) and (2) cost (as you said).

  3. By contrast, AIScreenXY and aiPrompt are very reliable and do not cost much compute. From our perspective, these two commands are ready for production use.
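As an illustration of the provider routing mentioned in point 1 above, here is a minimal sketch of how an aiPrompt-style text prompt could be sent to different OpenAI-compatible backends behind one helper. This is not UI Vision code; the helper name `ai_prompt`, the environment variable names, and the base URLs are illustrative assumptions:

```python
# Illustrative sketch only: routing a text prompt to different
# OpenAI-compatible providers. Env var names and the helper itself are
# assumptions, not UI Vision configuration.
import os
from openai import OpenAI

PROVIDERS = {
    "openai":   {"base_url": "https://api.openai.com/v1",  "key_env": "OPENAI_API_KEY"},
    "deepseek": {"base_url": "https://api.deepseek.com",   "key_env": "DEEPSEEK_API_KEY"},
    "local":    {"base_url": "http://localhost:11434/v1",  "key_env": "LOCAL_API_KEY"},
}

def ai_prompt(provider: str, model: str, prompt: str) -> str:
    """Send a single text prompt to the chosen provider and return the reply."""
    cfg = PROVIDERS[provider]
    client = OpenAI(
        base_url=cfg["base_url"],
        api_key=os.environ.get(cfg["key_env"], "none"),  # local servers usually ignore the key
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Example call (model name is a placeholder for whatever the provider hosts)
print(ai_prompt("local", "qwen2.5:7b", "Summarize the visible order status."))
```

Because most of these providers speak the same chat-completions protocol, swapping providers is mostly a matter of configuration rather than new automation logic.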