How It Works

Components

VisionAgent

Python SDK that orchestrates tasks. It sends screenshots to the LLM, receives actions in return, and executes them via Agent OS.

from askui import VisionAgent

with VisionAgent() as agent:
    agent.act("Log into the admin dashboard")

Agent OS

Device driver that runs locally. Provides screen capture, mouse and keyboard control, and multi-display support.

LLM

Understands the UI from screenshots and plans actions step by step; Agent OS carries them out. Configurable: use AskUI’s default or bring your own.

Execution Flow

Screenshot → Agent OS captures screen
Understanding → LLM identifies UI elements
Planning → LLM determines next action
Execution → Agent OS performs click/type
Loop → Repeat until task complete
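
The flow above can be sketched as a small loop. This is a toy model only, not AskUI's implementation: FakeAgentOS, fake_llm_plan, Action, and run_task are hypothetical names invented for illustration, the "screenshot" is a plain dict instead of pixels, and the "LLM" is a hard-coded rule.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "type", "click", or "done"
    target: str = ""   # which element the action applies to
    text: str = ""     # text to enter for "type" actions


@dataclass
class FakeAgentOS:
    """Hypothetical stand-in for Agent OS: captures the screen, performs actions."""
    screen: dict = field(default_factory=lambda: {"email": "", "submitted": False})
    log: list = field(default_factory=list)

    def screenshot(self) -> dict:
        # A real device driver would return pixels; a dict copy keeps this runnable.
        return dict(self.screen)

    def perform(self, action: Action) -> None:
        if action.kind == "type":
            self.screen[action.target] = action.text
        elif action.kind == "click":
            self.screen["submitted"] = True
        self.log.append(action.kind)


def fake_llm_plan(task: str, screenshot: dict) -> Action:
    """Hypothetical stand-in for the LLM: reads the screen, picks the next action."""
    if not screenshot["email"]:
        return Action("type", target="email", text="test@example.com")
    if not screenshot["submitted"]:
        return Action("click", target="submit button")
    return Action("done")


def run_task(task: str, agent_os: FakeAgentOS, max_steps: int = 10) -> int:
    """Screenshot, understand and plan, execute; repeat until the task is done."""
    for step in range(max_steps):
        shot = agent_os.screenshot()          # Screenshot
        action = fake_llm_plan(task, shot)    # Understanding + Planning
        if action.kind == "done":
            return step                       # number of actions executed
        agent_os.perform(action)              # Execution
    raise RuntimeError("task did not finish within max_steps")


agent_os = FakeAgentOS()
steps = run_task("Submit the email form", agent_os)
print(steps, agent_os.log)
```

The max_steps cap is a design choice of this sketch: a loop driven by model output needs a bound so that a task that never reaches "done" fails loudly instead of running forever.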

Why Task-Based?

# Traditional (breaks when UI changes)
driver.find_element(By.ID, "email-input").send_keys("test@example.com")

# AskUI (adapts to visual changes)
agent.act('Enter "test@example.com" in the email field')

The AI works from intent, not implementation details, so UI changes such as a renamed element ID are far less likely to break your automation.
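
To make the contrast concrete, here is a toy sketch of why an ID-based lookup is brittle while a description-based one is not. It assumes a simplified model where the UI is a list of dicts and "visual matching" is a substring check on the visible label; find_by_id, find_by_label, and survives are hypothetical helpers, and none of this reflects how AskUI or its LLM actually locates elements.

```python
# Two versions of the same login form; only the element ID changed in v2.
ui_v1 = [{"id": "email-input", "label": "Email"}, {"id": "submit", "label": "Sign in"}]
ui_v2 = [{"id": "user-email", "label": "Email"}, {"id": "submit", "label": "Sign in"}]


def find_by_id(ui, element_id):
    """Implementation-coupled lookup, similar in spirit to an ID-based locator."""
    for element in ui:
        if element["id"] == element_id:
            return element
    raise LookupError(f"no element with id {element_id!r}")


def find_by_label(ui, description):
    """Toy stand-in for visual matching: compares against what the user sees."""
    for element in ui:
        if element["label"].lower() in description.lower():
            return element
    raise LookupError(f"nothing on screen matches {description!r}")


def survives(lookup, ui, query):
    """Return True if the lookup finds an element, False if it raises."""
    try:
        lookup(ui, query)
        return True
    except LookupError:
        return False


id_results = [survives(find_by_id, ui, "email-input") for ui in (ui_v1, ui_v2)]
label_results = [survives(find_by_label, ui, "the email field") for ui in (ui_v1, ui_v2)]
print(id_results, label_results)
```

In this model the ID lookup works on the first version and fails on the second, while the description-based lookup finds the field in both, because the visible label did not change.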

