Visual approval detection
The OpenAI Computer Use API provides visual button detection without depending on brittle pixel coordinates.
Case Study
A multi-desktop orchestration framework that keeps several AI coding agents productive at once by detecting approvals visually, generating prompts when work stalls, and continuously scoring output quality.
This project is a Python automation framework built to manage multiple AI coding agents running across different virtual desktops at the same time. Instead of relying on fragile coordinate-based automation, the orchestrator uses the OpenAI Computer Use API to visually detect approval actions such as Allow, Keep Edits, and Try Again, which makes the control loop more adaptable to UI changes.
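The visual detection layer can be pictured as a small control loop: grab a screenshot, ask the vision model where an approval button is, and click it if found. The sketch below is a minimal stand-in, not the project's actual code; `detect_button` and `click` are hypothetical callables representing the Computer Use API wrapper and the desktop automation layer.

```python
# Hedged sketch of the approval-detection loop. `detect_button` and `click`
# are hypothetical stand-ins for the Computer Use API call and the desktop
# click helper; only the button labels come from the project description.

APPROVAL_BUTTONS = ("Allow", "Keep Edits", "Try Again")

def handle_approvals(screenshot, detect_button, click):
    """Click the first visible approval button; return its label, or None."""
    for label in APPROVAL_BUTTONS:
        location = detect_button(screenshot, label)  # (x, y) or None
        if location is not None:
            click(*location)
            return label
    return None
```

Because the detector works on labels rather than fixed coordinates, the same loop keeps working when the VS Code UI shifts buttons around.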
The orchestration layer is designed around continuous utilization. If no approval event fires within an hour, the system assumes the current prompt stream has stalled. It then calls OpenAI to generate ten fresh prompts from the live project's documentation and distributes them round-robin across active agent panels.
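The stall-and-redistribute behavior reduces to two small pieces of logic: a one-hour timeout check and a round-robin assignment of fresh prompts to panels. This is an illustrative sketch under those assumptions, not the framework's own implementation; the prompt-generation call to OpenAI is elided.

```python
import itertools
import time

STALL_SECONDS = 3600  # one hour without an approval event (per the design)

def is_stalled(last_approval_ts, now=None):
    """True when no approval has fired within the stall window."""
    now = time.time() if now is None else now
    return now - last_approval_ts >= STALL_SECONDS

def distribute_round_robin(prompts, panels):
    """Assign prompts to active panels in round-robin order."""
    assignment = {panel: [] for panel in panels}
    for prompt, panel in zip(prompts, itertools.cycle(panels)):
        assignment[panel].append(prompt)
    return assignment
```

With ten generated prompts and three active panels, round-robin assignment gives each panel three or four prompts, so no agent sits idle while another queues up work.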
That alone would only solve throughput. The more interesting layer is quality control. The framework includes a transcript-scoring pipeline that classifies output as COMPLETED, IN_PROGRESS, NEEDS_REVISION, or BLOCKED, and feeds failed prompts into a blacklist so the same weak instructions are not recycled back into the system.
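The scoring-plus-blacklist pipeline can be sketched as a classifier feeding a filter. The real framework presumably scores transcripts with a model; the keyword rules below are a toy stand-in for illustration only, and the class name `PromptBlacklist` is an assumption, not a name from the project.

```python
# Toy stand-in for the model-backed transcript scorer. The four status
# labels come from the project; the keyword heuristics are illustrative.
STATUSES = ("COMPLETED", "IN_PROGRESS", "NEEDS_REVISION", "BLOCKED")

def classify_transcript(transcript):
    """Map a panel transcript to one of the four quality tiers."""
    text = transcript.lower()
    if "error" in text or "cannot proceed" in text:
        return "BLOCKED"
    if "revise" in text or "incorrect" in text:
        return "NEEDS_REVISION"
    if "done" in text or "all tests pass" in text:
        return "COMPLETED"
    return "IN_PROGRESS"

class PromptBlacklist:
    """Remembers prompts whose runs failed so they are not recycled."""
    def __init__(self):
        self._failed = set()

    def record(self, prompt, status):
        if status in ("NEEDS_REVISION", "BLOCKED"):
            self._failed.add(prompt)

    def filter(self, prompts):
        return [p for p in prompts if p not in self._failed]
```

The key design point is the feedback edge: a BLOCKED or NEEDS_REVISION outcome writes its prompt into the blacklist, so the next generation round starts from a cleaner candidate pool.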
Together with per-session API cost tracking, a master CLI launcher, and a test-backed Python package structure, the result is an agent orchestration system focused on sustained developer throughput rather than one-off desktop scripting.
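Per-session cost tracking amounts to accumulating token-priced spend per agent. The sketch below assumes hypothetical per-1K-token prices; real rates depend on the model in use, and the class is an illustration rather than the framework's tracker.

```python
from collections import defaultdict

# Assumed illustrative prices per 1K tokens; actual rates vary by model.
PRICE_PER_1K = {"input": 0.0025, "output": 0.01}

class CostTracker:
    """Accumulates API spend per agent so runs are not cost-opaque."""
    def __init__(self):
        self.spend = defaultdict(float)

    def record(self, agent, input_tokens, output_tokens):
        cost = (input_tokens / 1000) * PRICE_PER_1K["input"] \
             + (output_tokens / 1000) * PRICE_PER_1K["output"]
        self.spend[agent] += cost
        return cost

    def total(self):
        return sum(self.spend.values())
```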
Each VS Code Copilot agent is hosted on its own virtual desktop, which isolates workstreams while allowing the orchestrator to supervise them as a coordinated pool.
The system visually detects buttons like Allow, Keep Edits, and Try Again, replacing brittle pixel-matching logic with a more resilient control mechanism.
When the system detects an hour of inactivity, it generates ten new coding prompts grounded in the current project's documentation.
New work is balanced across active panels so no single agent becomes overloaded while others sit idle.
Every panel transcript is graded as COMPLETED, IN_PROGRESS, NEEDS_REVISION, or BLOCKED so the orchestrator can decide what to do next instead of assuming all activity is useful progress.
Failed prompts are blacklisted to prevent recurrence, gradually improving prompt quality and reducing wasted agent cycles over time.
Prompt distribution keeps simultaneous Copilot agent panels evenly utilized across the desktop estate.
When agents stall for one hour, the system derives ten fresh prompts from live project documentation and restarts throughput automatically.
A four-tier quality pipeline scores outcomes and gives the control layer an explicit signal for intervention or continuation.
Real-time API cost tracking makes it possible to monitor spend at the run and agent level instead of treating orchestration cost as opaque.
The framework ships as an installable Python package with editable dev installs, pytest coverage, and CI-oriented documentation.