Visual approval detection
The OpenAI Computer Use API provides visual button detection without depending on brittle pixel coordinates.
Case Study
A multi-desktop orchestration framework that keeps several AI coding agents productive at once by detecting approvals visually, generating prompts when work stalls, and continuously scoring output quality.
This project is a Python automation framework built to manage multiple AI coding agents running across different virtual desktops at the same time. Instead of relying on fragile coordinate-based automation, the orchestrator uses the OpenAI Computer Use API to visually detect approval actions such as Allow, Keep Edits, and Try Again, which makes the control loop more adaptable to UI changes.
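The visual detection layer can be pictured as a small control loop: grab a screenshot, ask the vision model where an approval button is, and click it if found. The sketch below is a minimal stand-in, not the project's actual code; `detect_button` and `click` are hypothetical callables representing the Computer Use API wrapper and the desktop automation layer.

```python
# Hedged sketch of the approval-detection loop. `detect_button` and `click`
# are hypothetical stand-ins for the Computer Use API call and the desktop
# click helper; only the button labels come from the project description.

APPROVAL_BUTTONS = ("Allow", "Keep Edits", "Try Again")

def handle_approvals(screenshot, detect_button, click):
    """Click the first visible approval button; return its label, or None."""
    for label in APPROVAL_BUTTONS:
        location = detect_button(screenshot, label)  # (x, y) or None
        if location is not None:
            click(*location)
            return label
    return None
```

Because the detector works on labels rather than fixed coordinates, the same loop keeps working when the VS Code UI shifts buttons around.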
The orchestration layer is designed around continuous utilization. If no approval event fires within an hour, the system assumes the current prompt stream has stalled. It then calls OpenAI to generate ten fresh prompts from the live project's documentation and distributes them round-robin across active agent panels.
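The stall-and-redistribute behavior reduces to two small pieces of logic: a one-hour timeout check and a round-robin assignment of fresh prompts to panels. This is an illustrative sketch under those assumptions, not the framework's own implementation; the prompt-generation call to OpenAI is elided.

```python
import itertools
import time

STALL_SECONDS = 3600  # one hour without an approval event (per the design)

def is_stalled(last_approval_ts, now=None):
    """True when no approval has fired within the stall window."""
    now = time.time() if now is None else now
    return now - last_approval_ts >= STALL_SECONDS

def distribute_round_robin(prompts, panels):
    """Assign prompts to active panels in round-robin order."""
    assignment = {panel: [] for panel in panels}
    for prompt, panel in zip(prompts, itertools.cycle(panels)):
        assignment[panel].append(prompt)
    return assignment
```

With ten generated prompts and three active panels, round-robin assignment gives each panel three or four prompts, so no agent sits idle while another queues up work.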
That alone would only solve throughput. The more interesting layer is quality control. The framework includes a transcript-scoring pipeline that classifies output as COMPLETED, IN_PROGRESS, NEEDS_REVISION, or BLOCKED, and feeds failed prompts into a blacklist so the same weak instructions are not recycled back into the system.
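The scoring-plus-blacklist pipeline can be sketched as a classifier feeding a filter. The real framework presumably scores transcripts with a model; the keyword rules below are a toy stand-in for illustration only, and the class name `PromptBlacklist` is an assumption, not a name from the project.

```python
# Toy stand-in for the model-backed transcript scorer. The four status
# labels come from the project; the keyword heuristics are illustrative.
STATUSES = ("COMPLETED", "IN_PROGRESS", "NEEDS_REVISION", "BLOCKED")

def classify_transcript(transcript):
    """Map a panel transcript to one of the four quality tiers."""
    text = transcript.lower()
    if "error" in text or "cannot proceed" in text:
        return "BLOCKED"
    if "revise" in text or "incorrect" in text:
        return "NEEDS_REVISION"
    if "done" in text or "all tests pass" in text:
        return "COMPLETED"
    return "IN_PROGRESS"

class PromptBlacklist:
    """Remembers prompts whose runs failed so they are not recycled."""
    def __init__(self):
        self._failed = set()

    def record(self, prompt, status):
        if status in ("NEEDS_REVISION", "BLOCKED"):
            self._failed.add(prompt)

    def filter(self, prompts):
        return [p for p in prompts if p not in self._failed]
```

The key design point is the feedback edge: a BLOCKED or NEEDS_REVISION outcome writes its prompt into the blacklist, so the next generation round starts from a cleaner candidate pool.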
Together with per-session API cost tracking, a master CLI launcher, and a test-backed Python package structure, the result is an agent orchestration system focused on sustained developer throughput rather than one-off desktop scripting.
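Per-session cost tracking amounts to accumulating token-priced spend per agent. The sketch below assumes hypothetical per-1K-token prices; real rates depend on the model in use, and the class is an illustration rather than the framework's tracker.

```python
from collections import defaultdict

# Assumed illustrative prices per 1K tokens; actual rates vary by model.
PRICE_PER_1K = {"input": 0.0025, "output": 0.01}

class CostTracker:
    """Accumulates API spend per agent so runs are not cost-opaque."""
    def __init__(self):
        self.spend = defaultdict(float)

    def record(self, agent, input_tokens, output_tokens):
        cost = (input_tokens / 1000) * PRICE_PER_1K["input"] \
             + (output_tokens / 1000) * PRICE_PER_1K["output"]
        self.spend[agent] += cost
        return cost

    def total(self):
        return sum(self.spend.values())
```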
Each VS Code Copilot agent is hosted on its own virtual desktop, which isolates workstreams while allowing the orchestrator to supervise them as a coordinated pool.
The system visually detects buttons like Allow, Keep Edits, and Try Again, replacing brittle pixel-matching logic with a more resilient control mechanism.
When the system detects an hour of inactivity, it generates ten new coding prompts grounded in the current project's documentation.
New work is balanced across active panels so no single agent becomes overloaded while others sit idle.
Every panel transcript is graded as COMPLETED, IN_PROGRESS, NEEDS_REVISION, or BLOCKED so the orchestrator can decide what to do next instead of assuming all activity is useful progress.
Failed prompts are blacklisted to prevent recurrence, gradually improving prompt quality and reducing wasted agent cycles over time.
Prompt distribution keeps simultaneous Copilot agent panels evenly utilized across the desktop estate.
When agents stall for one hour, the system derives ten fresh prompts from live project documentation and restarts throughput automatically.
A four-tier quality pipeline scores outcomes and gives the control layer an explicit signal for intervention or continuation.
Real-time API cost tracking makes it possible to monitor spend at the run and agent level instead of treating orchestration cost as opaque.
The framework ships as an installable Python package with editable dev installs, pytest coverage, and CI-oriented documentation.