
ForgeCode

by TailCallHQ

cli, active, freemium

A terminal-based AI pair programmer with a three-agent architecture — implement, research, and plan — supporting 300+ models and offering interactive TUI, one-shot CLI, and shell plugin modes.

ForgeCode is an open-source terminal AI pair programmer from Tailcall, Inc. (GitHub organization: TailCallHQ) that differentiates itself through a structured three-agent architecture: an implement agent for writing and editing code, a research agent for gathering context, and a plan agent for breaking down complex tasks before execution begins. Built in Rust and released under the Apache 2.0 license, it supports over 300 models across hosted providers and local runtimes, and ships three distinct interaction modes to fit different developer workflows. The project reached 7,400 GitHub stars and claims the top position on the TermBench 2.0 agentic coding benchmark with 81.8% accuracy.

Background

ForgeCode was created by Tushar Mathur, Founder and CEO of Tailcall, Inc. Mathur previously led engineering at Dream11, one of India's largest fantasy sports platforms, where he built and operated a GraphQL infrastructure that scaled from a few thousand requests per second in 2016 to half a billion requests per minute at peak by 2023, growing the engineering team from around ten engineers to over a thousand. After leaving Dream11, he founded Tailcall to commercialize the distributed systems and API-orchestration knowledge he had developed at scale.

Tailcall initially focused on a high-performance GraphQL proxy for cloud-native environments, and the company was selected for Surge, the Sequoia Capital India accelerator program (now Peak XV Partners), receiving backing from Surge and Tenacity Ventures. The GraphQL proxy remains active under the tailcall.run domain, but the company's primary product line shifted to AI-assisted developer tooling as large language models matured enough to drive practical terminal workflows.

ForgeCode's first public release — v0.1.0 — appeared on January 30, 2025. The repository was initially hosted under the antinomyhq GitHub organization before migrating to the current tailcallhq organization. The project is built almost entirely in Rust (94.4% of the codebase), a deliberate choice that enables the low-latency, concurrent execution the three-agent model requires. By July 2025, ForgeCode had graduated from early access with a 17-fold surge in signups and a 10-fold spike in usage, and launched paid tiers. By June 2026, the project had accumulated 2,797 commits across 362+ releases, reflecting a high-cadence development model.

Key capabilities

Three-agent architecture

ForgeCode separates work into three named sub-agents rather than relying on a single monolithic loop. The Forge agent (the default) is implementation-focused and has permission to modify files. The Sage agent (invoked with :ask) is read-only and used for research and architecture analysis — it can answer questions about the codebase without the risk of touching files. The Muse agent (invoked with :plan) generates structured plans and writes them to a plans/ directory in the repository, giving the developer a reviewable artifact before any code changes begin.

This separation is a deliberate token-efficiency decision. Tasks that only require reading code or answering a question do not consume the higher token budget of the full implement loop. Tasks that require a plan first get the plan reviewed and approved before the implement agent proceeds, reducing wasted inference on work that turns out to need re-scoping.
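
The routing between the three agents can be pictured as a small prefix dispatcher. The following is only a sketch: the agent names and the :ask and :plan prefixes come from the description above, while the Agent type, the writes field, and the route function are hypothetical and do not reflect ForgeCode's actual Rust implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    writes: str  # what the agent may write: "code", "plans", or "nothing"


# Names and prefixes follow the text; everything else is illustrative.
FORGE = Agent("forge", writes="code")     # default, implementation-focused
SAGE = Agent("sage", writes="nothing")    # read-only research
MUSE = Agent("muse", writes="plans")      # writes plans to plans/


def route(line: str) -> tuple[Agent, str]:
    """Pick an agent from the input prefix and return it with the remaining prompt."""
    text = line.strip()
    for prefix, agent in ((":ask", SAGE), (":plan", MUSE)):
        # Require an exact match or a following space so ":planning" is not
        # mistaken for ":plan".
        if text == prefix or text.startswith(prefix + " "):
            return agent, text[len(prefix):].strip()
    return FORGE, text
```

Under this sketch, a question such as ":ask where is auth handled?" never reaches an agent with write access, which is the property the three-way split is meant to guarantee.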

ForgeCode Services runtime layer

The performance gains documented in the forgecode.dev benchmark blog series came from a runtime engineering layer called ForgeCode Services, not from switching models. The layer has four parts: semantic entry-point discovery, so the agent orients on a large codebase quickly rather than naively traversing directories; a tool-call correction layer, which repairs malformed LLM tool calls in-flight rather than failing the turn; task decomposition enforcement, in which the todo_write tool is mandatory for multi-step tasks to force explicit planning before execution; and a tiered reasoning budget policy, with high reasoning effort for the first messages in a session when understanding is needed and lower effort for execution turns when the plan is already set. These interventions moved the baseline TermBench 2.0 pass rate from approximately 25% to 78.4%. Subsequent schema-level fixes (reordering JSON schema fields, flattening nested objects, adding explicit truncation markers in partial file reads, and enforcing a mandatory verification pass before task completion) brought the score to 81.8% with both GPT-5.4 and Opus 4.6.
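
To make the idea of a tool-call correction layer concrete, here is a minimal sketch of repairing a malformed argument payload instead of failing the turn. It is an assumption-laden illustration, not ForgeCode's implementation: the repair_tool_call function, the name-to-type schema format, and the example argument names are all hypothetical, and the only repairs shown are stripping a Markdown fence, dropping trailing commas, and coercing numeric strings.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker some models wrap around JSON


def repair_tool_call(raw: str, schema: dict) -> dict:
    """Best-effort repair of a tool-call argument string (hypothetical helper).

    `schema` maps each required argument name to its expected Python type.
    """
    text = raw.strip()
    # Remove a surrounding Markdown fence, with or without a "json" tag.
    if text.startswith(FENCE):
        text = text[len(FENCE):]
        if text.startswith("json"):
            text = text[len("json"):]
    if text.endswith(FENCE):
        text = text[: -len(FENCE)]
    # Drop trailing commas before a closing brace or bracket. This naive
    # regex would also touch such sequences inside string values.
    text = re.sub(r",\s*([}\]])", r"\1", text.strip())
    args = json.loads(text)
    for name, expected in schema.items():
        if name not in args:
            raise ValueError(f"missing required argument: {name}")
        # Coerce numeric strings such as "10" when the schema expects an int.
        if expected is int and isinstance(args[name], str):
            args[name] = int(args[name])
    return args
```

A production layer would need a tolerant parser rather than regular expressions, but the shape is the same: normalize, parse, validate against the schema, and only then hand the call to the tool.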

ZSH shell plugin and interaction modes

ForgeCode ships three interaction modes. The interactive TUI provides a full-screen terminal UI for exploratory, multi-turn coding sessions with persistent context. The one-shot CLI mode (forge -p "prompt") accepts a single prompt and exits cleanly, making ForgeCode composable with shell scripts, CI pipelines, and editor integrations. The ZSH plugin integrates ForgeCode directly into the existing shell session using a : prefix — developers type :commit to get an AI-generated commit message, :suggest <description> to translate a natural-language description into a shell command, :conversation to browse saved sessions, or simply : <prompt> to send an ad-hoc instruction, all without switching to a separate TUI or terminal pane.

The ZSH plugin preserves existing shell configuration: it is compatible with Oh My Zsh, existing aliases, and other plugins, since it attaches to the : prefix rather than overriding existing commands.
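
Because the one-shot mode takes a single prompt and exits, it can be driven from a script. The sketch below wraps the forge -p invocation mentioned above; the helper names are hypothetical, and it assumes that forge is on PATH, writes its response to standard output, and exits with a non-zero status on failure, none of which is confirmed here.

```python
import subprocess


def forge_command(prompt: str) -> list[str]:
    """Build the argv for a one-shot run; only the -p flag from the text is used."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return ["forge", "-p", prompt]


def run_one_shot(prompt: str, cwd: str = ".") -> str:
    """Run forge once in `cwd` and return whatever it printed to stdout.

    Assumes a non-zero exit code signals failure (check=True raises then).
    """
    result = subprocess.run(
        forge_command(prompt),
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the prompt as a separate list element avoids shell quoting problems when the prompt contains quotes or other special characters.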

Model Context Protocol and extensibility

ForgeCode supports the Model Context Protocol (MCP) for connecting the agent to external tools and data sources via .mcp.json configuration files. Beyond MCP, developers can define custom skills as YAML/Markdown bundles stored in .forge/skills/ and custom agents as configuration in .forge/agents/. Skill precedence follows project-local > global > built-in, meaning teams can establish project-specific behaviors without modifying the tool globally. The configuration file AGENTS.md at the repository root provides persistent instructions that are injected into every session, giving teams a place to document coding standards that the agent enforces automatically.
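
The precedence rule (project-local over global over built-in) amounts to a first-match lookup across three layers. The sketch below models each layer as a plain dictionary from skill name to definition; the resolve_skill function and the example skill names are hypothetical, and the real tool presumably reads these layers from disk.

```python
from typing import Optional


def resolve_skill(
    name: str,
    project: dict[str, str],
    global_: dict[str, str],
    builtin: dict[str, str],
) -> Optional[tuple[str, str]]:
    """Return (source, definition) from the highest-precedence layer that has `name`."""
    layers = (("project", project), ("global", global_), ("built-in", builtin))
    for source, layer in layers:
        if name in layer:
            return source, layer[name]
    return None  # no layer defines this skill


# Illustrative data only: a project overrides a built-in "review" skill.
project_skills = {"review": "Follow the team checklist."}
global_skills = {"changelog": "Summarize merged changes."}
builtin_skills = {"review": "Generic code review.", "changelog": "Default changelog."}
```

With this data, "review" resolves to the project definition and "changelog" to the global one, so a team can override behavior per repository without touching the global setup.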

ForgeCode also exposes a workspace server for semantic codebase indexing. Running :sync builds a vector index of the repository; subsequent searches use semantic similarity rather than text matching, which improves the agent's ability to locate the right files on large codebases without explicit file enumeration.
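
The difference between semantic and text search can be illustrated with a toy vector index. This is a sketch under strong assumptions: the three-dimensional vectors are made up by hand, whereas a real index would use embeddings from a model, and the cosine and search functions are hypothetical rather than part of ForgeCode.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: list[float], index: dict[str, list[float]], top_k: int = 2) -> list[str]:
    """Return the `top_k` file paths whose vectors are most similar to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [path for path, _ in ranked[:top_k]]


# Hand-written toy vectors standing in for real embeddings.
toy_index = {
    "src/auth.rs": [0.9, 0.1, 0.0],
    "src/db.rs": [0.1, 0.9, 0.0],
    "README.md": [0.0, 0.1, 0.9],
}
```

A query vector close to the first axis ranks src/auth.rs highest even if the query text shares no keywords with that file, which is the advantage over plain text matching.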

Broad provider and model coverage

ForgeCode maintains provider adapters for OpenAI (including o3-mini-high), Anthropic, Google Vertex AI, OpenRouter, Groq, Cerebras, xAI/Grok, DeepSeek, Amazon Bedrock (via access gateway), and any OpenAI-compatible endpoint. This gives the tool one of the widest provider surfaces in the agentic CLI category. Developers can assign different models to different agents within a single session — for example, routing the planning agent through a reasoning-heavy model while using a faster, cheaper model for implementation turns. Local Ollama support means ForgeCode is viable in air-gapped environments and cost-sensitive personal workflows.
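
Per-agent model assignment can be thought of as a lookup with a session-wide fallback. In the sketch below, the SessionModels type is hypothetical and the model identifiers are placeholders, not real model names or ForgeCode configuration keys.

```python
from dataclasses import dataclass, field


@dataclass
class SessionModels:
    """Maps agent names to model identifiers, falling back to a default."""

    default: str
    per_agent: dict[str, str] = field(default_factory=dict)

    def model_for(self, agent: str) -> str:
        # Use the agent-specific model if one is set, else the session default.
        return self.per_agent.get(agent, self.default)


# Placeholder identifiers: a reasoning-heavy model for planning, a cheaper
# default for everything else.
session = SessionModels(
    default="fast-cheap-model",
    per_agent={"muse": "reasoning-heavy-model"},
)
```

Here the planning agent gets the expensive model while implementation and research turns use the default, matching the routing example in the paragraph above.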

Autonomy level

ForgeCode operates at autonomy level 3. The Muse agent surfaces a proposed task plan for user review before the Forge agent begins writing files, keeping humans in the loop at the highest-leverage decision point. Once approved, the Forge agent executes the plan with file edits and terminal commands, pausing for confirmation on destructive actions. Full autonomous execution without review is not a supported mode, reflecting a design philosophy that values correctness and token efficiency over maximum automation.
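
The two checkpoints described here, plan approval and confirmation of destructive actions, can be sketched as a gated execution loop. The Step type, the destructive flag, and the execute_plan function are hypothetical names chosen for illustration; how ForgeCode actually classifies an action as destructive is not specified in this profile.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    description: str
    destructive: bool = False  # e.g. deleting files or rewriting history


def execute_plan(
    steps: list[Step],
    plan_approved: bool,
    confirm: Callable[[Step], bool],
) -> list[str]:
    """Run steps only after plan approval, asking `confirm` before destructive ones.

    Returns the descriptions of the steps that were actually carried out.
    """
    if not plan_approved:
        return []  # nothing runs without an approved plan
    done = []
    for step in steps:
        if step.destructive and not confirm(step):
            continue  # the user declined this step, so skip it
        done.append(step.description)
    return done


plan = [
    Step("add retry helper"),
    Step("delete legacy module", destructive=True),
    Step("update tests"),
]
```

In an interactive session the confirm callback would prompt the user; in this sketch it is injected so the gating logic can be exercised without a terminal.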

Strengths

  • Three-agent split reduces token waste by separating planning, research, and implementation into distinct, purpose-bounded agents
  • ZSH plugin enables inline terminal invocation without leaving the current shell context or disrupting existing configuration
  • Over 300 model endpoints covered, including local Ollama for air-gapped or cost-sensitive environments
  • Apache 2.0 license with no commercial restrictions
  • Three interaction modes (TUI, one-shot CLI, ZSH) cover exploratory sessions, scripted pipelines, and inline shell workflows
  • ForgeCode Services runtime layer delivers measurable benchmark gains independent of model selection
  • Active release cadence with 362+ releases since January 2025
  • Built in Rust for low-latency concurrent execution

Limitations

  • Smaller community than established tools such as Aider or Claude Code; fewer third-party integrations and community examples
  • TermBench 2.0 is a benchmark that ForgeCode's own team optimized against specifically; independent third-party verification of real-world coding task completion is limited
  • A critical bug (issue #2961 on the public GitHub tracker) was filed in 2026 regarding potential answer leakage into the CLI benchmark environment, which raises questions about the validity of some benchmark runs
  • Freemium model limits free-tier usage to approximately 10 to 50 requests per day; heavy users will need the Pro ($20/month) or Max ($100/month) tier
  • Three-agent architecture adds latency on simple single-file tasks where a single-pass agent would be faster
  • Documentation for advanced configuration (custom providers, enterprise Bedrock routing, skills authoring) is thinner than that of more mature CLI tools