"So we and our elaborately evolving computers may meet each other halfway. Someday a human being, named perhaps Fred White, may shoot a robot named Pete Something-or-other, which has come out of a General Electric factory, and to his surprise see it weep and bleed. And the dying robot may shoot back and, to its surprise, see a wisp of gray smoke arise from the electric pump that it supposed was Mr. White's beating heart."

— Philip K. Dick, The Android and the Human, 1972

Most agent frameworks treat the LLM as a function call, hand the result back to your application, and let your application do everything that should outlive the call. Memory across sessions, personality consistency, tool registration, multi-agent coordination, retry on tool failure: all of it ends up in handlers you write yourself. After enough handlers, the application code is the agent and the framework is a thin shim over the model.

AgentOS puts those concerns inside the runtime: persistent cognitive memory, optional HEXACO personality, runtime tool forging in a hardened sandbox, six multi-agent orchestration strategies, streaming guardrails, a voice pipeline, and one dispatch interface across the major LLM providers. This post announces that the project is open source under Apache 2.0 and that the public benchmark numbers are real.

The short version: npm install @framers/agentos.

What AgentOS Is

AgentOS is a TypeScript runtime for building AI agents that adapt, remember, and collaborate. Every agent is a Generalized Mind Instance (GMI): a persistent cognitive core with personality traits, episodic memory, and autonomous decision-making.


npm install @framers/agentos


import { agent } from '@framers/agentos';

const bot = agent({
  provider: 'anthropic',
  instructions: 'You are a helpful assistant.',
  personality: { openness: 0.9, conscientiousness: 0.85 },
  memory: { enabled: true, cognitive: true },
});

const reply = await bot.session('demo').send('What is AgentOS?');
console.log(reply.text);

What Makes It Different

Cognitive Memory

8 neuroscience-grounded memory mechanisms modulated by the agent's HEXACO personality:

Reconsolidation: memories rewrite each time they are recalled, incorporating new context. Based on Nader et al. (2000) on memory reconsolidation in fear conditioning.
Retrieval-induced forgetting: retrieving one memory suppresses related competing memories. Based on Anderson et al. (1994).
Involuntary recall: contextual cues trigger unexpected memory retrieval. Based on Berntsen (2010).
Ebbinghaus decay: exponential memory decay following the forgetting curve, replicated and validated by Murre & Dros (2015).
Feeling-of-knowing, temporal gist extraction, schema encoding, source confidence decay: each mechanism implemented in the CognitiveMechanismsEngine, with trait modulation controlled by HEXACO personality dimensions.
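
The Ebbinghaus decay item in the list above can be sketched as follows, assuming the common exponential form R = exp(-t / S). This is a minimal illustration, not the CognitiveMechanismsEngine API: the names MemoryTrace, retention, reinforce, and stabilityHours, and the 1.5 growth factor, are all assumptions made for the example.

```typescript
// Illustrative only: these names are hypothetical and are not part of @framers/agentos.

interface MemoryTrace {
  content: string;
  stabilityHours: number; // larger values mean slower forgetting
  lastRecalledAt: number; // epoch milliseconds
}

const MS_PER_HOUR = 60 * 60 * 1000;

// Exponential forgetting curve: R = exp(-t / S), with t and S both in hours.
function retention(trace: MemoryTrace, now: number): number {
  const elapsedHours = Math.max(0, now - trace.lastRecalledAt) / MS_PER_HOUR;
  return Math.exp(-elapsedHours / trace.stabilityHours);
}

// Assumed reinforcement rule: each recall resets the clock and grows stability.
function reinforce(trace: MemoryTrace, now: number, growth = 1.5): MemoryTrace {
  return {
    ...trace,
    stabilityHours: trace.stabilityHours * growth,
    lastRecalledAt: now,
  };
}
```

Under these assumptions, a trace with 24 hours of stability retains about 37% (1/e) one day after its last recall, and a recall at that point slows its later decay. How AgentOS actually modulates the curve by personality trait is not shown here.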

Memory follows a 4-tier hierarchy (working memory, episodic, semantic, observational) that consolidates upward automatically. This approach is grounded in the same ACT-R cognitive architecture principles used by recent systems like Memory Bear and CortexGraph.

Multi-Agent Orchestration

6 coordination strategies for teams of specialized agents:

Strategy	Description	Use Case
Sequential	Linear pipeline, each agent refines previous output	Editing chains, translation pipelines
Parallel	Fan-out to all agents simultaneously	Research, brainstorming, redundancy
Debate	Agents argue positions, synthesize consensus	Decision-making under uncertainty
Review loop	Author and reviewer iterate until quality threshold	Content QA, code review
Hierarchical	Manager delegates to specialized workers	Task decomposition
Graph (DAG)	Dependency-based execution with conditional branching	Complex multi-step workflows

Agents share memory through the AgentCommunicationBus and coordinate via the AgencyRegistry.
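
To make the first two rows of the table concrete, here is a minimal sketch of the sequential and parallel strategies using plain async functions. AgentFn, runSequential, and runParallel are hypothetical names for illustration, not the AgentOS orchestration API, and the stand-in agents only wrap strings so the sketch runs without an LLM provider.

```typescript
// Illustrative only: hypothetical helpers, not the @framers/agentos orchestration API.

type AgentFn = (input: string) => Promise<string>;

// Sequential: each agent refines the previous agent's output.
async function runSequential(agents: AgentFn[], input: string): Promise<string> {
  let current = input;
  for (const agent of agents) {
    current = await agent(current);
  }
  return current;
}

// Parallel: fan the same input out to every agent and collect the outputs in order.
async function runParallel(agents: AgentFn[], input: string): Promise<string[]> {
  return Promise.all(agents.map((agent) => agent(input)));
}

// Stand-in agents so the sketch runs without any provider credentials.
const draftAgent: AgentFn = async (text) => `draft(${text})`;
const editAgent: AgentFn = async (text) => `edit(${text})`;
```

With these stand-ins, runSequential([draftAgent, editAgent], 'brief') resolves to 'edit(draft(brief))', while runParallel returns one independent result per agent.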

When strategy: 'hierarchical' is paired with emergent: { enabled: true }, the manager LLM gets a spawn_specialist tool alongside its delegate_to_<name> tools. Calling it forges a new sub-agent at runtime via EmergentAgentForge, gates it through EmergentAgentJudge for safety review, and adds the new specialist to the live roster, so delegate_to_<spawned-role> becomes available on the next turn. Spawning is bounded by planner.maxSpecialists, planner.maxJudgeCalls, and an optional human-in-the-loop (HITL) beforeEmergent gate. See Hierarchical + emergent agent spawning for the worked example.

The image above is captured from a real node examples/emergent-hierarchical-spawning.mjs run. The team starts with researcher + writer; the prompt requires a security audit; the manager calls spawn_specialist, EmergentAgentJudge approves the spec, and security_audit_specialist joins the live roster. The [FORGE] line is the moment that happens. Reproduce: node examples/emergent-hierarchical-spawning.mjs after npm install @framers/agentos and export OPENAI_API_KEY=....
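
The bounded spawn flow described above can be approximated with a small sketch. It is not EmergentAgentForge or EmergentAgentJudge: Roster, SpecialistSpec, SpawnLimits, and Judge are hypothetical stand-ins, and the judge here is a plain synchronous predicate rather than an LLM review.

```typescript
// Illustrative only: every type and class below is a hypothetical stand-in.

interface SpecialistSpec {
  role: string;
  instructions: string;
}

interface SpawnLimits {
  maxSpecialists: number; // cap on spawned specialists per run
  maxJudgeCalls: number; // cap on safety reviews per run
}

type Judge = (spec: SpecialistSpec) => boolean;

class Roster {
  readonly roles: string[];
  private readonly limits: SpawnLimits;
  private readonly judge: Judge;
  private spawned = 0;
  private judgeCalls = 0;

  constructor(initialRoles: string[], limits: SpawnLimits, judge: Judge) {
    this.roles = [...initialRoles];
    this.limits = limits;
    this.judge = judge;
  }

  // Returns the new delegate tool name, or null when a bound or the judge blocks the spawn.
  spawn(spec: SpecialistSpec): string | null {
    if (this.spawned >= this.limits.maxSpecialists) return null;
    if (this.judgeCalls >= this.limits.maxJudgeCalls) return null;
    this.judgeCalls += 1; // rejected specs still consume judge budget
    if (!this.judge(spec)) return null;
    this.spawned += 1;
    this.roles.push(spec.role);
    return `delegate_to_${spec.role}`;
  }
}
```

The design point is that both budgets are checked before the judge runs, so a manager that keeps proposing rejected specialists cannot loop indefinitely.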

Emergent Tool Forging

Agents create new tools at runtime when no existing tool fits the task:

Compose mode: chains existing tools into pipelines (safe by construction)
Sandbox mode: generates code in a memory-bounded, time-limited isolation environment

An EmergentJudge reviews safety and correctness before activation. Approved tools promote through 3 trust tiers: session, agent, shared. The EmergentToolRegistry tracks usage and confidence scores.
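
One way to picture promotion through the three trust tiers is the sketch below. The tier names come from the text above; everything else (ToolRecord, recordUse, the minimum of 5 uses, the 0.8 success-rate threshold, resetting counters on promotion) is an assumption for illustration and not the EmergentToolRegistry's actual policy.

```typescript
// Illustrative only: thresholds and names are assumptions, not EmergentToolRegistry behavior.

type TrustTier = "session" | "agent" | "shared";

interface ToolRecord {
  toolName: string;
  tier: TrustTier;
  uses: number;
  successes: number;
}

// Hypothetical promotion rule: enough uses and a high enough success rate.
const MIN_USES = 5;
const MIN_CONFIDENCE = 0.8;

function toolConfidence(record: ToolRecord): number {
  return record.uses === 0 ? 0 : record.successes / record.uses;
}

function nextTier(tier: TrustTier): TrustTier {
  return tier === "session" ? "agent" : "shared";
}

// Records one invocation and promotes by at most one tier when the rule is met.
function recordUse(record: ToolRecord, succeeded: boolean): ToolRecord {
  const updated: ToolRecord = {
    ...record,
    uses: record.uses + 1,
    successes: record.successes + (succeeded ? 1 : 0),
  };
  const canPromote =
    updated.tier !== "shared" &&
    updated.uses >= MIN_USES &&
    toolConfidence(updated) >= MIN_CONFIDENCE;
  // Counters reset on promotion so each tier has to be earned separately.
  return canPromote
    ? { ...updated, tier: nextTier(updated.tier), uses: 0, successes: 0 }
    : updated;
}
```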

Production Infrastructure

21 LLM providers with automatic fallback chains: OpenAI, Anthropic, Gemini, Ollama, OpenRouter, Groq, Together, Fireworks, Perplexity, Mistral, Cohere, DeepSeek, xAI, Bedrock, Qwen, Moonshot, plus CLI bridges for Claude Code and Gemini CLI
37 channel adapters: Telegram, Discord, Slack, WhatsApp, email, webchat, Twitter/X, Instagram, Reddit, Bluesky, Mastodon, and 26 more
6 guardrail packs across 5 security tiers from permissive to paranoid: PII redaction (4-tier detection: regex + NLP + NER + LLM), prompt injection defense, grounding guards, code safety scanning, topicality enforcement, content policy
Multimodal RAG: 7 vector backends (SQLite to Qdrant), 4 retrieval strategies, GraphRAG with Louvain community detection (Blondel et al., 2008), 10 document loaders
Voice pipeline: 12 STT providers, 12 TTS providers, VAD, speaker diarization, telephony via Twilio, Telnyx, Plivo
Graph-based orchestration: workflow() for deterministic DAGs, AgentGraph for loops and custom control flow, mission() for planner-driven orchestration
OpenTelemetry observability: opt-in OTEL export, cost tracking, token usage, session audit logs
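
The fallback chains mentioned in the first item can be sketched in a few lines. This is a generic pattern under stated assumptions, not the AgentOS dispatch interface: Provider and completeWithFallback are hypothetical names, and real adapters would also handle streaming, timeouts, and retry policy.

```typescript
// Illustrative only: hypothetical interface, not the AgentOS provider abstraction.

interface Provider {
  id: string;
  complete(prompt: string): Promise<string>;
}

interface FallbackResult {
  providerId: string;
  text: string;
}

// Tries each provider in order and returns the first successful completion.
async function completeWithFallback(chain: Provider[], prompt: string): Promise<FallbackResult> {
  const failures: string[] = [];
  for (const provider of chain) {
    try {
      const text = await provider.complete(prompt);
      return { providerId: provider.id, text };
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      failures.push(`${provider.id}: ${reason}`);
    }
  }
  // Every provider failed, so surface all reasons to the caller.
  throw new Error(`All providers failed (${failures.join("; ")})`);
}
```

Collecting the per-provider failure reasons keeps the final error useful for debugging when the whole chain is exhausted.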

TypeScript Native

Full type safety with Zod-validated structured output. ESM-first architecture. The TypeDoc API reference documents every public class, interface, and function.

How AgentOS Compares

Capability	AgentOS	LangGraph	CrewAI	Mastra	VoltAgent
Language	TypeScript	Python + JS	Python	TypeScript	TypeScript
Cognitive memory	8 mechanisms, including Ebbinghaus decay	Checkpoints	Short/long-term	Semantic	Conversation + RAG
Personality	HEXACO 6-factor	None	Role descriptions	None	None
Channel adapters	37	None	None	None	None
Voice pipeline	12 STT + 12 TTS	None	None	None	None
Guardrails	6 packs	Middleware	Basic	None	Module
Tool forging	Runtime creation	None	None	None	None

See AgentOS vs LangGraph vs CrewAI vs Mastra vs VoltAgent for the full comparison.

What we measured (and what we didn't)

AgentOS ships with agentos-bench, an Apache 2.0 memory benchmark suite. We publish bootstrap CIs at 10k resamples on every headline number and the per-cell run JSON for replication. The recent results:

LongMemEval-S: 85.6% [82.4%, 88.6%] at $0.0090 per correct, gpt-4o reader, 4-second avg latency. Beats Mastra OM gpt-4o (84.2% published) on accuracy at matched reader. Beats EmergenceMem Simple Fast (80.6% measured in our harness, their public reference repo ships with no LICENSE) by +5.0 points at 6.5x lower cost-per-correct. Statistically tied with EmergenceMem Internal's published 86.0%, but Emergence's number comes from closed-source SaaS at emergence.ai/web-automation-api, not a library you can install. AgentOS ships the full architecture under Apache-2.0.
LongMemEval-M (1.5M tokens, 500 sessions per haystack): 70.2% [66.0%, 74.0%] at $0.0078 per correct with reader-router top-K=5. Competitive with the strongest published M results in the original LongMemEval paper (Wu et al., ICLR 2025, Table 3). At matched reader-Top-5, +4.5 points above the paper's round-level configuration (65.7%) and 1.2 points below the paper's session-level configuration (71.4%); 1.8 points below the paper's overall best (72.0% at round-level Top-10).
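
For readers unfamiliar with the bracketed intervals, here is a minimal sketch of a percentile bootstrap confidence interval over per-question outcomes. It is not the agentos-bench implementation: the function names, the seeded mulberry32 generator, and the choice of the percentile method are assumptions made for illustration.

```typescript
// Illustrative only: a generic percentile bootstrap, not the agentos-bench code.

// Small seeded PRNG (mulberry32) so the resampling is reproducible.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface AccuracyCI {
  point: number;
  lower: number;
  upper: number;
}

// Resamples the outcomes with replacement and reads the CI off the sorted accuracies.
function bootstrapAccuracyCI(
  outcomes: boolean[], // true = question answered correctly
  resamples = 10000,
  alpha = 0.05,
  seed = 42,
): AccuracyCI {
  if (outcomes.length === 0) throw new Error("outcomes must be non-empty");
  const rand = mulberry32(seed);
  const n = outcomes.length;
  const accuracies: number[] = [];
  for (let r = 0; r < resamples; r++) {
    let correct = 0;
    for (let i = 0; i < n; i++) {
      if (outcomes[Math.floor(rand() * n)]) correct++;
    }
    accuracies.push(correct / n);
  }
  accuracies.sort((a, b) => a - b);
  const lowerIndex = Math.floor((alpha / 2) * resamples);
  const upperIndex = Math.min(resamples - 1, Math.floor((1 - alpha / 2) * resamples));
  const point = outcomes.filter(Boolean).length / n;
  return { point, lower: accuracies[lowerIndex], upper: accuracies[upperIndex] };
}
```

Fixing the seed makes a reported interval reproducible from the per-question run data, which is the property the published per-cell JSON is meant to support.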

We do not run benchmarks against vendors that don't ship complete standalone runnables. We do not claim X-times-cheaper unless reader model and config match between the two systems being compared. The entire methodology (judges, sample sizes, judge FPR probes, adversarial calibration) is documented in agentos-bench/docs.

This is what an honest benchmark looks like. If something on this list is wrong, file an issue against agentos-bench and we'll fix it or retract the claim.

Get Started

Documentation
GitHub
Discord
npm
How to Build a TypeScript AI Agent in 5 Minutes
Adaptive vs. Emergent Intelligence
Paracosm launch, the simulation product built on AgentOS

FAQ

Why TypeScript? Most AI infrastructure is Python. Most production application code is JavaScript or TypeScript. The runtime that lives inside an application should match the application's language. AgentOS does. The Python interop story is via the API server (REST/SSE) or via JSON Schema-generated types from the artifact schema.

Is AgentOS a LangGraph alternative? It's an alternative if your job is "build an AI agent with memory and personality and tools." It is not an alternative if your job is "compose Python research code into a workflow." Different jobs. We have a head-to-head comparison post with the honest-cost rule applied.

Does AgentOS lock me into a specific LLM? No. 21 provider adapters ship; you can add yours in ~50 lines if it's not in the list. The provider abstraction is decoupled from the agent abstraction.

What's the deal with HEXACO personality? It's optional. Pass personality and the runtime biases retrieval, decision routing, and tool selection by the trait vector. Don't pass it and the runtime acts personality-neutral. We don't make HEXACO the centerpiece because not every agent needs it; we just make it work cleanly when you want it.

Is the cognitive memory feature the same as RAG? No. RAG retrieves documents. Cognitive memory is a layered system covering short-term context, episodic memory (events with time and emotion encoding), semantic memory (facts and relationships), and an Ebbinghaus-style decay model for forgetting. RAG is one of the retrievers cognitive memory composes with. Read Cognitive Memory Beyond RAG for the deeper version.

Can AgentOS run agents that talk on the phone? Yes. The voice pipeline (12 STT providers, 12 TTS providers) plus telephony adapters in the channels system gets you a voice agent that runs as a phone call. We have a case study with Wilds.ai on the companion side of the same stack.

What about safety? Six guardrail packs ship: topicality, ML classifiers (toxicity / prompt-injection / NSFW), grounding (NLI), PII redaction, code safety (static analysis), and skills (curated SKILL.md execution). Each has its own README in the packages/agentos-ext-* tree.

Is this production-ready for X? Define X. We don't publish a generic "production-ready" label because the answer depends on what you're shipping. We do publish concrete benchmark numbers, concrete safety posture, and concrete provider compatibility. Read those, judge for yourself, file issues when we miss.

Apache 2.0 licensed. Built by Manic Agency / Frame.dev.

Announcing AgentOS: Open-Source TypeScript AI Agent Runtime