"Memory is the diary that we all carry about with us."

— Oscar Wilde, The Importance of Being Earnest, 1895

The first time I built a RAG system that "remembered" something it hadn't been told, I went looking for the bug. There wasn't one. The system had retrieved a document chunk that had been re-embedded under a slightly different surrounding context after the user's earlier query, and the embedding-shift made the chunk match the new query in a way it wouldn't have matched the same query a week earlier. The retrieval was correct. The behavior felt like memory. It wasn't.

A vector database is a search engine. A search engine that returns the same answer for the same query forever, regardless of what happened between queries, is not a memory. Memory is what changes when something is recalled. Memory is what fades when nothing recalls it. Memory is what gets reshaped by the emotion of the moment it's accessed. RAG does none of those things, because it isn't designed to.

This post is about what's required to give an agent something closer to actual memory. The neuroscience is well-established. The implementation is harder than it looks. AgentOS ships nine mechanisms from published cognitive-science literature, each independently configurable, each grounded in a primary source we cite below. None of them are RAG. RAG is one of the retrievers cognitive memory composes with, not a replacement for it.

RAG retrieves documents. It does not remember.

A RAG system stores chunks in a vector database and fetches the most similar ones at query time. That is search, not memory. Human memory is fundamentally different: it decays, drifts emotionally, gets suppressed by competing memories, and consolidates during sleep. None of that happens in a vector DB.

AgentOS implements 9 cognitive mechanisms from published cognitive science research to give agents something closer to actual memory. The architecture follows the CoALA framework (Sumers et al., arXiv:2309.02427), which formalizes how language agents should partition memory into working / episodic / semantic / procedural stores with explicit decision modules.

The problem with RAG-as-memory

Consider a customer support agent that has been running for three months. With RAG:

Every interaction is stored with equal weight, regardless of importance
A complaint from day 1 has the same retrieval probability as one from yesterday
There is no concept of "forgetting" irrelevant details
The agent cannot distinguish between what it observed directly and what it inferred

With cognitive memory:

Unimportant interactions naturally fade (Ebbinghaus decay)
Critical moments are locked in permanently (flashbulb memory)
Retrieving one memory suppresses similar competing ones (retrieval-induced forgetting)
Each memory tracks its source type and confidence level (source monitoring)

Comparable architectures in the published record include MemGPT (Packer et al., 2023; now part of Letta), which models the LLM as a virtual operating system with paged memory; and Generative Agents (Park et al., 2023), which adds memory streams with importance scoring and reflection. AgentOS layers explicit cognitive mechanisms on top of those primitives.

The 9 mechanisms

1. Ebbinghaus decay (Ebbinghaus, 1885)

Memories decay over time following a forgetting curve:


strength(t) = (importance / 10) × e^(-t/stability)

Where t is age in days and stability starts at 3.0 days. Each retrieval increases stability through spaced repetition (Cepeda et al., 2006):


stability' = min(stability × 1.8^n, 180)

Frequently-accessed memories persist while unused ones naturally fade: the same mechanism that makes flashcards work. Modern replication of the original 1885 forgetting curve is in Murre & Dros (2015) PLOS ONE.

2. Reconsolidation (Nader, Schafe & LeDoux, 2000)

Every time a memory is retrieved, its emotional context drifts 5% toward the agent's current mood:


trace.valence += 0.05 × (currentMood.valence - trace.valence);
trace.arousal += 0.05 × (currentMood.arousal - trace.arousal);

A negative memory recalled during a positive mood gradually becomes less negative. This models the established finding that memory reconsolidation is influenced by the retrieval context: memories are not static records, they are reconstructed each time they are accessed.

Reference: Nader, K., Schafe, G. E., & LeDoux, J. E. (2000). Fear memories require protein synthesis in the amygdala for reconsolidation after retrieval. Nature, 406(6797), 722-726.

3. Retrieval-induced forgetting (Anderson, Bjork & Bjork, 1994)

Retrieving memory A actively suppresses competing memories B and C (those with high content overlap). Competitors receive a 0.12 stability penalty.

This sharpens recall by inhibiting similar alternatives. Without it, an agent with thousands of memories about "customer complaints" would retrieve a random mix. With RIF, the most-retrieved complaint memories dominate while others fade.

4. Involuntary recall (Berntsen, 2009)

On 8% of retrieval calls, a random old memory surfaces unprompted, weighted by emotional intensity. The memory must be older than 14 days and have strength above 0.15.

Unexpected connections that pure relevance-based retrieval cannot produce. An agent discussing project deadlines might spontaneously recall a similar situation from months ago.

5. Temporal gist extraction (Reyna & Brainerd, 1995)

Memories older than 60 days with low retrieval counts are compressed to their core assertions. The full content is replaced with a gist; the emotional context and entities are preserved, but verbatim detail is lost.

This models Fuzzy Trace Theory: verbatim traces decay faster than gist traces. An agent remembers "that was a frustrating conversation with Alex about the API" without retaining every word.

6. Source confidence decay (Johnson, Hashtroudi & Lindsay, 1993)

Each memory's stability is multiplied by a source-type factor during consolidation:

Source Type	Multiplier	Rationale
`user_statement`	1.0	Directly observed
`tool_result`	1.0	Externally verified
`observation`	0.95	Perceived but grounded
`agent_inference`	0.80	Self-generated
`reflection`	0.75	Meta-cognitive

After 10 consolidation cycles, an agent_inference retains ~10.7% of its original stability while a user_statement retains 100%.

7. Schema encoding (Bartlett, 1932; Ghosh & Gilboa, 2014)

New memories are compared against existing memory cluster centroids:

Schema-congruent (cosine > 0.75): encoded with 0.85x strength
Schema-violating (below threshold): encoded with 1.3x strength

Schema-congruent traces also get a 1.1x stability boost during consolidation, modeling the finding that schema-matching information integrates faster (Tse et al., 2007, Science).

8. Metacognitive FOK (Nelson & Narens, 1990)

When a query activates memories below the retrieval threshold but above noise level (partial activation zone), the system generates a "feeling of knowing" signal:


interface MetacognitiveSignal {
  type: 'tip_of_tongue' | 'low_confidence' | 'high_confidence';
  feelingOfKnowing: number;  // 0-1
  partialInfo?: string;
}

The agent says "I have a vague memory about this but cannot fully recall the details" rather than hallucinating or saying nothing.

9. Emotion regulation (Gross, 2002)

High-arousal memories get dampened during consolidation cycles via cognitive reappraisal. Traces with arousal above the suppression threshold (0.8) have their emotional intensity gradually reduced.

The "time heals all wounds" mechanism: traumatic memories gradually lose their raw emotional punch.

Observer → Reflector pipeline

Raw conversation does not enter memory directly. A three-stage pipeline decomposes exchanges into typed traces:

Observer. Buffers conversation tokens until a threshold (30K tokens), then extracts dense observation notes.
Compressor. Batches 50+ notes into compressed observations.
Reflector. Consolidates observations into typed long-term traces with personality-biased conflict resolution.

The pipeline produces all 5 memory types automatically:

Type	What it stores	Example
`episodic`	Autobiographical events	"Had a tense conversation about deadline changes"
`semantic`	Factual knowledge	"User is a TypeScript developer in Portland"
`procedural`	Skills and patterns	"User prefers concise answers with code examples"
`prospective`	Future intentions	"User needs to submit the report by Friday"
`relational`	Trust signals and bonds	"User shared vulnerability about work stress, trust-building moment"

The episodic / semantic / procedural / prospective partition matches the standard cognitive psychology taxonomy (Tulving, 1972 for episodic-vs-semantic). The CoALA paper formalizes this as the recommended decomposition for language agents.

HEXACO modulation

All 9 mechanisms are modulated by the agent's HEXACO personality traits:

Trait	Mechanism Effect
Emotionality	Controls reconsolidation drift rate
Conscientiousness	Influences retrieval-induced forgetting strength
Openness	Affects involuntary recall probability and novelty boost
Honesty-Humility	Modulates source confidence skepticism
Agreeableness	Shapes emotion regulation strategy
Extraversion	Influences FOK threshold

A high-conscientiousness agent has stronger RIF (sharper recall, more suppression). A high-openness agent has more involuntary recalls (more creative connections). These mappings are grounded in personality psychology research: Lee & Ashton's HEXACO model (2004 onward).

Using cognitive memory


1{
2  "memory": {
3    "enabled": true,
4    "cognitiveMechanisms": {
5      "reconsolidation": { "enabled": true, "driftRate": 0.05 },
6      "retrievalInducedForgetting": { "enabled": true },
7      "involuntaryRecall": { "enabled": true, "probability": 0.08 },
8      "metacognitiveFOK": { "enabled": true },
9      "temporalGist": { "enabled": true, "ageThresholdDays": 60 },
10      "schemaEncoding": { "enabled": true },
11      "sourceConfidenceDecay": { "enabled": true },
12      "emotionRegulation": { "enabled": true }
13    }
14  }
15}


1import { agent } from '@framers/agentos';
2
3const assistant = agent({
4  provider: 'anthropic',
5  memory: {
6    enabled: true,
7    decay: 'ebbinghaus',
8    cognitiveMechanisms: {},  // all mechanisms with defaults
9  },
10  personality: {
11    openness: 0.85,
12    conscientiousness: 0.9,
13  },
14});

Why this matters

RAG gives agents access to information. Cognitive memory gives them the ability to selectively remember, naturally forget, and honestly report when they are unsure. For agents that run for days, weeks, or months, the difference between "retrieves everything equally" and "remembers what matters, forgets what does not" determines whether the agent remains useful or drowns in noise.

The benchmark numbers backing this claim: 70.2% on LongMemEval-M (first open-source library above 65% on the 1.5M-token variant) and 85.6% on LongMemEval-S (Pareto-win over Mastra OM gpt-4o).

References

Anderson, M. C., Bjork, R. A., & Bjork, E. L. (1994). Remembering can cause forgetting. JEPLMC, 20(5), 1063-1087.
Bartlett, F. C. (1932). Remembering. Cambridge University Press.
Berntsen, D. (2009). Involuntary Autobiographical Memories. Cambridge University Press.
Cepeda, N. J., et al. (2006). Distributed practice in verbal recall tasks. Review of Educational Research, 76(3), 354-380.
Ebbinghaus, H. (1885). Über das Gedächtnis. Duncker & Humblot.
Ghosh, V. E., & Gilboa, A. (2014). What is a memory schema? Neuropsychologia, 53, 104-114.
Gross, J. J. (2002). Emotion regulation: Affective, cognitive, and social consequences. Psychophysiology, 39(3), 281-291.
Johnson, M. K., Hashtroudi, S., & Lindsay, D. S. (1993). Source monitoring. Psychological Bulletin, 114(1), 3-28.
Murre, J. M. J., & Dros, J. (2015). Replication and analysis of Ebbinghaus' forgetting curve. PLOS ONE 10(7).
Nader, K., Schafe, G. E., & LeDoux, J. E. (2000). Fear memories require protein synthesis. Nature, 406, 722-726.
Nelson, T. O., & Narens, L. (1990). Metamemory: A theoretical framework. Psychology of Learning and Motivation, 26, 125-173.
Packer, C., et al. (2023). MemGPT: Towards LLMs as Operating Systems. arXiv:2310.08560.
Park, J. S., et al. (2023). Generative Agents: Interactive Simulacra of Human Behavior. arXiv:2304.03442.
Reyna, V. F., & Brainerd, C. J. (1995). Fuzzy-trace theory. Learning and Individual Differences, 7(1), 1-75.
Sumers, T. R., et al. (2023). Cognitive Architectures for Language Agents. arXiv:2309.02427.
Tse, D., et al. (2007). Schemas and memory consolidation. Science, 316(5821), 76-82.

AgentOS cognitive memory is open-source (Apache 2.0). GitHub · Documentation · npm

Cognitive Memory for AI Agents: Beyond RAG