RAG is obvious. Without it, you’re just talking to an ignorant AI with amnesia.

Automated RAG is also obvious — you shouldn’t have to say “hey, look at this” every prompt.

But doing robust automated RAG across a big system, in a cost-efficient manner (in both time and tokens), is where it gets interesting. There's real value in simply having a pile of useful files or tables gently referenced in an agent's startup instructions. But for systems with many integrations that expect a wide variety of arbitrary prompts, a giant warehouse of data falls short.

This is the problem I set out to solve with what I’m calling the Subconscious Memory paradigm.

It sounds fancy, but it’s relatively simple.


The Problem: Agents Don’t Know What They Don’t Know

A task executor agent gets a request: “schedule a follow-up with the building manager.” The agent has no idea that there was a conversation about building maintenance two days ago across two different messaging threads. It doesn’t know that context exists. It doesn’t know to look. So it does the task cold — missing context that would have made its response significantly better.

This is the gap between active retrieval (agent searches for something specific) and passive awareness (agent knows what topics are in the air without being told to check). Traditional RAG handles the first case. Nothing handles the second.

The obvious fix — summarize recent activity and inject it into every session — breaks down fast:

Context bloat. A day’s worth of messaging summaries across multiple channels can easily hit 2,000+ tokens. Multiply by several agents running multiple sessions per day and you’re burning context window on information that’s irrelevant 90% of the time.

Stale narratives. Summaries are frozen interpretations. The summary says “there was a discussion about scheduling” — but the agent doing a calendar task needs to know who said what and when, not a second-hand account. The agent thinks it already has the context and doesn’t go look at the source.

Judgment baked in too early. When you summarize at write time, you’re deciding what matters before you know what task will need it. The overnight indexer has no idea that tomorrow’s agent will care about the throwaway comment about a plumbing issue but not the lengthy discussion about weekend plans.


The Design: Keywords + Source Pointers

The subconscious memory database stores no summaries, no narratives, no interpretations. It stores two things:

  1. Keywords — short, normalized English words that describe active topics
  2. Source pointers — references to where the raw data lives, with time windows

That’s it. The schema is three tables:

concepts
├── id (primary key)
└── expires (date — when to forget this)

keywords
├── id (primary key)
├── concept_id → concepts.id
└── word (lowercase, normalized)

sources
├── id (primary key)
├── concept_id → concepts.id
├── source_type (e.g., "messaging", "email", "calendar")
├── source_ref (channel identifier)
├── time_start (ISO timestamp)
└── time_end (ISO timestamp)
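A minimal sketch of this schema in SQLite, using the table and column names from the diagram above. The `ON DELETE CASCADE` clauses are an assumption on my part — one way to get the cascade-delete behavior described later, so expiry cleanup stays a single query:

```python
import sqlite3

def init_db(path="subconscious.db"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this per-connection for cascades
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS concepts (
            id      INTEGER PRIMARY KEY,
            expires TEXT NOT NULL              -- ISO date: when to forget this
        );
        CREATE TABLE IF NOT EXISTS keywords (
            id         INTEGER PRIMARY KEY,
            concept_id INTEGER NOT NULL REFERENCES concepts(id) ON DELETE CASCADE,
            word       TEXT NOT NULL           -- lowercase, normalized
        );
        CREATE TABLE IF NOT EXISTS sources (
            id          INTEGER PRIMARY KEY,
            concept_id  INTEGER NOT NULL REFERENCES concepts(id) ON DELETE CASCADE,
            source_type TEXT NOT NULL,         -- "messaging", "email", "calendar", ...
            source_ref  TEXT NOT NULL,         -- channel identifier
            time_start  TEXT NOT NULL,         -- ISO timestamp
            time_end    TEXT NOT NULL
        );
    """)
    return conn
```

With cascades in place, deleting an expired concept takes its keywords and source pointers with it automatically.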

A single “concept” might look like this:

concept    id=1, expires=2026-04-03
keywords   building, maintenance, cleaning, entrance, trash
sources    messaging / group-chat-123 / 3:23 PM–1:31 AM
sources    messaging / dm-456 / 9:12 PM–9:15 PM

No summary. No “there was a discussion about building maintenance where residents complained about…” — just keywords and pointers.


How Agents Use It

Every agent has instructions in its startup file that tell it to carry all keywords from this database in passive context. Not the sources, not the full concepts — just the flat list of active keywords. A few dozen keywords cost under 100 tokens.

Subconscious awareness (recent activity):
  building, maintenance, cleaning, entrance, trash, ...

If your task touches any of these topics, query the subconscious
database for source pointers before proceeding.

Most sessions, this list is irrelevant and gets ignored. The agent carries it passively — like how you don’t actively think about every conversation you’ve had recently.

But when someone asks the agent to “message the building manager about the entrance situation,” the agent sees building and entrance in its subconscious keywords. It knows to do a lookup in the database, which points it to the original source data. It then reads that source data — the actual messages, from the actual conversations — before performing its work.
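Both halves of that flow — loading the flat keyword list at startup, and resolving matched keywords to source pointers at task time — are a couple of small queries. A sketch against the schema above (function names are mine, not from the implementation):

```python
import sqlite3

def load_keywords(conn):
    """Flat list of active keywords, carried in the agent's passive context."""
    rows = conn.execute("SELECT DISTINCT word FROM keywords ORDER BY word")
    return [r[0] for r in rows]

def lookup_sources(conn, task_words):
    """When task words overlap the keyword list, return the source pointers
    so the agent can go read the actual messages."""
    hits = sorted({w.lower() for w in task_words})
    if not hits:
        return []
    placeholders = ",".join("?" * len(hits))
    query = (
        "SELECT DISTINCT s.source_type, s.source_ref, s.time_start, s.time_end "
        "FROM keywords k JOIN sources s ON s.concept_id = k.concept_id "
        f"WHERE k.word IN ({placeholders})"
    )
    return conn.execute(query, hits).fetchall()
```

The agent never reasons over this database; it only follows the pointers out to the raw data.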

The key insight: judgment happens at query time, not at index time. The agent doing the task decides what’s relevant, not the overnight indexer that had no idea what task would come next.


The Indexer: Script Does Plumbing, LLM Does Judgment

Every N hours, an agent wakes up and reviews all data fresher than N hours. It creates concepts and keywords that map to those concepts. But it’s not a pure script — there’s a clean separation of responsibilities:

The script handles:

  • Pulling messages from the last 24 hours
  • Grouping them by conversation (filtering out noise: status broadcasts, empty messages, stickers)
  • Formatting conversations as readable text
  • Database operations: writes, deduplication, expiry cleanup
  • Error handling, logging, and daemon lifecycle

The LLM handles:

  • Reading the actual messages and understanding what’s being discussed
  • Identifying distinct topics (one conversation might contain three unrelated topics)
  • Recognizing when the same topic appears across multiple channels
  • Choosing keywords that are specific and meaningful (not “thing” and “stuff”)
  • Deciding how long each concept should live (a one-off incident vs. an ongoing situation)
  • Translating from other languages when needed

This split matters. The first version used a bag-of-words keyword extraction algorithm — pure script, no LLM. It produced keywords like: ugh, doing, anyway, ill, leave, door, think, anyone, anything, until, cleaning, man, comes, see, like. Useless. These are the most common words in casual conversation, not meaningful concepts.

The LLM version, looking at the same messages, produced: building, maintenance, cleaning, entrance, trash, homeless, garbage. It correctly identified a single coherent concept spanning two different conversations, picked specific nouns over filler words, and translated from Spanish where needed.
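The indexer loop that produces this split might look like the following sketch. `fetch_messages` and `call_llm` are hypothetical stand-ins for your own integrations, and the JSON response shape is an assumption, not the actual prompt contract:

```python
import json
from datetime import datetime, timedelta

def run_indexer(conn, fetch_messages, call_llm, window_hours=24):
    """Script does plumbing; the single LLM call does all the judgment."""
    since = datetime.now() - timedelta(hours=window_hours)
    convos = fetch_messages(since)   # script: pull, group, filter noise
    text = "\n\n".join(convos)       # script: format as readable text
    # LLM: identify topics, pick keywords, choose TTLs — returned as JSON
    raw = call_llm(
        "Extract distinct topics from these conversations as JSON: "
        '[{"keywords": [...], "ttl_days": N}]\n\n' + text
    )
    try:
        concepts = json.loads(raw)
    except json.JSONDecodeError:
        return None  # extraction failure: log and retry, distinct from "no topics"
    for c in concepts:
        expires = (datetime.now() + timedelta(days=c["ttl_days"])).date().isoformat()
        cid = conn.execute("INSERT INTO concepts (expires) VALUES (?)", (expires,)).lastrowid
        for w in c["keywords"]:
            conn.execute("INSERT INTO keywords (concept_id, word) VALUES (?, ?)",
                         (cid, w.lower()))
        # (source-pointer inserts elided for brevity)
    return concepts
```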


Decay: Subconscious Memories Are Temporary

The agent that creates subconscious memories also assigns a TTL for each keyword-concept entry, so subconscious memories decay. This is a judgment call — not a formula.

A one-off incident (“someone left the front door open”) gets a 3-7 day expiry. An upcoming appointment gets 14-30 days. An ongoing dispute gets 30-60 days. Only the LLM reading the actual content can make this call reasonably.

When a concept expires, it’s deleted — and all its keywords and source pointers cascade-delete with it. The nightly cleanup is a single query. No orphan rows, no garbage collection.

On consecutive indexer runs, if the same topic reappears, the system doesn’t create a duplicate. It checks for keyword overlap (40% threshold), and if a match is found, it extends the expiry, adds any new keywords, and widens the source time windows. Concepts that keep appearing in conversation naturally stay alive. Concepts that stop being discussed expire on schedule.
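The dedup check reduces to a small set comparison. The text fixes the threshold at 40% but not the overlap measure; the sketch below assumes "fraction of the new concept's keywords already present in an existing concept," which is one reasonable choice among several:

```python
def keyword_overlap(new_words, existing_words):
    """Fraction of the new concept's keywords found in an existing concept.
    (Assumed overlap measure — the exact metric is an implementation choice.)"""
    new_set = {w.lower() for w in new_words}
    old_set = {w.lower() for w in existing_words}
    if not new_set:
        return 0.0
    return len(new_set & old_set) / len(new_set)

def find_merge_target(existing_concepts, new_words, threshold=0.4):
    """Return the id of a concept to extend, or None to create a fresh one."""
    for cid, words in existing_concepts.items():
        if keyword_overlap(new_words, words) >= threshold:
            return cid
    return None
```

On a match, the indexer extends the expiry, adds the new keywords, and widens the source time windows instead of inserting a duplicate concept.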


Source-Agnostic by Design

The schema uses generic source_type and source_ref fields rather than anything platform-specific. The current implementation indexes messaging, but the same schema handles email, calendar events, documents, or any other data source without modification:

source_type   source_ref      What it points to
-----------   ----------      -----------------
messaging     group-chat-id   A group conversation
messaging     dm-id           A direct message thread
email         message-id      A specific email
calendar      event-id        A calendar event
document      doc-id          A shared document

Adding a new data source means writing a new ingestion function. The schema, the query interface, and the keyword loading mechanism don’t change.
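Because the `sources` table is generic, every ingestion function reduces to a thin wrapper over one insert. A sketch (function names are mine, for illustration):

```python
def add_source(conn, concept_id, source_type, source_ref, time_start, time_end):
    """Generic source pointer — new source types need no schema change."""
    conn.execute(
        "INSERT INTO sources (concept_id, source_type, source_ref, time_start, time_end) "
        "VALUES (?, ?, ?, ?, ?)",
        (concept_id, source_type, source_ref, time_start, time_end),
    )

def ingest_email(conn, concept_id, message_id, time_start, time_end):
    """An email-specific ingestion function is just a wrapper that knows
    its own source_type and identifier format."""
    add_source(conn, concept_id, "email", message_id, time_start, time_end)
```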


Where This Fits

The standard AI agent memory stack:

  1. System prompt — static instructions, always loaded
  2. RAG — active retrieval when the agent needs specific information
  3. Conversation history — the current session’s back-and-forth
  4. Long-term memory — persistent facts about the user, preferences, prior decisions

Subconscious Memory sits in the gap between RAG and long-term memory. It’s not permanent (concepts expire). It’s not actively retrieved (agents don’t search for it). It’s not conversational (it persists across sessions). It’s a passive awareness layer — ambient context that costs almost nothing to carry and occasionally triggers a deeper lookup.

For multi-agent systems, this is particularly valuable. Different agents handle different domains — one manages email, another handles scheduling, another does research. Without passive awareness, each agent is an island. Agent A might process a message about a meeting reschedule, but Agent B (which handles the calendar) never learns about it unless explicitly told. Subconscious Memory gives every agent a shared peripheral vision.


What This Is Not

This is not a replacement for RAG. RAG handles the case where an agent knows it needs information and goes looking for it. Subconscious Memory handles the case where an agent doesn’t know it needs information — the unknown unknowns.

This is not a knowledge graph. There are no relationships between concepts, no ontology, no reasoning over the structure. It’s intentionally flat. Keywords and pointers. The LLM does all the reasoning at query time.

This is not a conversation log. The database stores no message content. The subconscious is an index, not an archive.


Implementation Notes

A few things learned during implementation:

Use LLM extraction, not NLP. Bag-of-words, TF-IDF, and basic NER all produce garbage keywords from casual conversation. Casual text is full of filler, slang, abbreviations, and code-switching between languages. An LLM handles all of this naturally. The cost of one LLM call per nightly indexing run is trivial compared to the quality difference.

Distinguish extraction failure from empty results. The indexer can legitimately find no concepts (a quiet day). It can also fail (LLM timeout, bad JSON, API error). These must be handled differently — an empty result is normal, a failure should be logged and retried.
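One way to keep those two cases apart is to make the parse step return an explicit error alongside the result, so a quiet day and a broken run can never be confused. A sketch (function name and response shape are assumptions):

```python
import json

def parse_extraction(raw_response):
    """Return (concepts, error). An empty concept list is a normal quiet day;
    a non-None error means the run failed and should be logged and retried."""
    if raw_response is None:
        return None, "llm call failed"
    try:
        concepts = json.loads(raw_response)
    except json.JSONDecodeError as e:
        return None, f"bad JSON: {e}"
    if not isinstance(concepts, list):
        return None, "unexpected response shape"
    return concepts, None  # concepts may legitimately be []
```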

Time-scope your source pointers. Early versions stored only the channel reference without timestamps. This meant agents had to scan entire conversation histories. Adding time_start and time_end to source pointers lets agents jump directly to the relevant slice.

Let the indexer decide TTL, not a formula. Fixed decay doesn’t work because topics have wildly different relevance windows. Only the LLM reading the actual content can make this call.

Upsert, don’t replace. When the same concept reappears on consecutive runs, widen the source time windows (take the earlier start and later end) rather than replacing them. This preserves the full temporal extent of a conversation spanning multiple days.


Try It

If you’re building AI agent systems — personal assistants, enterprise copilots, multi-agent workflows — consider whether your agents have a passive awareness gap:

  • Agents ask for context they should already have
  • Users have to repeat information that exists elsewhere in the system
  • Agents do tasks “cold” when relevant context from recent activity would improve their output
  • Multi-agent systems where agents in one domain are blind to activity in another

The implementation is lightweight: a SQLite database, a nightly indexer script, a single LLM call for extraction, and a few lines of code at agent initialization to load keywords. The total cost per day is one LLM inference call plus a handful of SQLite queries.

The subconscious pattern is the piece that makes AI agents feel like they’re paying attention rather than just responding to commands.


Applied Intelligence builds AI agent systems for businesses. If you’re working on agent architectures and want to talk through how passive awareness, memory systems, or multi-agent coordination could work for your use case, get in touch.

Ready to put this to work in your business?

Applied Intelligence helps San Diego and Southern California businesses automate workflows, reduce manual work, and grow without adding headcount. The first conversation is free and takes 20 minutes.

Book a Free Discovery Call →