Agent Memory¶

Agent Memory is a powerful feature that allows agents to remember information across different sessions and users. Unlike conversation history (which is session-specific and typically ephemeral), Agent Memory extracts semantic, episodic, or procedural information and stores it in a persistent store for long-term retrieval.

Agent Memory vs. Conversation History¶

It is important to distinguish between these two:

Feature	Conversation History	Agent Memory
Scope	Current Session	Cross-session / Cross-user
Storage	Typically RAM or Session Store	Vector Database or File System
Retrieval	All messages sent to LLM	Semantic search based on relevance
Content	Raw messages	Extracted facts, procedures, and events

Agent Memory Extension¶

The AgentMemoryExtension enables memory capabilities. It handles memory retrieval (injecting relevant facts into the prompt) and memory extraction (learning from the current conversation).

Configuration Options¶

The AgentMemoryExtension can be configured using its builder:

Option	Type	Default	Description
`memoryStore`	`AgentMemoryStore`	Required	The storage implementation for memories.
`memoryExtractionMode`	`MemoryExtractionMode`	`INLINE`	How memories are extracted. See Extraction Modes.
`minRelevantReusabilityScore`	`int`	`0`	Minimum score (0-10) for a memory to be saved or retrieved. Helps filter "noise".
`objectMapper`	`ObjectMapper`	Default Mapper	Used for serializing memory units.

Example Setup¶

final var memoryExtension = AgentMemoryExtension.builder()
        .memoryStore(memoryStore)
        .memoryExtractionMode(MemoryExtractionMode.INLINE)
        .minRelevantReusabilityScore(7) // Only "highly reusable" memories
        .build();

final var agent = new MyAgent(AgentSetup.builder()
        .model(model)
        .extension(memoryExtension)
        .build());

Memory Tools¶

The AgentMemoryExtension provides a specialized tool for semantic memory retrieval.

Tool Name	Description	Parameters
`agent_memory_extension_find_memories`	Retrieves relevant memories from the persistent store based on a natural language query.	`query` (String)

When this extension is registered, the agent is instructed to use this tool to check for relevant background information before proceeding with complex tasks.

Memory Extraction Modes¶

The MemoryExtractionMode determines the lifecycle of memory creation:

Mode	Description
`INLINE`	Extraction happens during the primary model call using structured output. Nuance: This is the most efficient mode but is not supported in streaming mode. Sentinel AI will automatically force an out-of-band extraction if the agent is run in streaming mode.
`OUT_OF_BAND`	Extraction happens as a separate, asynchronous model call after the primary response is generated. This ensures extraction works even with streaming.
`DISABLED`	No new memories are extracted. Useful for read-only memory agents.

Storage Implementations¶

Sentinel AI requires an EmbeddingModel to generate vector representations of memories for semantic search.

File System Storage (`sentinel-ai-filesystem`)¶

Ideal for local development or small-scale applications. It stores memories as JSON files and vectors in a local directory.

final var storage = FileSystemAgentMemoryStorage.builder()
        .baseDir("./data/memory")
        .mapper(objectMapper)
        .embeddingModel(embeddingModel) // e.g., LocalEmbeddingModel or OpenAIEmbeddingModel
        .build();

Performance

FileSystemAgentMemoryStorage performs a linear scan and manual cosine similarity calculation for search. It is not intended for production use with thousands of memories.

Elasticsearch Storage (`sentinel-ai-storage-es`)¶

Recommended for production. Uses Elasticsearch's native KNN (k-nearest neighbors) search for efficient retrieval.

final var storage = ESAgentMemoryStorage.builder()
        .client(esClient)
        .embeddingModel(embeddingModel)
        .indexPrefix("prod") // Optional: prefixes the 'agent-memories' index
        .build();

Vector Dimensions

The Elasticsearch implementation automatically determines vector dimensions based on the provided EmbeddingModel during initial index creation. If you change your embedding model later, you may need to recreate the index to match the new dimension count.

Tips and Nuances¶

Reusability Scores: The LLM assigns a reusability score to each extracted memory. Use minRelevantReusabilityScore (e.g., 7) to prevent your store from being cluttered with session-specific trivia.
Memory Scopes:
- AGENT: Shared knowledge (e.g., "Field 'X' in the database refers to User Salary").
- ENTITY: User-specific (e.g., "User prefers dark mode").
Facts Injection: Memories are injected as Facts into the system prompt. This happens automatically based on the userId provided in AgentRequestMetadata.

Dangers and Risks¶

PII & Privacy: Since memories are stored persistently across sessions, be extremely careful about extracting Personal Identifiable Information (PII). You can use a AgentMessagesPreProcessor to mask data before it reaches the extraction task.
Hallucinations: The LLM might "remember" things that weren't explicitly stated or were misunderstood. Periodic auditing of the memory store is recommended.
Token Overhead: Retrieving too many memories (high count) can bloat your system prompt and increase costs/latency.
Embedding Costs: Every save and every search requires an embedding model call. If using remote models (like OpenAI), this adds to your per-request cost.