Cost Efficiency
Optimize storage and reduce costs
The Memory API is designed for cost efficiency at scale. Smart caching, automatic deduplication, and tiered storage help you reduce spend without degrading retrieval quality.
How Pricing Works
You're billed based on three factors: storage (GB/month), operations (reads/writes), and compute (embedding generation). Our tiered approach means you only pay premium rates for frequently accessed data.
Storage Tiers
Memories automatically move between tiers based on access patterns. Recent and frequently accessed memories stay hot, while older data moves to cheaper storage.
| Tier | Latency | Cost | Best For |
|---|---|---|---|
| Hot | <10ms | $0.10/GB/mo | Active conversations, recent context |
| Warm | <50ms | $0.03/GB/mo | User preferences, historical data |
| Cold | <200ms | $0.01/GB/mo | Archives, compliance, rare access |
Tier transitions happen automatically. You can also pin important memories to hot storage.
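As a worked example of the table above, the sketch below estimates the monthly storage charge for a given tier mix. The rates come from the table; the helper name `estimateStorageCost` and its input shape are assumptions made for illustration, and operation and embedding charges are left out because their rates are not listed in this section.

```javascript
// Per-GB monthly rates from the tier table, held in cents to avoid
// floating-point drift when summing.
const RATE_CENTS_PER_GB = { hot: 10, warm: 3, cold: 1 };

// Hypothetical helper (not part of the Memory API): returns the monthly
// storage cost in dollars for a { hot, warm, cold } breakdown given in GB.
function estimateStorageCost(gbByTier) {
  let cents = 0;
  for (const [tier, rate] of Object.entries(RATE_CENTS_PER_GB)) {
    const gb = gbByTier[tier] ?? 0; // missing tiers count as 0 GB
    if (gb < 0) throw new RangeError(`Negative size for tier "${tier}"`);
    cents += gb * rate;
  }
  return cents / 100;
}

// 10 GB hot + 50 GB warm + 200 GB cold = $1.00 + $1.50 + $2.00
console.log(estimateStorageCost({ hot: 10, warm: 50, cold: 200 })); // 4.5
```

The same 260 GB held entirely in the hot tier would come to $26.00 per month, which is why letting rarely read data age into colder tiers matters.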
Built-in Optimizations
Smart Deduplication
Automatically detects and merges semantically similar memories, reducing storage by up to 40%.
Compression
Vector embeddings and metadata are compressed without losing quality, saving 30-50% on storage.
Automatic Tiering
Memories move between hot, warm, and cold storage based on access patterns. No manual management.
Edge Caching
Frequently accessed memories are cached at edge locations for sub-10ms retrieval globally.
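To make the automatic tiering idea concrete, here is a minimal sketch of an access-based tier decision. The thresholds (7 days, 90 days, 10 recent reads) and the field names `lastAccessedAt`, `recentReads`, and `pinned` are invented for illustration; the service's real rules are not documented in this section.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical policy, not the service's actual rules:
// pinned, recently read, or frequently read -> hot;
// idle for up to 90 days -> warm; anything older -> cold.
function chooseTier(record, now = Date.now()) {
  if (record.pinned) return "hot";
  const idleDays = (now - record.lastAccessedAt) / DAY_MS;
  if (idleDays <= 7 || record.recentReads >= 10) return "hot";
  if (idleDays <= 90) return "warm";
  return "cold";
}

// Fixed reference time so the example is deterministic.
const referenceTime = Date.UTC(2024, 5, 1);
console.log(chooseTier({ lastAccessedAt: referenceTime - 2 * DAY_MS }, referenceTime));   // hot
console.log(chooseTier({ lastAccessedAt: referenceTime - 30 * DAY_MS }, referenceTime));  // warm
console.log(chooseTier({ lastAccessedAt: referenceTime - 365 * DAY_MS }, referenceTime)); // cold
```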
Smart Deduplication
When storing memories, the API automatically checks for semantic duplicates. Similar memories are merged, keeping the most recent metadata while avoiding redundant storage.
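Conceptually, a semantic duplicate check compares the embedding of the new memory against existing ones and acts when the similarity crosses a threshold. The sketch below assumes cosine similarity as the metric and plain objects with `embedding`, `metadata`, and `timestamp` fields; the service's actual metric and record shape are not stated here, and `cosineSimilarity` and `resolveDuplicate` are hypothetical helpers.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: treat as dissimilar
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical resolver for the "merge", "skip", and "replace" strategies.
function resolveDuplicate(existing, incoming, strategy) {
  switch (strategy) {
    case "skip":
      return existing;
    case "replace":
      return incoming;
    case "merge":
      return {
        ...existing,
        // Assumption: on key conflicts the incoming (newer) metadata wins.
        metadata: { ...existing.metadata, ...incoming.metadata },
        timestamp: Math.max(existing.timestamp, incoming.timestamp),
      };
    default:
      throw new Error(`Unknown strategy: ${strategy}`);
  }
}

const existing = { embedding: [3, 4], metadata: { source: "chat" }, timestamp: 1000 };
const incoming = { embedding: [4, 3], metadata: { theme: "dark" }, timestamp: 2000 };

// Similarity here is 0.96, which is above a 0.92 threshold.
const isDuplicate = cosineSimilarity(existing.embedding, incoming.embedding) >= 0.92;
const stored = isDuplicate ? resolveDuplicate(existing, incoming, "merge") : incoming;
console.log(stored.metadata, stored.timestamp);
```

A higher threshold merges only near-identical memories; a lower one saves more storage at the risk of merging memories that differ in meaning.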
// Enable deduplication (on by default)
await memory.store({
  content: "User prefers dark mode for all interfaces",
  entityId: "user_123",
  // Deduplication settings
  deduplicate: true,       // Enable dedup check
  dedupeThreshold: 0.92,   // Similarity threshold (0-1)
  dedupeStrategy: "merge"  // "merge", "skip", or "replace"
});

// If a similar memory exists:
// - merge: Combines metadata, keeps newer timestamp
// - skip: Ignores new memory, returns existing
// - replace: Overwrites existing with new

// Check dedup stats
const stats = await memory.stats();
console.log(`Dedup savings: ${stats.dedupSavingsPercent}%`);

Batch Operations
Batch operations reduce API calls and improve throughput. Use them for bulk imports, migrations, or high-volume workloads.
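Bulk imports are often larger than a single call allows, so the input has to be split first. The sketch below chunks an array to the 100-item limit noted in this section's batch store example and sends the chunks one after another. `chunk` and `storeInBatches` are hypothetical helpers, and the return shape of `storeBatch` is not documented here, so per-batch results are collected as-is.

```javascript
// Limit taken from the batch store example in this section.
const MAX_BATCH_SIZE = 100;

// Split an array into consecutive slices of at most `size` items.
function chunk(items, size = MAX_BATCH_SIZE) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// `memory` is the client used throughout this page, passed in explicitly.
// Batches are sent sequentially to keep the request rate predictable.
async function storeInBatches(memory, items, options = {}) {
  const results = [];
  for (const batch of chunk(items)) {
    results.push(await memory.storeBatch(batch, options));
  }
  return results;
}

// 250 items become three calls: 100, 100, and 50 items.
const sample = Array.from({ length: 250 }, (_, i) => ({
  content: `Memory ${i + 1}`,
  entityId: "user_123",
}));
console.log(chunk(sample).map((c) => c.length)); // [ 100, 100, 50 ]
```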
// Batch store (up to 100 memories per call)
const results = await memory.storeBatch([
  { content: "Memory 1", entityId: "user_123" },
  { content: "Memory 2", entityId: "user_123" },
  { content: "Memory 3", entityId: "user_456" }
], {
  deduplicate: true,
  parallel: true // Process in parallel for speed
});

// Batch query (multiple queries in one call)
const batchResults = await memory.queryBatch([
  { text: "user preferences", entityId: "user_123" },
  { text: "recent purchases", entityId: "user_123" },
  { text: "support tickets", entityId: "user_123" }
]);

// Batch delete
await memory.deleteBatch({
  entityId: "user_123",
  olderThan: "90d" // Delete memories older than 90 days
});

Query Caching
Frequently repeated queries are automatically cached. You can also control caching behavior for specific use cases.
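The caching behavior can be pictured as a time-to-live (TTL) cache keyed by the query. The sketch below is a client-side illustration only, not the service's implementation: `TtlCache` is a hypothetical class, the 60-second default mirrors the window mentioned in this section's example, and the clock is injectable so expiry can be checked without waiting.

```javascript
class TtlCache {
  constructor(defaultTtlSeconds = 60, now = () => Date.now()) {
    this.defaultTtlSeconds = defaultTtlSeconds;
    this.now = now;
    this.entries = new Map();
  }

  // Build a key that ignores property order, so equivalent queries match.
  static keyFor(query) {
    return JSON.stringify(Object.keys(query).sort().map((k) => [k, query[k]]));
  }

  // Return the cached value, or undefined if absent or expired.
  get(query) {
    const key = TtlCache.keyFor(query);
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(query, value, ttlSeconds = this.defaultTtlSeconds) {
    this.entries.set(TtlCache.keyFor(query), {
      value,
      expiresAt: this.now() + ttlSeconds * 1000,
    });
  }
}

// Demonstration with a fake clock (milliseconds).
let fakeNow = 0;
const cache = new TtlCache(60, () => fakeNow);
cache.set({ text: "user preferences", entityId: "user_123" }, ["dark mode"]);

fakeNow = 30_000; // 30s later: still cached, even with keys in another order
console.log(cache.get({ entityId: "user_123", text: "user preferences" }));

fakeNow = 60_000; // 60s later: expired
console.log(cache.get({ entityId: "user_123", text: "user preferences" })); // undefined
```

Short TTLs suit data that changes often; a long TTL such as one hour only makes sense for content that is effectively static.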
// Automatic caching (default)
const results = await memory.query({
  text: "user preferences",
  entityId: "user_123"
  // Identical queries within 60s return cached results
});

// Force fresh results
const fresh = await memory.query({
  text: "user preferences",
  entityId: "user_123",
  cache: false // Bypass cache
});

// Custom cache TTL
const cached = await memory.query({
  text: "static knowledge",
  cacheTtl: 3600 // Cache for 1 hour
});

// Preload cache for known queries
await memory.preloadCache([
  { text: "common query 1", entityId: "global" },
  { text: "common query 2", entityId: "global" }
]);

Cost Monitoring
Track your usage and costs in real time through the API or the dashboard.
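A month-to-date total is easier to act on when projected to month end. The sketch below extrapolates spend linearly from the current day of the month; `projectMonthEndCost` is a hypothetical helper, and the linear assumption will be off for workloads with uneven usage. In practice the input would be a value such as `usage.cost.total` for the current month.

```javascript
// Hypothetical helper: linearly extrapolates month-to-date spend (in dollars)
// to the end of the month that `date` falls in, using UTC calendar days.
function projectMonthEndCost(costToDate, date = new Date()) {
  const year = date.getUTCFullYear();
  const month = date.getUTCMonth();
  // Day 0 of the next month is the last day of this month.
  const daysInMonth = new Date(Date.UTC(year, month + 1, 0)).getUTCDate();
  const dayOfMonth = date.getUTCDate();
  return (costToDate / dayOfMonth) * daysInMonth;
}

// $40 spent by June 10 projects to $120 over the 30-day month.
const projected = projectMonthEndCost(40, new Date(Date.UTC(2024, 5, 10)));
console.log(projected);       // 120
console.log(projected > 100); // true: on track to exceed a $100 budget
```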
// Get usage stats
const usage = await memory.usage({
  period: "current_month" // or "last_7d", "last_30d"
});

console.log({
  storage: {
    hot: usage.storage.hot,   // GB in hot tier
    warm: usage.storage.warm, // GB in warm tier
    cold: usage.storage.cold, // GB in cold tier
    total: usage.storage.total
  },
  operations: {
    reads: usage.ops.reads,
    writes: usage.ops.writes,
    queries: usage.ops.queries
  },
  cost: {
    storage: usage.cost.storage,
    operations: usage.cost.operations,
    embeddings: usage.cost.embeddings,
    total: usage.cost.total
  }
});

// Set budget alerts
await memory.setBudgetAlert({
  threshold: 100,  // $100
  action: "notify" // or "throttle", "pause"
});

Tips for Reducing Costs
- Enable deduplication: reduces storage by 20-40% for most workloads
- Use batch operations: fewer API calls means lower operation costs
- Set appropriate retention: delete old memories you don't need
- Use caching wisely: cache common queries, bypass the cache for real-time needs
- Filter early: use entityId and metadata filters to reduce result sets
- Monitor usage: set budget alerts to avoid surprises
Estimate Your Costs
Use our pricing calculator to estimate monthly costs based on your expected usage.