Cost Efficiency

Optimize storage and reduce costs

The Memory API is designed for cost efficiency at scale. Smart caching, automatic deduplication, and tiered storage lower your bill without sacrificing retrieval quality.

How Pricing Works

You're billed on three factors: storage (GB/month), operations (reads/writes), and compute (embedding generation). Storage rates depend on the tier the data sits in, so you pay premium rates only for frequently accessed data.

Storage: $0.01-0.10/GB/mo
Operations: $0.001 per 1K ops
Embeddings: $0.0001 per 1K tokens
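
As a rough illustration of how the three factors combine, the sketch below adds them up for one month. The rates come from the figures above; the helper `estimateMonthlyCost` and the example volumes are hypothetical and not part of the Memory API, and actual invoices may round or aggregate differently.

```javascript
// Rates copied from the pricing figures above (USD).
const RATES = {
  opsPer1K: 0.001,         // per 1,000 operations
  embeddingsPer1K: 0.0001, // per 1,000 tokens embedded
};

// Hypothetical helper (not an SDK function): estimates one month's bill.
// storageRatePerGb should fall within the published $0.01-0.10/GB/mo range.
function estimateMonthlyCost({ storageGb, storageRatePerGb, operations, embeddingTokens }) {
  const storage = storageGb * storageRatePerGb;
  const ops = (operations / 1000) * RATES.opsPer1K;
  const embeddings = (embeddingTokens / 1000) * RATES.embeddingsPer1K;
  return { storage, operations: ops, embeddings, total: storage + ops + embeddings };
}

// Example volumes are made up for illustration.
const estimate = estimateMonthlyCost({
  storageGb: 50,
  storageRatePerGb: 0.10, // assume everything is billed at the top storage rate
  operations: 2_000_000,
  embeddingTokens: 10_000_000,
});
console.log(estimate);
```

With these inputs the estimate works out to about $5 for storage, $2 for operations, and $1 for embeddings, or roughly $8 in total.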

Storage Tiers

Memories automatically move between tiers based on access patterns. Recent and frequently accessed memories stay hot, while older data moves to cheaper storage.

Tier   Latency   Cost          Best For
Hot    <10ms     $0.10/GB/mo   Active conversations, recent context
Warm   <50ms     $0.03/GB/mo   User preferences, historical data
Cold   <200ms    $0.01/GB/mo   Archives, compliance, rare access

Tier transitions happen automatically. You can also pin important memories to hot storage.
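
To see what tiering can mean for the storage line of a bill, the sketch below compares keeping 100 GB entirely in hot storage with a split across tiers. The per-GB rates are the ones listed above; the helper `storageCost` and the 10/30/60 split are assumptions chosen for illustration, since the real distribution depends on your access patterns.

```javascript
// Per-GB monthly rates from the tier table above (USD).
const TIER_RATES = { hot: 0.10, warm: 0.03, cold: 0.01 };

// Hypothetical helper: sums the monthly storage cost of the GB held in each tier.
function storageCost(gbByTier) {
  let total = 0;
  for (const [tier, gb] of Object.entries(gbByTier)) {
    if (!Object.hasOwn(TIER_RATES, tier)) {
      throw new Error(`Unknown tier: ${tier}`);
    }
    total += gb * TIER_RATES[tier];
  }
  return total;
}

const allHot = storageCost({ hot: 100 });
const tiered = storageCost({ hot: 10, warm: 30, cold: 60 }); // illustrative split
console.log(`All hot: $${allHot.toFixed(2)}, tiered: $${tiered.toFixed(2)}`);
```

Under this assumed split, the same 100 GB costs about $2.50 per month instead of $10.00.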

Built-in Optimizations

Smart Deduplication

Automatically detects and merges semantically similar memories, reducing storage by up to 40%.

Compression

Vector embeddings and metadata are compressed without losing quality, saving 30-50% on storage.

Automatic Tiering

Memories move between hot, warm, and cold storage based on access patterns. No manual management.

Edge Caching

Frequently accessed memories are cached at edge locations for sub-10ms retrieval globally.

Smart Deduplication

When storing memories, the API automatically checks for semantic duplicates. Similar memories are merged, keeping the most recent metadata while avoiding redundant storage.

// Enable deduplication (on by default)
await memory.store({
  content: "User prefers dark mode for all interfaces",
  entityId: "user_123",

  // Deduplication settings
  deduplicate: true,           // Enable dedup check
  dedupeThreshold: 0.92,       // Similarity threshold (0-1)
  dedupeStrategy: "merge"      // "merge", "skip", or "replace"
});

// If a similar memory exists:
// - merge: Combines metadata, keeps newer timestamp
// - skip: Ignores new memory, returns existing
// - replace: Overwrites existing with new

// Check dedup stats
const stats = await memory.stats();
console.log(`Dedup savings: ${stats.dedupSavingsPercent}%`);
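
The three strategies are easier to compare side by side. The following is a minimal in-memory sketch of threshold-based deduplication, not the service's implementation: it assumes cosine similarity over embedding vectors (the API's actual similarity measure is not documented here), and `cosineSimilarity`, `storeWithDedup`, and the record shape are hypothetical.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical local store that mimics the merge / skip / replace strategies.
function storeWithDedup(memories, incoming, { threshold = 0.92, strategy = "merge" } = {}) {
  const index = memories.findIndex(
    (m) => cosineSimilarity(m.embedding, incoming.embedding) >= threshold
  );
  if (index === -1) {
    memories.push(incoming); // no near-duplicate found: store as a new memory
    return incoming;
  }
  const existing = memories[index];
  if (strategy === "skip") return existing; // ignore the new memory
  if (strategy === "replace") {
    memories[index] = incoming; // overwrite the existing memory
    return incoming;
  }
  // "merge": combine metadata and keep the newer timestamp
  const merged = {
    ...existing,
    metadata: { ...existing.metadata, ...incoming.metadata },
    timestamp: Math.max(existing.timestamp, incoming.timestamp),
  };
  memories[index] = merged;
  return merged;
}

// Toy 3-dimensional embeddings; real embeddings have many more dimensions.
const memories = [
  { content: "User prefers dark mode", embedding: [1, 0, 0], metadata: { source: "chat" }, timestamp: 1 },
];
const result = storeWithDedup(memories, {
  content: "User likes dark mode",
  embedding: [0.99, 0.05, 0],
  metadata: { device: "mobile" },
  timestamp: 2,
});
console.log(memories.length, result.metadata, result.timestamp);
```

Because the two toy vectors are nearly parallel, the second call merges into the first record rather than adding a new one. Lowering the threshold merges more aggressively, at the risk of collapsing memories that are related but distinct.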

Batch Operations

Batch operations reduce API calls and improve throughput. Use them for bulk imports, migrations, or high-volume workloads.

// Batch store (up to 100 memories per call)
const results = await memory.storeBatch([
  { content: "Memory 1", entityId: "user_123" },
  { content: "Memory 2", entityId: "user_123" },
  { content: "Memory 3", entityId: "user_456" }
], {
  deduplicate: true,
  parallel: true  // Process in parallel for speed
});

// Batch query (multiple queries in one call)
const batchResults = await memory.queryBatch([
  { text: "user preferences", entityId: "user_123" },
  { text: "recent purchases", entityId: "user_123" },
  { text: "support tickets", entityId: "user_123" }
]);

// Batch delete
await memory.deleteBatch({
  entityId: "user_123",
  olderThan: "90d"  // Delete memories older than 90 days
});
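
Since the comment above notes a limit of 100 memories per storeBatch call, larger imports need to be split first. One way to do that is a small chunking helper such as the sketch below; `chunk` and `MAX_BATCH_SIZE` are hypothetical names, not part of the SDK.

```javascript
// Per-call limit taken from the storeBatch comment above.
const MAX_BATCH_SIZE = 100;

// Hypothetical helper: splits an array into consecutive batches of at most `size` items.
function chunk(items, size = MAX_BATCH_SIZE) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 250 placeholder memories for illustration.
const items = Array.from({ length: 250 }, (_, i) => ({
  content: `Memory ${i + 1}`,
  entityId: "user_123",
}));
const batches = chunk(items);
console.log(batches.map((b) => b.length)); // [ 100, 100, 50 ]
```

Each resulting batch can then be passed to memory.storeBatch in turn, which brings a 250-item import down to three calls.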

Query Caching

Frequently repeated queries are automatically cached. You can also control caching behavior for specific use cases.

// Automatic caching (default)
const results = await memory.query({
  text: "user preferences",
  entityId: "user_123"
  // Identical queries within 60s return cached results
});

// Force fresh results
const fresh = await memory.query({
  text: "user preferences",
  entityId: "user_123",
  cache: false  // Bypass cache
});

// Custom cache TTL
const cached = await memory.query({
  text: "static knowledge",
  cacheTtl: 3600  // Cache for 1 hour
});

// Preload cache for known queries
await memory.preloadCache([
  { text: "common query 1", entityId: "global" },
  { text: "common query 2", entityId: "global" }
]);
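
To make the TTL behaviour concrete, here is a small client-side sketch of a time-based query cache. It only illustrates the idea of identical queries returning cached results within a window; it is not how the service caches internally, and `QueryCache` with its injectable `now` clock is hypothetical.

```javascript
// Hypothetical TTL cache keyed by the query object.
class QueryCache {
  constructor({ defaultTtlSeconds = 60, now = () => Date.now() } = {}) {
    this.defaultTtlMs = defaultTtlSeconds * 1000;
    this.now = now; // injectable clock, which makes expiry easy to test
    this.entries = new Map();
  }

  // Sort keys so that property order does not change the cache key.
  static keyFor(query) {
    return JSON.stringify(Object.keys(query).sort().map((k) => [k, query[k]]));
  }

  get(query) {
    const key = QueryCache.keyFor(query);
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop the entry and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(query, value, ttlSeconds) {
    const ttlMs = ttlSeconds === undefined ? this.defaultTtlMs : ttlSeconds * 1000;
    this.entries.set(QueryCache.keyFor(query), { value, expiresAt: this.now() + ttlMs });
  }
}

// Simulated clock (milliseconds) so the example is deterministic.
let clock = 0;
const cache = new QueryCache({ now: () => clock });
cache.set({ text: "user preferences", entityId: "user_123" }, ["dark mode"]);

clock = 59_000;
const hit = cache.get({ entityId: "user_123", text: "user preferences" });
clock = 60_000;
const miss = cache.get({ entityId: "user_123", text: "user preferences" });
console.log(hit, miss);
```

The first lookup falls inside the 60-second window and returns the stored value; the second arrives once the entry has expired and returns undefined. A longer TTL suits data that rarely changes, which is the same trade-off cacheTtl exposes.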

Cost Monitoring

Track your usage and costs in real-time through the API or dashboard.

// Get usage stats
const usage = await memory.usage({
  period: "current_month"  // or "last_7d", "last_30d"
});

console.log({
  storage: {
    hot: usage.storage.hot,      // GB in hot tier
    warm: usage.storage.warm,    // GB in warm tier
    cold: usage.storage.cold,    // GB in cold tier
    total: usage.storage.total
  },
  operations: {
    reads: usage.ops.reads,
    writes: usage.ops.writes,
    queries: usage.ops.queries
  },
  cost: {
    storage: usage.cost.storage,
    operations: usage.cost.operations,
    embeddings: usage.cost.embeddings,
    total: usage.cost.total
  }
});

// Set budget alerts
await memory.setBudgetAlert({
  threshold: 100,  // $100
  action: "notify"  // or "throttle", "pause"
});
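
A budget alert fires once a threshold is crossed, but it can also help to check mid-month whether you are on track to cross it. The sketch below does this with a simple linear projection of month-to-date cost. The helpers `projectMonthEndCost` and `evaluateBudget` are hypothetical, and linear extrapolation is an assumption that ignores spikes and seasonality.

```javascript
// Hypothetical helper: linearly extrapolates month-to-date cost to the full month.
function projectMonthEndCost(costSoFar, dayOfMonth, daysInMonth) {
  if (dayOfMonth < 1 || dayOfMonth > daysInMonth) {
    throw new RangeError("dayOfMonth must be between 1 and daysInMonth");
  }
  return (costSoFar / dayOfMonth) * daysInMonth;
}

// Hypothetical helper: classifies spend against a budget threshold (USD).
function evaluateBudget(costSoFar, projected, threshold) {
  if (costSoFar >= threshold) return "exceeded";
  if (projected >= threshold) return "at_risk";
  return "ok";
}

// Example: $42 spent by day 10 of a 30-day month, against a $100 budget.
// In practice costSoFar could come from usage.cost.total.
const projected = projectMonthEndCost(42, 10, 30);
const status = evaluateBudget(42, projected, 100);
console.log(projected.toFixed(2), status);
```

Here the projection comes to about $126, so the status is "at_risk" even though actual spend is still well under the $100 threshold.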

Tips for Reducing Costs

  • Enable deduplication: reduces storage by 20-40% for most workloads
  • Use batch operations: fewer API calls means lower operation costs
  • Set appropriate retention: delete old memories you don't need
  • Use caching wisely: cache common queries, bypass the cache for real-time needs
  • Filter early: use entityId and metadata filters to reduce result sets
  • Monitor usage: set budget alerts to avoid surprises

Estimate Your Costs

Use our pricing calculator to estimate monthly costs based on your expected usage.
