mneme · genesis Stryx Labs
//01 Benchmarks · v0.4.0 Genesis

10 queries. Real codebase.

A standing benchmark of 10 hand-curated queries against a real production codebase (~12k Rust, ~4k TypeScript, and ~2k Python files). A query passes only when the tool returns the exact symbol set its human author expects. The bench harness ships with Mneme.

//02 Canonical recall · head-to-head

From 2/10 to ~6/10.

Genesis narrows the recall gap by shipping three symbol resolvers (Rust, TypeScript, Python). On the same queries and the same corpus, canonical recall rises from 2/10 to ~6/10.

~6/10 · v0.4 canonical recall
<500ms · first paint, 50k nodes
<1s · hot rebuild on edit
0 bytes · outbound on default

Mneme v0.4.0 · Genesis, with symbol resolvers · 6 / 10
Mneme v0.3.2 · Previous release · 2 / 10
CRG · Single-file find-references · 6 / 10
ripgrep · Text matching · 2 / 10
//03 The full 10 queries

All ten. Side by side.

A representative sample of what AI coding agents actually ask in practice. Each row is scored against the known-correct answer set.

| #  | Query | Mneme v0.4 | CRG | grep |
|----|-------|------------|-----|------|
| 01 | Where is WorkerPool::spawn called? | ✓ exact | ✓ exact | ~ text |
| 02 | Find callers of re-exported spawn() | ✓ exact | ~ partial | ✗ miss |
| 03 | What breaks if I rename processOrder? | ✓ exact | ✓ exact | ✗ miss |
| 04 | List dependencies of health.rs | ✓ exact | ✓ exact | ✗ miss |
| 05 | Python __init__ with super().__init__ | ~ partial | ✓ exact | ~ text |
| 06 | TS class extending BaseHandler | ✓ exact | ✓ exact | ~ text |
| 07 | Find all impl Display for * | ✓ exact | ~ partial | ~ text |
| 08 | Rust macro-expanded callers of tracing::info! | ✗ miss | ✗ miss | ✗ miss |
| 09 | Python overloaded get across hierarchy | ~ partial | ✓ exact | ✗ miss |
| 10 | Cross-language refs (TS → Rust FFI) | ✗ miss | ✗ miss | ✗ miss |
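
The headline scores for Mneme v0.4 and CRG can be re-derived from the rows above by counting exact matches per column. The sketch below is only an illustration: the outcome lists are transcribed by hand from the table, and the function names `tally` and `headline` are hypothetical, not part of the Mneme harness. The grep column is left out because its "~ text" marker is not one of the three outcomes defined in the scoring rule.

```python
from collections import Counter

# Outcomes transcribed from the table, in query order 01..10.
# "exact" = ✓ exact, "partial" = ~ partial, "miss" = ✗ miss.
RESULTS = {
    "Mneme v0.4": ["exact", "exact", "exact", "exact", "partial",
                   "exact", "exact", "miss", "partial", "miss"],
    "CRG":        ["exact", "partial", "exact", "exact", "exact",
                   "exact", "partial", "miss", "exact", "miss"],
}


def tally(outcomes):
    """Count how many queries landed in each outcome bucket."""
    return Counter(outcomes)


def headline(outcomes):
    """Headline score: exact matches over total queries."""
    return f"{tally(outcomes)['exact']} / {len(outcomes)}"


if __name__ == "__main__":
    for tool, outcomes in RESULTS.items():
        print(tool, headline(outcomes), dict(tally(outcomes)))
```

Both columns come out at 6 exact, 2 partial, and 2 miss, which matches the 6 / 10 figures quoted for Mneme v0.4.0 and CRG.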
//04 Methodology

How we measure.

Real production codebase. Hand-authored queries. Reproducible harness. No cherry-picking: every result is published with its scoring rule.

Corpus

~12k Rust files, ~4k TypeScript, ~2k Python. ~14k canonical symbols, ~50k call-graph edges.

Queries

10 hand-authored queries representing what AI coding agents actually ask. Each has a known-correct answer set provided by the original authors.

Scoring

Pass = returns exactly the correct symbol set. Partial = returns a subset of the correct set, with some symbols missing. Miss = returns a wrong answer or nothing.
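
The three-way rule can be read as a set comparison. The following is a minimal sketch under that reading, not the harness's actual implementation: the function name `score` and the caller names in the usage lines are hypothetical, and treating a result that contains any symbol outside the correct set as a miss is an assumption drawn from "wrong answer".

```python
def score(returned, expected):
    """Classify one query result as 'pass', 'partial', or 'miss'."""
    returned, expected = set(returned), set(expected)
    if returned == expected:
        return "pass"
    # Partial: a non-empty strict subset of the correct set.
    if returned and returned < expected:
        return "partial"
    # Assumption: empty results, and results containing any symbol
    # outside the correct set, both count as a miss.
    return "miss"


if __name__ == "__main__":
    # Hypothetical answer set for "Where is WorkerPool::spawn called?"
    expected = {"scheduler::run", "api::handle_request"}
    print(score({"scheduler::run", "api::handle_request"}, expected))  # pass
    print(score({"scheduler::run"}, expected))                         # partial
    print(score(set(), expected))                                      # miss
```

Comparing sets rather than lists keeps the rule independent of the order in which a tool reports symbols.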

Reproducibility

The bench harness ships with Mneme. Run mneme bench in any indexed project to score it locally.

//05 Path to 10/10

To 10/10. v0.5 targets.

Genesis closes four gaps, moving from 2/10 to ~6/10. Four remain, each with a stated target: two are slated for v0.5 and two are research targets. The roadmap is public; gaps are not hidden.

→ Macro-expanded Rust

Symbol resolution through tracing macros, derive impls, async-trait. Targeted for v0.5.

→ Python overload hierarchies

Better MRO resolution, ABCs, mixin chains. Targeted for v0.5.
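
To see why mixin chains make an overloaded `get` hard to resolve, consider the small hierarchy below. All class names are hypothetical and the snippet only illustrates the language behavior a resolver has to model; it says nothing about how Mneme resolves it. The same `super().get(...)` call inside `CacheMixin` lands on a different class depending on which subclass is in play.

```python
class Store:
    def get(self, key):
        return f"Store.get({key})"


class CacheMixin:
    def get(self, key):
        # super() follows the MRO of the instance's class, not a fixed parent.
        return "cached:" + super().get(key)


class AuditMixin:
    def get(self, key):
        return "audited:" + super().get(key)


class UserStore(AuditMixin, CacheMixin, Store):
    pass


class ReportStore(CacheMixin, AuditMixin, Store):
    pass


def get_chain(cls):
    """Names of the classes, in MRO order, that define their own get()."""
    return [c.__name__ for c in cls.__mro__ if "get" in vars(c)]


if __name__ == "__main__":
    print(get_chain(UserStore))      # ['AuditMixin', 'CacheMixin', 'Store']
    print(get_chain(ReportStore))    # ['CacheMixin', 'AuditMixin', 'Store']
    print(UserStore().get("k"))      # audited:cached:Store.get(k)
    print(ReportStore().get("k"))    # cached:audited:Store.get(k)
```

A correct answer to "which `get` does this call reach?" therefore needs the full linearization of each concrete class, not just the class where the call is written.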

→ Cross-language FFI

TS calling Rust over WASM boundaries, Python ctypes. Research target.

→ Dynamic dispatch

Trait objects, virtual methods. Hardest; involves runtime data. Long-term research.
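
A Python analogue of the same problem, as a rough sketch with hypothetical names: the concrete handler type is picked from a runtime value, so the call site alone does not determine the target. Without runtime data, the best a static tool can do is report every candidate, which is what `possible_targets` models here.

```python
class JsonHandler:
    def handle(self, payload):
        return "json:" + payload


class CsvHandler:
    def handle(self, payload):
        return "csv:" + payload


# Hypothetical registry; the key arrives at runtime (config, request, etc.).
HANDLERS = {"json": JsonHandler, "csv": CsvHandler}


def process(kind, payload):
    handler = HANDLERS[kind]()       # concrete type chosen at runtime
    return handler.handle(payload)   # call target depends on `kind`


def possible_targets(method):
    """Conservative static answer: every registered class defining `method`."""
    return sorted(
        cls.__name__ for cls in HANDLERS.values() if method in vars(cls)
    )


if __name__ == "__main__":
    print(process("json", "{}"))          # json:{}
    print(process("csv", "a,b"))          # csv:a,b
    print(possible_targets("handle"))     # ['CsvHandler', 'JsonHandler']
```

The over-approximated candidate list is sound but would not score as an exact match under the benchmark's rule whenever only one handler is actually reachable.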