Machine Learning At Scale

A 0.44 Recall Collapse That Looked Like 0.81 Global Success [Edition #10]

Discover how in-batch negatives created an anisotropic echo across 2 million documents, and learn the monitoring blind spot you must avoid.

Ludovico Bessi
May 23, 2026

LexiSearch is a Series A legal-tech and B2B SaaS search company that recently crossed the milestone of 50,000 enterprise seats. They have seen 300 percent year-over-year growth in document ingestion, primarily serving law firms and corporate compliance departments.

Their engineering team built a semantic retrieval engine that powers the primary search bar for internal document management systems. Here is their setup.

Architecture Overview

When a user enters a query, the system triggers a retrieval-augmented flow to find relevant clauses or documents.

Traffic patterns:

Total documents indexed: 25 million

Queries per second: 120 req/sec average

Peak: 350 req/sec during morning hours (EST)
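For capacity planning it helps to turn those rates into volumes. A quick back-of-envelope sketch, assuming a 30-day month and that the 120 req/sec average holds around the clock:

```python
# Traffic figures quoted above (requests per second).
AVG_QPS = 120
PEAK_QPS = 350
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30  # assumption: a 30-day month

daily_queries = AVG_QPS * SECONDS_PER_DAY
monthly_queries = daily_queries * DAYS_PER_MONTH
peak_to_avg = PEAK_QPS / AVG_QPS

print(f"queries/day:   {daily_queries:,}")    # 10,368,000
print(f"queries/month: {monthly_queries:,}")  # 311,040,000
print(f"peak/average:  {peak_to_avg:.2f}x")   # 2.92x
```

The morning peak runs at almost three times the average, so serving capacity has to be sized for the peak rather than the mean.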

The ML Pipeline:

The model is a dual-tower bi-encoder based on an MPNet backbone. It was trained using a standard contrastive loss with in-batch negatives. The training data consists of 1 million query-document pairs, a mix of MS MARCO pairs and 50,000 domain-specific legal pairs. The embedding dimension is 768. The vector store uses FAISS with an HNSW index (M=32, efConstruction=200) hosted on r6g.4xlarge instances.
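To make the training objective concrete, here is a minimal numpy sketch of a contrastive (InfoNCE-style) loss with in-batch negatives. The function name and the temperature of 0.05 are illustrative assumptions, not LexiSearch's actual training code. The key property is visible in the logits matrix: for query i, the only negatives are the other documents that happen to share its batch.

```python
import numpy as np


def in_batch_contrastive_loss(q: np.ndarray, d: np.ndarray, temperature: float = 0.05) -> float:
    """InfoNCE-style loss with in-batch negatives (illustrative sketch).

    q, d: arrays of shape (batch, dim). Row i of d is the positive for row i of q;
    every other row of d acts as a negative for query i.
    """
    # L2-normalise so that dot products are cosine similarities.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    logits = (q @ d.T) / temperature  # (batch, batch) similarity matrix
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct document for query i sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
queries = rng.normal(size=(8, 768))
random_docs = rng.normal(size=(8, 768))                     # positives unrelated to their queries
aligned_docs = queries + 0.01 * rng.normal(size=(8, 768))   # positives nearly identical to their queries

loss_random = in_batch_contrastive_loss(queries, random_docs)
loss_aligned = in_batch_contrastive_loss(queries, aligned_docs)
print(f"random pairs:  {loss_random:.3f}")
print(f"aligned pairs: {loss_aligned:.3f}")
```

The aligned batch scores a loss near zero, while the random batch does not. Note that the model is only ever pushed away from whatever else landed in the same batch, so the composition of the training mix decides which distinctions it learns.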

Current performance:

P99 Latency: 185ms

Reliability: 99.95 percent uptime

Recall@10 (Global): 0.81
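Two of these numbers translate directly into operational budgets. The sketch below assumes a 30-day month for the uptime figure. Because Little's law needs mean latency and only the P99 is given, it uses the P99 as a pessimistic stand-in to estimate how many requests are in flight at peak (an assumption that should overestimate concurrency for typical latency distributions).

```python
UPTIME = 0.9995                    # 99.95 percent uptime
P99_LATENCY_S = 0.185              # 185 ms
PEAK_QPS = 350                     # peak requests per second
MINUTES_PER_MONTH = 30 * 24 * 60   # assumption: 30-day month

# Downtime the uptime target tolerates per month.
downtime_budget_min = (1 - UPTIME) * MINUTES_PER_MONTH

# Little's law: in-flight requests = arrival rate x time in system.
# P99 is used in place of the mean latency, so this is an upper-end estimate.
peak_in_flight = PEAK_QPS * P99_LATENCY_S

print(f"downtime budget:   {downtime_budget_min:.1f} min/month")  # 21.6 min/month
print(f"in-flight at peak: about {peak_in_flight:.0f} requests")
```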

Costs:

Inference Nodes (g4dn.2xlarge): $6,500 per month

Vector Database (Memory-optimized nodes): $9,000 per month

Total: $15,500 per month
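The cost lines can be normalised per query, and the vector-database line is driven mostly by memory. Below is a rough sketch of both, assuming the 120 req/sec average holds for a 30-day month and that the index is a FAISS HNSW index over uncompressed float32 vectors (the setup above does not say whether vectors are compressed). The helper `hnsw_flat_bytes` is hypothetical and ignores HNSW's upper layers, per-node level metadata, and allocator overhead, so treat its result as a lower bound under those assumptions.

```python
# Monthly costs quoted above (USD).
INFERENCE_USD = 6_500
VECTOR_DB_USD = 9_000
total_usd = INFERENCE_USD + VECTOR_DB_USD

# Assumption: the 120 req/sec average holds for a full 30-day month.
monthly_queries = 120 * 86_400 * 30
usd_per_million_queries = total_usd / (monthly_queries / 1_000_000)


def hnsw_flat_bytes(n_vectors: int, dim: int, m: int) -> int:
    """Hypothetical size estimate for an HNSW index with float32 vectors.

    Counts raw vectors plus layer-0 links only: layer 0 keeps up to 2*M
    neighbour ids (4-byte ints) per node. Upper layers are ignored.
    """
    vector_bytes = n_vectors * dim * 4
    link_bytes = n_vectors * 2 * m * 4
    return vector_bytes + link_bytes


index_bytes = hnsw_flat_bytes(25_000_000, 768, 32)

print(f"total:               ${total_usd:,}/month")              # $15,500/month
print(f"cost per 1M queries: ${usd_per_million_queries:.2f}")    # $49.83
print(f"index size:          {index_bytes / 2**30:.1f} GiB")     # 77.5 GiB
```

Under these assumptions the raw 768-dimensional vectors account for most of the footprint, with the graph links adding under 10 percent on top.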

Recent incidents:

Recall collapse for Global Capital Partners: After onboarding 2 million financial filings for a new high-value client, the Recall@10 for that specific account dropped from 0.81 to 0.44. The system stayed up, but users reported that search was basically broken for their documents.
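This is the blind spot of a single global metric. Here is a minimal sketch of per-tenant Recall@10, where the row format, tenant labels, and document IDs are all hypothetical: each evaluation query is tagged with the account it belongs to, and recall is reported per account alongside the global average.

```python
from collections import defaultdict


def recall_at_k(retrieved: list, relevant: set, k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def recall_by_tenant(eval_rows, k: int = 10):
    """eval_rows: iterable of (tenant_id, retrieved_ids, relevant_ids).

    Returns (global mean over all queries, {tenant_id: mean over that tenant's queries}).
    Queries without any labelled relevant document are skipped.
    """
    per_tenant = defaultdict(list)
    for tenant, retrieved, relevant in eval_rows:
        if relevant:
            per_tenant[tenant].append(recall_at_k(retrieved, set(relevant), k))
    scores = [s for tenant_scores in per_tenant.values() for s in tenant_scores]
    global_recall = sum(scores) / len(scores)
    tenant_recall = {t: sum(s) / len(s) for t, s in per_tenant.items()}
    return global_recall, tenant_recall


# Hypothetical evaluation set: 49 queries from established accounts that are
# answered perfectly, and 1 query from the new account that misses entirely.
rows = [("existing", ["d1", "d2"], ["d1"])] * 49 + [("new_account", ["d7"], ["d3"])]
global_recall, tenant_recall = recall_by_tenant(rows)
print(global_recall)  # 0.98
print(tenant_recall)  # {'existing': 1.0, 'new_account': 0.0}

# With the incident's figures: if the new account supplied a hypothetical 2 percent
# of evaluation queries and the other accounts stayed at 0.81, the global number
# would barely move.
share = 0.02
blended = (1 - share) * 0.81 + share * 0.44
print(f"{blended:.3f}")  # 0.803
```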

The Analysis

Now let me show you what is actually happening here.

Critical Issue #1: The Anisotropic Echo
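"Anisotropic" describes embeddings that occupy a narrow cone of the vector space instead of spreading out in all directions. One common way to check for it is to average the cosine similarity of random pairs of vectors: a high average means most documents look alike to the retriever. The sketch below is a generic diagnostic in numpy; the function name, sample size, and synthetic data are illustrative assumptions, not LexiSearch's tooling.

```python
import numpy as np


def mean_pairwise_cosine(embeddings: np.ndarray, n_pairs: int = 10_000, seed: int = 0) -> float:
    """Average cosine similarity over randomly sampled pairs of distinct rows.

    Near 0: vectors point in many directions (roughly isotropic).
    Near 1: vectors crowd into a narrow cone (strongly anisotropic).
    """
    rng = np.random.default_rng(seed)
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    i = rng.integers(0, len(x), size=n_pairs)
    j = rng.integers(0, len(x), size=n_pairs)
    keep = i != j  # drop self-pairs, which always score 1.0
    return float(np.mean(np.sum(x[i[keep]] * x[j[keep]], axis=1)))


rng = np.random.default_rng(1)
spread = rng.normal(size=(2_000, 768))  # synthetic vectors with no preferred direction
cone = spread + 3.0                     # the same vectors shifted along a shared direction

print(f"spread: {mean_pairwise_cosine(spread):.3f}")
print(f"cone:   {mean_pairwise_cosine(cone):.3f}")
```

On a real index, the statistic is most informative when compared across document populations rather than read in isolation.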

© 2026 Ludovico Bessi