BM25 demystified: Why Google starts with math, not AI
Search engines don’t start with AI. They start with math.
While SEOs obsess over semantic search and user intent, Google’s first step in ranking your content is way more basic: exact keyword matching.
If your content doesn’t hit the right keywords, you’re invisible—no matter how good your context or storytelling is. That’s where BM25 comes in. And if you don’t know how it works, you’re already behind.
Let’s break down what BM25 is, how it works, and why it still matters in a world dominated by large language models (LLMs) like BERT and GPT.
What is BM25? Breaking down the core formula
BM25 stands for “Best Matching 25”—but don’t get hung up on the name. What matters is that it’s one of the most widely used algorithms in search engines to calculate keyword relevance.
Here’s what BM25 does:
- It scores documents based on how well they match a query by analyzing how often the keywords appear in the document.
- It adjusts for document length to prevent longer documents from dominating the rankings.
- It gives more weight to rare terms (think niche keywords) and less weight to common ones like “the” or “and.”
The formula behind BM25
The actual BM25 formula includes two important parameters:
- k1 controls how much weight term frequency gets before it hits a saturation point.
- b adjusts for document length to ensure longer documents don’t have an unfair advantage.
Translation: BM25 rewards documents that contain the exact keywords a user searches for—but caps the reward for repetitive terms and penalizes overly long documents.
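To make that concrete, here’s a minimal sketch of the classic Okapi BM25 scoring function in Python. Treat it as illustrative rather than production code: real implementations (Lucene, Elasticsearch) use slightly different IDF smoothing, and the helper names here are my own.

```python
import math

def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """Score one document against a query with Okapi BM25.

    query_terms -- list of query tokens
    doc_terms   -- list of tokens in the document being scored
    doc_freq    -- dict: term -> number of corpus documents containing it
    num_docs    -- total documents in the corpus
    avg_doc_len -- average document length in tokens across the corpus
    """
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)  # raw term frequency in this document
        if tf == 0:
            continue
        df = doc_freq.get(term, 0)
        # IDF: rare terms weigh more; the +1 keeps the log positive (Lucene-style)
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1)
        # k1 caps how much repeated terms help; b scales the length penalty
        tf_norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * tf_norm
    return score
```

Typical defaults land around k1 between 1.2 and 2.0 with b = 0.75. Raising k1 lets repeated terms keep earning score for longer before they saturate; raising b makes the length penalty more aggressive.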
Sparse vs. dense vectors: the foundation of retrieval models
Search engines rely on two types of vector representations to retrieve and rank content:
- Sparse vectors are traditional keyword-based representations. Documents are stored as bags of words, where each word gets its own dimension in the vector space. Sparse vectors handle exact keyword matches best and are great for structured queries.
- Dense vectors are semantic representations created by neural language models like BERT. Instead of focusing on exact keywords, they map words and concepts into a lower-dimensional, continuous vector space where similar meanings sit close together. Dense vectors are best for semantic matching and perform better on ambiguous or conversational queries.
BM25 operates entirely in the sparse vector space. This means it relies on exact keyword matches and treats documents as bags of words—a model where word order, grammar, and context are ignored.
Dense vectors, on the other hand, are typically generated by neural networks, which capture semantic relationships between words.
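Here’s a quick illustration in plain Python. The sparse vector is just exact word counts over a vocabulary; the dense values are invented for illustration, since real ones come from running text through an encoder like BERT.

```python
from collections import Counter

# Sparse: one dimension per vocabulary word, mostly zeros at scale.
vocab = ["best", "chair", "ergonomic", "for", "office", "pain", "back"]
doc = "ergonomic office chair for back pain".split()
counts = Counter(doc)
sparse_vector = [counts.get(word, 0) for word in vocab]
print(sparse_vector)  # [0, 1, 1, 1, 1, 1, 1] -- exact words only

# Dense: a short vector of learned floats from a neural encoder.
# (Made-up numbers; a real model outputs hundreds of dimensions.)
dense_vector = [0.12, -0.48, 0.91, 0.05]
```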
Bag-of-words retrieval: why exact matches still matter
In bag-of-words (BoW) retrieval, a document is treated as a collection of individual words, without considering the order or meaning of those words.
Think of it like a shopping list. If your list has “apples, bananas, oranges,” it doesn’t matter if you find oranges first—you just need to check off the items on the list.
BM25 works similarly. It looks for exact keyword matches in documents and ignores everything else.
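The order-blindness takes two lines of Python to demonstrate: under bag-of-words, two sentences with the same words in a different order are the same document.

```python
from collections import Counter

# Bag-of-words keeps only counts, so word order vanishes entirely.
print(Counter("dog bites man".split()) == Counter("man bites dog".split()))  # True
```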
But why does this matter when Google is pushing semantic search? Because exact matches still drive precision—especially for structured queries where the user expects a specific result.
When bag-of-words works best
- Product searches (e.g., “RTX 4090 specs”)
- Legal or technical documents (e.g., “GDPR compliance guide”)
- Fact-based queries (e.g., “Tesla Q3 revenue”)
Without exact keyword matches, search engines can miss critical documents. And that’s where BM25 comes in.
The role of BM25 in search engines: the first-stage ranker
Modern search engines don’t run BERT or GPT across their entire index—it would be too slow and expensive.
Instead, they use a two-stage ranking process.
First-stage retrieval, often BM25 or a similar model, pulls a shortlist of documents based on exact keyword matches.
Second-stage re-ranking, using LLMs like BERT, then reorders that shortlist based on context, nuance, and user intent.
Think of BM25 as the bouncer at a nightclub. It decides who gets inside based on a strict list of criteria (keywords). The VIP host (BERT) then decides who gets the best table based on who fits the vibe.
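In code, the pattern looks something like this sketch. Both scoring functions are placeholders of my own: bm25_score_fn is any fast lexical scorer, and semantic_score_fn stands in for an expensive neural re-ranker such as a BERT cross-encoder.

```python
def two_stage_search(query, corpus, bm25_score_fn, semantic_score_fn,
                     shortlist_size=100):
    """Two-stage ranking: cheap lexical filter first, costly re-ranker second."""
    # Stage 1: score the whole corpus with BM25, keep only a small shortlist.
    shortlist = sorted(corpus, key=lambda doc: bm25_score_fn(query, doc),
                       reverse=True)[:shortlist_size]
    # Stage 2: spend the expensive model only on documents that survived.
    return sorted(shortlist, key=lambda doc: semantic_score_fn(query, doc),
                  reverse=True)
```

The economics are the whole point: BM25 over an inverted index is fast enough to run across the full corpus, so the neural model only ever sees the shortlist.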
BM25 in LLMs: bridging the gap between keywords and semantics
While LLMs are great at understanding context and meaning, they often struggle with precise keyword matching.
That’s why modern retrieval pipelines combine BM25 and LLMs to balance precision and semantics.
Here’s how it works: BM25 handles the first stage, retrieving documents with exact keyword matches. An LLM like BERT then handles the second stage, re-ranking those documents based on context and intent.
For example, for a query like “best ergonomic chair for back pain,” BM25 retrieves documents with exact matches for “ergonomic,” “chair,” and “back pain.” BERT then re-ranks those documents by understanding user intent.
Hybrid retrieval systems, which combine sparse and dense retrieval, are becoming increasingly popular in modern search engines.
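One common way to build that combination is reciprocal rank fusion (RRF), which merges the ranked lists from a sparse retriever and a dense retriever without needing their raw scores to be comparable. A minimal sketch:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked result lists (e.g. one from BM25, one from dense retrieval).

    A document's fused score is the sum of 1 / (k + rank) over every list it
    appears in, so documents ranked well by both retrievers float to the top.
    """
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" ranks well in both lists, so it wins the fused ranking.
print(reciprocal_rank_fusion([["a", "b", "c"], ["b", "d", "a"]]))
# ['b', 'a', 'd', 'c']
```

The k=60 default comes from the original RRF paper and damps the influence of any single list.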
What BM25 gets wrong: the limitations search marketers must know
Despite its efficiency and precision, BM25 has limitations:
- It can over-penalize long documents. BM25 adjusts for document length but can over-correct, causing longer content to rank lower.
- It relies heavily on exact keyword matching. Working in the sparse vector space, BM25 doesn’t recognize synonyms or related terms.
- It has limited semantic understanding. It doesn’t infer context or relationships between terms, which limits its effectiveness for more nuanced queries.
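That first limitation is easy to see by isolating the length-normalization term from the formula sketched earlier. With the common default b = 0.75, a document five times longer than average earns roughly half the score for the same term frequency:

```python
k1, b = 1.2, 0.75

def tf_component(tf, doc_len, avg_len):
    # The term-frequency part of BM25, with the IDF factor omitted for clarity.
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))

# Same term frequency of 3, but the five-times-longer document scores lower:
print(round(tf_component(3, 100, 100), 3))  # 1.571
print(round(tf_component(3, 500, 100), 3))  # 0.846
```

Lowering b softens that penalty, which is one of the knobs engineers tune when long documents are the norm in a corpus.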
When to use BM25: practical advice for search marketers
BM25 works best for queries that demand precision and exact matches: product searches, technical queries, and legal documents where users expect a specific answer.
For ambiguous or conversational queries, you’ll need to pair it with semantic models like BERT, which handle nuance and intent far better.
Search marketers should focus on a combination of keyword optimization for BM25-like systems and high-quality, contextually rich content for LLMs.
Final takeaway
BM25 is still the first filter. If your content doesn’t pass through BM25’s gate, it won’t even be seen by BERT or GPT.
The winning formula for modern search optimization?
- Exact-match keywords for BM25
- Contextual content for LLMs
Bottom line: if you’re not optimizing for both BM25 and LLMs, you’re missing the mark.