May 7 / Aneta Klosek

Levenshtein, Jaro-Winkler, and the Art of Finding the Name You Almost Missed

The Fuzzy Matching Trade-Off: Why the Algorithm Behind Your Sanctions Screening Matters More Than You Think

Sanctions screening sounds straightforward until you try to do it with real data.

The theory is simple: compare a name against a list, flag what matches, review what's flagged. In practice, name data is messy in ways that make exact matching almost immediately useless. "Mohammed," "Muhammad," and "Mohamad" are not rare edge cases; they're everyday examples of how the same person's name can appear in a dozen different forms depending on transliteration standards, regional conventions, or whoever entered the data.

So organisations turn to fuzzy matching. And that's where things get interesting and, quite honestly, complicated.

Fuzzy Matching Isn't One Thing

When people say a system uses "fuzzy matching," they're often describing a black box. But fuzzy matching is a category, not a solution. The algorithm underneath makes a significant difference in what your system actually catches.

Two algorithms come up repeatedly in sanctions screening:

Levenshtein Distance 

Levenshtein distance counts the minimum number of edits, that is, insertions, deletions, and substitutions, needed to turn one string into another. It's intuitive and widely used. If two names differ by only one or two characters, the edit distance is low and the system flags them as similar. The limitation is that it treats all edits equally, regardless of where they fall in the name or what they mean.
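To make the mechanics concrete, here is a minimal Python sketch of the calculation. The function name and example pairs are ours for illustration, not taken from any particular screening product:

```python
def levenshtein(a: str, b: str) -> int:
    """Count the insertions, deletions, and substitutions needed to turn a into b."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b; the row is rolled forward as i grows.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or a free match)
            )
        prev = curr
    return prev[len(b)]

print(levenshtein("Mohammed", "Muhammad"))        # 2 (two substitutions)
print(levenshtein("Mohammed", "Mohammed Saleh"))  # 6 (every appended character costs an edit)
```

Notice that an edit at the start of a name costs exactly as much as one at the end; the algorithm has no notion of which parts of a name carry the most identifying information.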

Jaro-Winkler Similarity

Jaro-Winkler similarity works differently. It gives extra weight to matching prefixes and handles transpositions more gracefully. In practice, this means it tends to perform better on short strings like personal names, where the start of a name carries more identifying weight than the end.

To make this concrete: "Hussain" versus "Husasin" involves a transposition. Levenshtein sees two edits; Jaro-Winkler is more forgiving because the structure is clearly similar. "Mohamed Ali" versus "Mohamed Aly" — Jaro-Winkler rewards the matching prefix more heavily. These aren't hypothetical differences. They shape which names your system surfaces and which ones it quietly lets through.
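Here's the same idea as a hand-rolled sketch, mainly to show where the matching window, the transposition penalty, and the prefix boost come in. The parameter values (0.1 prefix scale, four-character prefix cap) follow the standard Jaro-Winkler definition; a real deployment would rely on a maintained library rather than this code:

```python
def jaro(a: str, b: str) -> float:
    """Jaro similarity: how many characters match, with a penalty for transpositions."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    # Characters match if they are equal and no further apart than this window.
    window = max(len(a), len(b)) // 2 - 1
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_matched[j] and b[j] == ca:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that line up in a different order.
    transpositions, k = 0, 0
    for i, matched in enumerate(a_matched):
        if matched:
            while not b_matched[k]:
                k += 1
            if a[i] != b[k]:
                transpositions += 1
            k += 1
    half_t = transpositions // 2
    m = matches
    return (m / len(a) + m / len(b) + (m - half_t) / m) / 3

def jaro_winkler(a: str, b: str, prefix_scale: float = 0.1) -> float:
    """Boost the Jaro score for strings sharing a prefix of up to four characters."""
    base = jaro(a, b)
    prefix = 0
    for ca, cb in zip(a, b):
        if ca != cb or prefix == 4:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)

print(round(jaro_winkler("Hussain", "Husasin"), 3))          # ≈ 0.967
print(round(jaro_winkler("Mohamed Ali", "Mohamed Aly"), 3))  # ≈ 0.964
```

Both pairs score well above 0.95 here, even though a plain edit-distance view treats them as no more similar than any other two-edit difference.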

The Threshold Problem

Choosing an algorithm is only half the decision. The similarity threshold, the score above which a name is flagged, is just as consequential.

Set the threshold too low, and you generate false positives: your team spends time reviewing names that aren't genuine matches. That's a workload problem, and it's visible. People complain about it.

Set the threshold too high, and you get false negatives: real matches that never make it into a queue. That problem is invisible until it surfaces in an audit or a regulatory inquiry.
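The effect is easy to see with a single pair of names. A minimal sketch, assuming the open-source rapidfuzz package and purely illustrative names and threshold values:

```python
from rapidfuzz.distance import JaroWinkler

list_entry = "Abdul Rasheed"   # name as it appears on the sanctions list
payment_name = "Abdul Rashid"  # name as it appears in the payment message

score = JaroWinkler.similarity(list_entry, payment_name)  # ≈ 0.95

for threshold in (0.90, 0.96):
    verdict = "flag for review" if score >= threshold else "no alert"
    print(f"threshold {threshold:.2f}: {verdict}")
```

Nothing about the pair changes between the two runs. Only the threshold does, and with it the decision.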

This asymmetry creates a structural bias in how systems get tuned. Operational pain is immediate and measurable. Detection gaps are theoretical, right up until they aren't. Organisations optimise for the pain they can see, which means they often underweight the risk they can't.

How Most Systems End Up the Way They Are

In most organisations, the matching algorithm and threshold weren't actively chosen. They were inherited.

A vendor selected a default approach. A system integrator implemented it during deployment. The logic was never revisited because it appeared to work. Over time, it became part of the operational fabric: accepted, unchallenged, and largely invisible to the people responsible for the output it produces.

This isn't negligence. It's the natural result of how enterprise software gets deployed and maintained. But it does mean that the decisions shaping your sanctions exposure were probably made by someone who was thinking about implementation speed, not your specific risk appetite.

What a Better Approach Looks Like

The goal isn't to find the single "best" algorithm and apply it universally. It's to build a matching strategy that reflects how your data actually behaves and what risks matter most to your organisation.

In practice, that usually means:
  • Combining approaches rather than relying on a single algorithm. Different methods surface different things, and layering them reduces the chance of systematic blind spots.
  • Calibrating thresholds against real data, not vendor defaults. What counts as "similar enough" should be informed by examples from your own screening environment — including known true positives you can test against (see the sketch after this list).
  • Segmenting by context. Names from different regions, languages, or scripts behave differently. A single threshold applied globally will be miscalibrated for at least part of your data.
  • Reviewing performance over time. Risk profiles change, data quality changes, and regulatory expectations evolve. A configuration that was reasonable two years ago may not be today.
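As a rough illustration of the first two points, the sketch below layers two measures and checks candidate thresholds against a handful of known true positives. It assumes the rapidfuzz package, and the name pairs and threshold grid are invented for illustration:

```python
from rapidfuzz.distance import JaroWinkler, Levenshtein

def combined_score(a: str, b: str) -> float:
    """Keep the more optimistic of two measures, so a pair is only dropped
    when both algorithms consider it dissimilar."""
    return max(
        JaroWinkler.similarity(a, b),
        Levenshtein.normalized_similarity(a, b),
    )

# Known true positives drawn from your own screening history (illustrative pairs).
true_positives = [
    ("Mohammed Al-Hassan", "Muhammad Alhasan"),
    ("Yekaterina Petrova", "Ekaterina Petrova"),
    ("Jean-Claude N'Guessan", "Jean Claude Nguessan"),
]

# Which candidate thresholds would still have caught every known true positive?
for threshold in (0.80, 0.85, 0.90, 0.95):
    caught = sum(combined_score(a, b) >= threshold for a, b in true_positives)
    print(f"threshold {threshold:.2f}: {caught}/{len(true_positives)} known matches caught")
```

The exercise is only meaningful if the labelled pairs come from your own screening history; a vendor default can't know what your false negatives look like.
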
Most importantly: someone in your organisation needs to own this. Not just the system, but the logic behind it. Understanding why your screening produces the results it does, and being able to explain that to an auditor or regulator, is increasingly part of what good compliance governance looks like.

The Bottom Line

Fuzzy matching isn't a technical detail that can safely be delegated to a vendor and forgotten. It's a compliance decision with real regulatory consequences. The algorithm you use, and the threshold you set, determine which names get escalated and which ones disappear. And in sanctions screening, what you miss matters more than what you catch.

The industry has put enormous effort into escalation workflows, regulatory reporting, and audit trails. The matching logic that feeds all of that deserves the same scrutiny.

If you don't know which algorithm your system is using, or why the threshold is set where it is, that's worth finding out.