May 7 / Aneta Klosek

Levenshtein, Jaro-Winkler, and the Art of Finding the Name You Almost Missed

The Fuzzy Matching Trade-Off: Why the Algorithm Behind Your Sanctions Screening Matters More Than You Think

Sanctions screening sounds straightforward until you try to do it with real data.

The theory is simple: compare a name against a list, flag what matches, review what's flagged. In practice, name data is messy in ways that make exact matching almost immediately useless. "Mohammed," "Muhammad," and "Mohamad" are not rare edge cases; they're everyday examples of how the same person's name can appear in a dozen different forms depending on transliteration standards, regional conventions, or whoever entered the data.

So organisations turn to fuzzy matching. And that's where things get interesting and, quite honestly, complicated.

Fuzzy Matching Isn't One Thing

When people say a system uses "fuzzy matching," they're often describing a black box. But fuzzy matching is a category, not a solution. The algorithm underneath makes a significant difference in what your system actually catches.

Two algorithms come up repeatedly in sanctions screening:

Levenshtein Distance 

Levenshtein distance counts the minimum number of edits, that is, insertions, deletions, and substitutions, needed to turn one string into another. It's intuitive and widely used. If two names differ by only one or two characters, the edit distance is low and the system flags them as similar. The limitation is that it treats all edits equally, regardless of where they fall in the name or what they mean.
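To make the mechanics concrete, here is a minimal Python sketch of the calculation. The function name and example pairs are ours for illustration, not taken from any particular screening product:

```python
def levenshtein(a: str, b: str) -> int:
    """Count the insertions, deletions, and substitutions needed to turn a into b."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b; the row is rolled forward as i grows.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or a free match)
            )
        prev = curr
    return prev[len(b)]

print(levenshtein("Mohammed", "Muhammad"))        # 2 (two substitutions)
print(levenshtein("Mohammed", "Mohammed Saleh"))  # 6 (every appended character costs an edit)
```

Notice that an edit at the start of a name costs exactly as much as one at the end; the algorithm has no notion of which parts of a name carry the most identifying information.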

Jaro-Winkler Similarity

Jaro-Winkler similarity works differently. It gives extra weight to matching prefixes and handles transpositions more gracefully. In practice, this means it tends to perform better on short strings like personal names, where the start of a name carries more identifying weight than the end.

To make this concrete: "Hussain" versus "Husasin" involves a transposition. Levenshtein sees two edits; Jaro-Winkler is more forgiving because the structure is clearly similar. "Mohamed Ali" versus "Mohamed Aly" — Jaro-Winkler rewards the matching prefix more heavily. These aren't hypothetical differences. They shape which names your system surfaces and which ones it quietly lets through.
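Here's the same idea as a hand-rolled sketch, mainly to show where the matching window, the transposition penalty, and the prefix boost come in. The parameter values (0.1 prefix scale, four-character prefix cap) follow the standard Jaro-Winkler definition; a real deployment would rely on a maintained library rather than this code:

```python
def jaro(a: str, b: str) -> float:
    """Jaro similarity: how many characters match, with a penalty for transpositions."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    # Characters match if they are equal and no further apart than this window.
    window = max(len(a), len(b)) // 2 - 1
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_matched[j] and b[j] == ca:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that line up in a different order.
    transpositions, k = 0, 0
    for i, matched in enumerate(a_matched):
        if matched:
            while not b_matched[k]:
                k += 1
            if a[i] != b[k]:
                transpositions += 1
            k += 1
    half_t = transpositions // 2
    m = matches
    return (m / len(a) + m / len(b) + (m - half_t) / m) / 3

def jaro_winkler(a: str, b: str, prefix_scale: float = 0.1) -> float:
    """Boost the Jaro score for strings sharing a prefix of up to four characters."""
    base = jaro(a, b)
    prefix = 0
    for ca, cb in zip(a, b):
        if ca != cb or prefix == 4:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)

print(round(jaro_winkler("Hussain", "Husasin"), 3))          # ≈ 0.967
print(round(jaro_winkler("Mohamed Ali", "Mohamed Aly"), 3))  # ≈ 0.964
```

Both pairs score well above 0.95 here, even though a plain edit-distance view treats them as no more similar than any other two-edit difference.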

The Threshold Problem

Choosing an algorithm is only half the decision. The similarity threshold, the score above which a name is flagged, is just as consequential.

Set the threshold too low, and you generate false positives: your team spends time reviewing names that aren't genuine matches. That's a workload problem, and it's visible. People complain about it.

Set the threshold too high, and you get false negatives: real matches that never make it into a queue. That problem is invisible until it surfaces in an audit or a regulatory inquiry.
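The effect is easy to see with a single pair of names. A minimal sketch, assuming the open-source rapidfuzz package and purely illustrative names and threshold values:

```python
from rapidfuzz.distance import JaroWinkler

list_entry = "Abdul Rasheed"   # name as it appears on the sanctions list
payment_name = "Abdul Rashid"  # name as it appears in the payment message

score = JaroWinkler.similarity(list_entry, payment_name)  # ≈ 0.95

for threshold in (0.90, 0.96):
    verdict = "flag for review" if score >= threshold else "no alert"
    print(f"threshold {threshold:.2f}: {verdict}")
```

Nothing about the pair changes between the two runs. Only the threshold does, and with it the decision.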

This asymmetry creates a structural bias in how systems get tuned. Operational pain is immediate and measurable. Detection gaps are theoretical, right up until they aren't. Organisations optimise for the pain they can see, which means they often underweight the risk they can't.

How Most Systems End Up the Way They Are

In most organisations, the matching algorithm and threshold weren't actively chosen. They were inherited.

A vendor selected a default approach. A system integrator implemented it during deployment. The logic was never revisited because it appeared to work. Over time, it became part of the operational fabric: accepted, unchallenged, and largely invisible to the people responsible for the output it produces.

This isn't negligence. It's the natural result of how enterprise software gets deployed and maintained. But it does mean that the decisions shaping your sanctions exposure were probably made by someone who was thinking about implementation speed, not your specific risk appetite.

What a Better Approach Looks Like

The goal isn't to find the single "best" algorithm and apply it universally. It's to build a matching strategy that reflects how your data actually behaves and what risks matter most to your organisation.

In practice, that usually means:
  • Combining approaches rather than relying on a single algorithm. Different methods surface different things, and layering them reduces the chance of systematic blind spots.
  • Calibrating thresholds against real data, not vendor defaults. What counts as "similar enough" should be informed by examples from your own screening environment — including known true positives you can test against (see the sketch after this list).
  • Segmenting by context. Names from different regions, languages, or scripts behave differently. A single threshold applied globally will be miscalibrated for at least part of your data.
  • Reviewing performance over time. Risk profiles change, data quality changes, and regulatory expectations evolve. A configuration that was reasonable two years ago may not be today.
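As a rough illustration of the first two points, the sketch below layers two measures and checks candidate thresholds against a handful of known true positives. It assumes the rapidfuzz package, and the name pairs and threshold grid are invented for illustration:

```python
from rapidfuzz.distance import JaroWinkler, Levenshtein

def combined_score(a: str, b: str) -> float:
    """Keep the more optimistic of two measures, so a pair is only dropped
    when both algorithms consider it dissimilar."""
    return max(
        JaroWinkler.similarity(a, b),
        Levenshtein.normalized_similarity(a, b),
    )

# Known true positives drawn from your own screening history (illustrative pairs).
true_positives = [
    ("Mohammed Al-Hassan", "Muhammad Alhasan"),
    ("Yekaterina Petrova", "Ekaterina Petrova"),
    ("Jean-Claude N'Guessan", "Jean Claude Nguessan"),
]

# Which candidate thresholds would still have caught every known true positive?
for threshold in (0.80, 0.85, 0.90, 0.95):
    caught = sum(combined_score(a, b) >= threshold for a, b in true_positives)
    print(f"threshold {threshold:.2f}: {caught}/{len(true_positives)} known matches caught")
```

The exercise is only meaningful if the labelled pairs come from your own screening history; a vendor default can't know what your false negatives look like.
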
Most importantly: someone in your organisation needs to own this. Not just the system, but the logic behind it. Understanding why your screening produces the results it does, and being able to explain that to an auditor or regulator, is increasingly part of what good compliance governance looks like.

The Bottom Line

Fuzzy matching isn't a technical detail that can safely be delegated to a vendor and forgotten. It's a compliance decision with real regulatory consequences. The algorithm you use, and the threshold you set, determine which names get escalated and which ones disappear. And in sanctions screening, what you miss matters more than what you catch.

The industry has put enormous effort into escalation workflows, regulatory reporting, and audit trails. The matching logic that feeds all of that deserves the same scrutiny.

If you don't know which algorithm your system is using, or why the threshold is set where it is, that's worth finding out.