The Centre for Investigative Journalism

Finding Needles in Haystacks with Fuzzy Matching

Fuzzy matching is a process for linking up names that are similar, but not quite the same. It has become an increasingly important part of data-led investigations as a way to identify connections between public figures, key people, and companies that are relevant to a story. This class will cover how fuzzy matching typically fits into the investigative process, with some story examples. We will show you how to run several types of fuzzy match on real datasets, and discuss the pros and cons of each.
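To give a flavour of the idea, here is a minimal sketch in Python. It is not the method taught in the class and does not use csvmatch. It assumes one simple approach, a character-based similarity ratio from Python's standard difflib module, and the function names, example names and the 0.8 threshold are all made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score between 0 and 1 for how alike two names are.

    Case and surrounding whitespace are ignored before comparing.
    """
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def fuzzy_match(names_a, names_b, threshold=0.8):
    """Pair up names from two lists whose similarity meets the threshold.

    The threshold of 0.8 is an arbitrary illustrative value, not a recommendation.
    """
    matches = []
    for a in names_a:
        for b in names_b:
            score = similarity(a, b)
            if score >= threshold:
                matches.append((a, b, round(score, 2)))
    return matches


if __name__ == "__main__":
    # Hypothetical example data: one list from each of two sources.
    donors = ["Jon Smith", "Acme Holdings Ltd"]
    directors = ["John Smith", "Jane Doe", "ACME Holdings Ltd "]
    for a, b, score in fuzzy_match(donors, directors):
        print(f"{a!r} ~ {b!r} (score {score})")
```

Comparing every name against every other name gets slow on large lists, and a single threshold will produce both false matches and missed ones, which is one reason different types of fuzzy match suit different data.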

Technical Requirements

Own laptop required, with Python 3 installed. On a Mac, open the Terminal (inside Applications, then Utilities) and run: pip3 install csvmatch. On Windows, also install Cygwin, then open Cygwin and run: pip3 install csvmatch.

Max Harlow

Max Harlow works on the visual and data journalism team at the Financial Times, focusing on investigations. He also runs Journocoders, a group for journalists to develop technical skills for use in their reporting.