Finding Needles in Haystacks with Fuzzy Matching
Fuzzy matching is a process for linking up names that are similar but not quite the same. It has become an increasingly important part of data-led investigations as a way to identify connections between public figures, key people, and companies relevant to a story. This class will cover how fuzzy matching typically fits into the investigative process, with some story examples. We will show you how to run several different types of fuzzy match on real datasets, and discuss the pros and cons of each.
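As a rough illustration of what "similar but not quite the same" means in practice, here is a minimal sketch using Python's standard-library difflib as a stand-in. It is not how csvmatch works internally, and the names, the 0.85 threshold and the helper functions (similarity, token_sort, fuzzy_match) are all hypothetical. It compares two lists of names two ways: a straight character-level comparison, and a variant that sorts the words in each name first so that word order stops mattering.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity between 0 and 1, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_sort(name: str) -> str:
    """Lowercase, split on whitespace, sort the words and rejoin,
    so that 'Smith John' and 'John Smith' become identical."""
    return " ".join(sorted(name.lower().split()))


def fuzzy_match(names_a, names_b, threshold=0.85, sort_tokens=False):
    """Compare every name in names_a with every name in names_b and
    return (a, b, score) for each pair scoring at or above threshold."""
    matches = []
    for a in names_a:
        for b in names_b:
            x, y = (token_sort(a), token_sort(b)) if sort_tokens else (a, b)
            score = similarity(x, y)
            if score >= threshold:
                matches.append((a, b, round(score, 2)))
    return matches


# Hypothetical example data, not taken from any real dataset.
politicians = ["John Smith", "Mary O'Brien"]
directors = ["SMITH John", "Mary OBrien", "Jon Smith"]

print(fuzzy_match(politicians, directors))
# [('John Smith', 'Jon Smith', 0.95), ("Mary O'Brien", 'Mary OBrien', 0.96)]

print(fuzzy_match(politicians, directors, sort_tokens=True))
# [('John Smith', 'SMITH John', 1.0), ('John Smith', 'Jon Smith', 0.95), ("Mary O'Brien", 'Mary OBrien', 0.96)]
```

The trade-off shows up even in this toy case: the plain comparison misses "SMITH John" because the words are reversed, while both versions link "John Smith" to "Jon Smith", who may well be a different person. Any candidate match still needs checking by hand before it goes into a story.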
Technical Requirements
Own laptop required. Install Python 3. On a Mac, open Terminal (inside Applications, then Utilities) and run: pip3 install csvmatch. On Windows, also install Cygwin, then open Cygwin and run the same command: pip3 install csvmatch.
Max Harlow
Max Harlow works on the visual and data journalism team at the Financial Times, focusing on investigations. He also runs Journocoders, a group for journalists to develop technical skills for use in their reporting.