The Centre for Investigative Journalism

Data Wrangling with Pandas 2

Your data is squeaky clean and ready to go – time to dig deep and start hunting for those elusive leads. Pandas lets you perform statistical analysis on your data quickly and easily, helping you to mine for stories and look for outliers.
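As a rough illustration of the kind of analysis described here, the sketch below summarises a column and flags outliers with the interquartile range rule. The `payments` table, its column names and its values are invented for this example, and the 1.5 × IQR threshold is a common convention rather than anything specific to this class.

```python
import pandas as pd

# Hypothetical data: payments to six suppliers (values invented for illustration).
payments = pd.DataFrame({
    "supplier": ["A", "B", "C", "D", "E", "F"],
    "amount": [120.0, 135.0, 110.0, 128.0, 131.0, 9400.0],
})

# Summary statistics: count, mean, standard deviation, quartiles, min and max.
summary = payments["amount"].describe()

# Flag outliers using the interquartile range (IQR) rule.
q1 = payments["amount"].quantile(0.25)
q3 = payments["amount"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = payments[(payments["amount"] < lower) | (payments["amount"] > upper)]

print(summary)
print(outliers)
```

An outlier found this way is a lead to check, not a finding in itself: it may be a genuine anomaly or simply a data-entry error.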

Participants can expect to learn programmatic methods to analyse large datasets and to visualise their results within Jupyter Notebook.

Why Python?

Python makes it easy to replicate your analysis at a later stage and reduces the threat of human error that many face in Excel. Notebooks are also shareable within teams and let you document and explain your work, so you can come back to it later and easily pick up where you left off.
Python itself sets no upper limit on data size: you can use it on a CSV with 10 rows or a billion. Eventually the limitation becomes the amount of RAM on your machine, at which point you need to switch to a server.
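Before moving to a server, one option is to process a file in pieces so that only part of it is in memory at a time. The sketch below assumes a hypothetical CSV with `country` and `amount` columns, and uses an in-memory string in place of a real file so that it runs on its own. It relies on the `chunksize` parameter of `pandas.read_csv`.

```python
import io
import pandas as pd

# Stand-in for a large file on disk; in practice you would pass a file path.
csv_data = io.StringIO("country,amount\nIE,10\nUK,20\nIE,5\nFR,7\nUK,3\n")

partials = []

# Read two rows at a time so only one chunk is held in memory at once.
# A real chunk size would be far larger, for example 100,000 rows.
for chunk in pd.read_csv(csv_data, chunksize=2):
    partials.append(chunk.groupby("country")["amount"].sum())

# Combine the per-chunk subtotals into one total per country.
totals = pd.concat(partials).groupby(level=0).sum()

print(totals)
```

This approach suits aggregations that can be built up from subtotals, such as sums and counts. Calculations that need the whole column at once, such as a median, need a different strategy.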

 

Technical Requirements

Participants should have previous experience of coding at a basic level or more.

Karrie Kehoe

Karrie Kehoe is a data journalist and researcher on the Data and Research team at the International Consortium of Investigative Journalists. Karrie has worked on award-winning global investigations like the FinCEN Files, Pandora Papers, Uber Files, Implant Files and more recently Deforestation Inc.

Max Harlow

Max Harlow works on the visual and data journalism team at the Financial Times, focusing on investigations. He also runs Journocoders, a group for journalists to develop technical skills for use in their reporting.
Class
Intermediate