The Centre for Investigative Journalism

Data Investigations with Python

Harness the power of Python and Pandas, a data analysis library, to supercharge your journalism and work with large data sources efficiently and reproducibly. The course is hands-on: you will use Google Colab to write, run and manage the code you build.

Participants will be introduced to the many uses of Python for investigative data-driven research, with a range of opportunities to put the techniques into practice using examples and exercises from real-life data and investigations.

The final session will guide participants through planning and setting up their own Python projects on which to begin implementing the techniques learned throughout the course. It is strongly recommended that participants have identified analytical projects they would like to apply Python to prior to joining the course.

Participants will also have the opportunity to submit their projects for review and feedback up to three weeks after the final session. During this period, participants are expected to set aside between 4 and 10 hours to work on their projects before submitting them for feedback.


N.B. Python is a great language for writing web scrapers; however, this course will focus on its application to data analysis. If your main objective in learning Python is scraping, our dedicated Web Scraping for Journalists courses would be more suitable.

Technical Requirements

For this course you will need the following software, apps and tools:

  • A Google account, which is needed to use Google Colab.
  • The Zoom app. During sessions the trainers often ask participants to share their screen so they can solve problems or demonstrate techniques. If you are using a work computer or another device on which Zoom screen sharing is disabled, please consider having the restriction lifted for the duration of the course: without screen sharing it is much harder to solve problems and learn from them. Rest assured, nobody will be forced to share their screen against their will.
  • A working camera and audio.

This course will be hosted on Zoom. To find out more about how we use Zoom, please check out our Zoom InfoSec page.

Final Project
Following the final session, participants will have three weeks to continue working on their own Python project. If the project is submitted by 1 December 2023, they will receive feedback and guidance on next steps by 8 December 2023.

Project Submission: 1 December 2023
Project Feedback Received: 8 December 2023

6 November 2023 – Session 1 - Getting Started with Python and Setting Up

10:00–12:00
Find out what Python can do for your data journalism and learn some of the basics of the technologies we will be using for the course.

7 November 2023 – Session 2 - Working with Data in Python

10:00–12:00
Learn how to read data from a spreadsheet, sort it and filter records.
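To give a flavour of what this looks like, here is a minimal sketch of reading, sorting and filtering with Pandas. It is not course material: the company names, countries and amounts are hypothetical sample data, embedded as CSV text so the example runs on its own. With a real file you would pass a path to `pd.read_csv` (or use `pd.read_excel` for .xlsx files).

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a spreadsheet exported as CSV.
# With a real file you would pass a path instead, e.g. pd.read_csv("payments.csv").
CSV_TEXT = """company,country,amount
Company A,Peru,1200
Company B,Nigeria,5400
Company C,Peru,300
Company D,Chile,2500
"""

# Read the data into a DataFrame (a table of rows and named columns).
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Sort so the largest amounts come first.
sorted_df = df.sort_values("amount", ascending=False)

# Filter: keep only the rows for one country.
peru = df[df["country"] == "Peru"]

# Conditions can be combined with & (and) or | (or); each one needs its own brackets.
big_peru = df[(df["country"] == "Peru") & (df["amount"] > 1000)]

print(sorted_df)
print(big_peru)
```

Filtering with a true/false condition inside square brackets keeps the original data untouched, so each step can be rerun and checked, which is part of what makes a notebook analysis reproducible.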

8 November 2023 – Session 3 - Analysing Data in Python

10:00–12:00
Summarise and aggregate data using groupby functions and pivot tables in Pandas.
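As an illustration only, the sketch below shows both techniques on a small, hypothetical table of payments (the column names and figures are invented for the example): `groupby` totals and counts payments per country, and `pivot_table` lays out one row per company and one column per year.

```python
import pandas as pd

# Hypothetical payments data: which company paid how much, where, and in which year.
df = pd.DataFrame(
    {
        "company": ["Company A", "Company A", "Company B", "Company B", "Company C"],
        "country": ["Peru", "Chile", "Peru", "Peru", "Chile"],
        "year": [2021, 2022, 2021, 2022, 2022],
        "amount": [100, 250, 400, 60, 300],
    }
)

# groupby: split the rows by country, then total and count the amounts in each group.
by_country = df.groupby("country")["amount"].agg(["sum", "count"])

# pivot_table: one row per company, one column per year, amounts summed.
# fill_value=0 fills company/year combinations with no payments (otherwise NaN).
pivot = pd.pivot_table(
    df, index="company", columns="year", values="amount", aggfunc="sum", fill_value=0
)

print(by_country)
print(pivot)
```

A `groupby` summary is often the quickest way to spot which entities dominate a dataset, while a pivot table makes changes over time easy to compare side by side.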

9 November 2023 – Session 4 - Transforming and Cleaning Data with Python

10:00–12:00
A closer look at how to handle different types of files in Python and how to export data so that it can be used in other programs. Introduction to string manipulation and the basics of data cleaning in Python.
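A minimal sketch of the kind of cleaning and exporting covered here, assuming a small, hypothetical messy table (inconsistent company names, currency symbols and a missing value). It uses standard Pandas string methods, `pd.to_numeric` and `drop_duplicates`, then exports to CSV and round-trips through JSON to show a second file format. Writing to in-memory text buffers keeps the example self-contained; with real files you would pass a file path instead.

```python
import io

import pandas as pd

# Hypothetical messy data, as it might arrive from a scraped table or a PDF conversion.
raw = pd.DataFrame(
    {
        "company": ["  Company A ", "company a", "COMPANY B", "Company C"],
        "amount": ["1,200", "1,200", "£3,400", "n/a"],
    }
)

clean = raw.copy()

# String manipulation: trim surrounding spaces and standardise capitalisation
# so that different spellings of the same name match.
clean["company"] = clean["company"].str.strip().str.title()

# Remove currency symbols and thousands separators, then convert to numbers.
# errors="coerce" turns values that cannot be parsed (like "n/a") into NaN.
clean["amount"] = pd.to_numeric(
    clean["amount"].str.replace(r"[£,]", "", regex=True), errors="coerce"
)

# Drop rows that have become exact duplicates after standardising the names.
clean = clean.drop_duplicates()

# Export to CSV so the data can be opened in other programs.
# With a real file: clean.to_csv("clean.csv", index=False)
csv_buffer = io.StringIO()
clean.to_csv(csv_buffer, index=False)
print(csv_buffer.getvalue())

# The same data as JSON, read back in to show another file type.
json_text = clean.to_json(orient="records")
round_trip = pd.read_json(io.StringIO(json_text))
print(round_trip)
```

Note that cleaning comes before de-duplication here: the two "Company A" rows only become identical once their names have been standardised.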

10 November 2023 – Session 5 - Starting a Python Project

10:00–12:00
Planning your own Python project and starting work on the Google Colab notebook that will be your final project.

Sam Leon

Sam Leon is co-founder of Data Desk, an investigative consultancy focused on climate and the commodities industry. He previously worked at Global Witness where he ran their digital investigations unit.
  • 6 November 2023 10.00–12.00 GMT (UK Time)
  • 7 November 2023 10.00–12.00 GMT (UK Time)
  • 8 November 2023 10.00–12.00 GMT (UK Time)
  • 9 November 2023 10.00–12.00 GMT (UK Time)
  • 10 November 2023 10.00–12.00 GMT (UK Time)
Location: Online