Web Scraping for Journalists – Online (PM)

This three-part workshop in web scraping is designed for reporters with no knowledge of scraping or programming and provides foundational skills for getting original stories by compiling data across a range of online sources. By the end of the workshop, you will be able to begin to write your own basic scrapers and identify potential targets for scraping. You will also be able to communicate with programmers on relevant projects.

Scraping is the process of automatically collating information from the web. It might mean grabbing entries from hundreds of webpages, or fetching and combining dozens of spreadsheets or thousands of PDFs.

The results have led to exclusive stories for organisations ranging from the Bureau of Investigative Journalism and Trinity Mirror to DC Thomson, Channel 4 and the BBC.

Technical Requirements

For this course you will need the following:

Your own laptop
A Google Drive account
Zoom app
Camera and audio

This course will be hosted on Zoom. To find out more about how we use Zoom, please check out our Zoom InfoSec page.

Course Structure

The course consists of three two-hour online sessions held on consecutive Wednesday evenings. Exercises and additional tasks will be provided to supplement the training between sessions, and participants will be expected to commit to around three hours of self-directed learning between each session.

Important

Our training is not recorded: if you miss a session, it is lost – you cannot watch a recording of it, nor will you be allowed to attend that session at a later date.

22 May 2024 – Session 1: Introduction to scraping in Python

17:00–19:00

This session explores the different roles that scraping can play in a story, introduces Google Colab as a platform for writing and running Python code, and explains key coding concepts that you will need to write scrapers: variables, printing, comments, lists, looping and indices.
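As a rough illustration of the concepts listed for this session, the short sketch below touches each of them once: a variable, printing, a comment, a list, an index and a loop. It is not taken from the course materials, and the address in it is a placeholder rather than a real scraping target.

```python
# A comment: Python ignores any text after a hash sign.

# A variable stores a value under a name.
base_url = "https://example.com/reports/"  # placeholder address (assumption)

# A list holds several items in order.
years = ["2021", "2022", "2023"]

# An index picks one item out of a list; counting starts at 0.
first_year = years[0]
last_year = years[-1]  # a negative index counts back from the end
print(first_year, last_year)

# A loop repeats the same steps for every item in the list.
report_urls = []
for year in years:
    report_urls.append(base_url + year)

print(report_urls)
```

The same pattern (a list of items, a loop over them, and a result built up step by step) sits underneath most basic scrapers.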

29 May 2024 – Session 2: Using libraries for scraping

17:00–19:00

Find, import and put to use libraries that others have created to help you perform particular scraping functions.
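The listing does not say which libraries the session covers, so the sketch below is only an assumption about what "importing a library" looks like in practice. It uses html.parser, which ships with Python, to pull link addresses out of a small made-up snippet of HTML. The names LinkCollector and SAMPLE_HTML are invented for this example.

```python
from html.parser import HTMLParser  # a library included with Python

# A made-up fragment of a web page, used instead of a live site.
SAMPLE_HTML = """
<ul>
  <li><a href="/reports/2022.csv">2022 report</a></li>
  <li><a href="/reports/2023.csv">2023 report</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag the parser meets."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


collector = LinkCollector()
collector.feed(SAMPLE_HTML)
print(collector.links)
```

The point is the division of labour: the library does the work of reading the HTML, and your own code only decides which pieces to keep.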

5 June 2024 – Session 3: Creating functions for scraping

17:00–19:00

Learn how to create your own functions with Python so that you can solve common problems, such as scraping multiple web pages.
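A minimal sketch of that idea follows, under stated assumptions: the URL pattern, the function names build_page_url and extract_title, and the sample pages are all hypothetical, and the pages are stored as text so that nothing is fetched from the web. Each function does one small job, and a loop reuses them for every page.

```python
def build_page_url(page_number):
    """Return the address of one results page (hypothetical URL pattern)."""
    return "https://example.com/results?page=" + str(page_number)


def extract_title(html):
    """Return the text between <title> and </title>, or None if absent."""
    start = html.find("<title>")
    end = html.find("</title>")
    if start == -1 or end == -1:
        return None
    return html[start + len("<title>"):end].strip()


# Stand-ins for downloaded pages, keyed by the address they would come from.
sample_pages = {
    build_page_url(1): "<html><title>Results: page 1</title></html>",
    build_page_url(2): "<html><title>Results: page 2</title></html>",
}

# Reuse the same function on every page instead of repeating the code.
titles = []
for url, html in sample_pages.items():
    titles.append(extract_title(html))

print(titles)
```

In a real scraper the stored text would be replaced by a step that downloads each page, but the structure stays the same: write the logic once as a function, then call it for as many pages as needed.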

Paul Bradshaw

Professor Paul Bradshaw is an online journalist and blogger, who leads the MA in Data Journalism at Birmingham City University. He manages his own blog, the Online Journalism Blog (OJB), and was the co-founder of Help Me Investigate, an investigative journalism website funded by Channel 4 and Screen WM.

22 May 2024 17:00–19:00 Timezone: BST (UK Time)
29 May 2024 17:00–19:00 Timezone: BST (UK Time)
5 June 2024 17:00–19:00 Timezone: BST (UK Time)

Location: Zoom meeting

This event is now fully booked.

Goldsmiths students (full time)*: £77
Students (full time)*: £83
Freelancers**: £138
Small Media/Education/NonProfit Organisations (<10 staff): £230
Large Media/Education/NonProfit Organisations (10+ staff): £340
Other Organisations: £635

In line with our non-profit mission, our pricing operates on a sliding scale, ensuring large organisations pay more to subsidise places for smaller newsrooms, freelancers and students.

*Student places for this course are capped, due to limited capacity. Anyone registering as a student will be asked for a photo/scan of their student ID ahead of the course.

**Employed individuals whose employers will not pay for the course are entitled to the freelancer rate. Note that we are a small charity and rely on your honesty, so please do not register as a freelancer if your employer is reimbursing you for the course.

We have a strict policy of No Refund and No Transfer of bookings.