Data Cleaning with Pandas 1
Data cleaning can feel more like data penance, but Pandas can ease your pain, allowing you to clean and structure your data with minimal hassle. Jupyter Notebook’s interactive environment helps you keep track of your changes and allows you to explore your data.
Participants can expect to learn how to clean large, complicated datasets quickly and how to explore data too large for Excel using the browser-based Jupyter Notebook.
Why Python?
Python makes it easy to replicate your analysis at a later stage and reduces the risk of human error that many face in Excel. Notebooks are also easy to share within teams and let you document and explain your work alongside the code, so you can come back to it later and pick up where you left off.
There is no fixed upper limit on data size: you can use Python on a CSV with 10 rows or a billion. In practice, the limitation is the amount of RAM on your machine, because Pandas loads data into memory. Once you reach that point, you need to switch to a server with more memory.
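One common way to stretch the memory limit before moving to a server is to read a CSV in pieces rather than all at once. The sketch below is illustrative only: the column name `amount`, the helper `summarize_in_chunks`, and the tiny in-memory sample are assumptions made for the example, standing in for a large file on disk.

```python
import io

import pandas as pd


def summarize_in_chunks(source, chunksize=1000):
    """Count rows and total the 'amount' column without loading the whole file."""
    total_rows = 0
    total_amount = 0.0
    # With chunksize set, read_csv yields DataFrames of at most `chunksize` rows.
    for chunk in pd.read_csv(source, chunksize=chunksize):
        total_rows += len(chunk)
        total_amount += chunk["amount"].sum()
    return total_rows, total_amount


# Hypothetical sample data: a small in-memory CSV standing in for a large file.
sample_csv = "id,amount\n" + "\n".join(f"{i},{i * 0.5}" for i in range(10))
rows, amount = summarize_in_chunks(io.StringIO(sample_csv), chunksize=4)
print(rows, amount)
```

Only one chunk is held in memory at a time, so the same loop works on a file path far larger than available RAM, as long as the running totals themselves stay small.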
Technical Requirements
Participants should have at least basic previous coding experience.