Programming for Data Science
In the last decade, demand for programming skills related to managing and visualising data has grown remarkably. Python, R and SQL consistently feature among the top skills listed in data science and data analyst job advertisements. Knowing how to write efficient code to handle and visualise data is an essential skill for any modern data scientist.
This course covers the main principles of computer programming with a focus on data science applications, following the entire pathway from raw data through databases, data wrangling and visualisation, and machine learning frameworks to software development. Students will learn the main principles of programming in the data science context and develop the ability to handle and visualise data. The course assumes no prior programming knowledge and provides training in state-of-the-art tools such as SQL, Python, R and Git. Students will apply computational thinking in various application domains and learn to communicate data analysis results to stakeholders.
If you complete the course successfully, you should be able to:
- convert raw data into relational databases that can be queried with SQL
- import data into Python and R, then manipulate and visualise it
- program in Python and R
- develop software using version control via Git
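As an illustration of the first two outcomes, the following is a minimal sketch in Python of the pathway from raw data to a relational database and back into a program for summarising. It uses only the standard library (`csv`, `io`, `sqlite3`); the table name `readings`, the column names and the sample values are all made up for this example and do not come from the course materials.

```python
import csv
import io
import sqlite3

# Hypothetical raw data; in practice this would usually be read from a file.
RAW_CSV = """site,value
A,10
A,20
B,30
B,50
"""


def load_rows(raw_text):
    """Parse CSV text into a list of (site, value) tuples."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [(row["site"], float(row["value"])) for row in reader]


def build_database(rows):
    """Create an in-memory SQLite database holding the parsed rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (site TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()
    return conn


def mean_by_site(conn):
    """Use SQL to compute the mean value per site, returned as a dict."""
    cursor = conn.execute(
        "SELECT site, AVG(value) FROM readings GROUP BY site ORDER BY site"
    )
    return dict(cursor.fetchall())


conn = build_database(load_rows(RAW_CSV))
summary = mean_by_site(conn)
for site, mean_value in summary.items():
    print(f"{site}: {mean_value:.1f}")
```

The same steps could be carried out in R or with a data-frame library; the standard library is used here only to keep the sketch self-contained.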
Reading list:
- McKinney W. Python for Data Analysis, 2nd edition, O’Reilly (2017)
- Guttag J.V. Introduction to Computation and Programming Using Python, 2nd edition, MIT Press (2017)
- Wickham H. and Grolemund G. R for Data Science, 1st edition, O’Reilly (2017)
- Wickham H. Advanced R, 1st edition, Chapman & Hall (2015)