Data Science Coding Resources

Intended Audience: People with no background in the language who want to become “literate”. This doesn’t mean that you’ll get a job writing code in the language. Instead, the goal is to make you comfortable enough with it that you can talk about it intelligently.

Python

MIT Introduction to Computer Science and Programming Using Python: This was the first online course I took, and I used an unusual strategy to get the most out of it. Instead of going through the lectures, I went straight to the weekly projects and tried to solve them using Google and the book Learn Python the Hard Way (linked below). If your goal is only to learn Python, and not the computer science concepts or the Data Science material, I think that’s the way to go. It also gives you real insight into how coding actually takes place.

Learn Python the Hard Way: This book costs $29 but is worth every penny (you can also probably find some PDFs on the internet). It’s

Pandas, NumPy, SciPy: These are libraries that you can import into Python to do common data manipulation and statistics tasks. I’ve linked my main reference for each. They are difficult to learn in a linear way. Instead, you figure out their capabilities as you need them.
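To give a rough feel for what each library is for, here is a minimal sketch. The data is a made-up toy table (hypothetical groups and scores, not from any of the linked references), and it assumes all three libraries are installed.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical toy data: scores for two made-up groups.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "score": [70.0, 75.0, 80.0, 85.0, 90.0, 95.0],
})

# pandas: split the table by group and compute each group's mean score.
group_means = df.groupby("group")["score"].mean()

# NumPy: vectorized arithmetic on the underlying array (standardizing the scores).
scores = df["score"].to_numpy()
z_scores = (scores - np.mean(scores)) / np.std(scores)

# SciPy: a two-sample t-test comparing the two groups.
a = df.loc[df["group"] == "a", "score"]
b = df.loc[df["group"] == "b", "score"]
t_stat, p_value = stats.ttest_ind(a, b)

print(group_means)
print(z_scores.round(2))
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

In practice you would load your own data (for example with pd.read_csv) instead of typing it in, and look up further functions as the need comes up.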

[Advanced] Machine Learning for Coders: The course is based in Python, but I wouldn’t recommend it if you haven’t had an introduction to Pandas, NumPy, and SciPy. Once you have those fundamentals down, though, it really takes you to the next level.

SQL

Intro to SQL for Data Science by DataCamp: This is a good intro to SQL. It’s also fairly short, and you can easily get through it in an afternoon.

SQL – Full course for beginners by freeCodeCamp: This YouTube video is more than four hours long and really gets into the details of how to use keys, indexes, complex joins, etc. It should make you fully literate in SQL and get you ready for any pet project.
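If you want to try keys, indexes, and joins without installing a database, one option is Python’s built-in sqlite3 module. The sketch below is only an illustration: the customers and orders tables, their columns, and the index name are hypothetical, and other databases differ from SQLite in the details.

```python
import sqlite3

# An in-memory SQLite database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

# Hypothetical schema: a primary key on each table, a foreign key
# linking orders to customers, and an index on the join column.
conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount      REAL NOT NULL
);
CREATE INDEX idx_orders_customer ON orders(customer_id);
""")

conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (3, "Linus")],
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 20.0), (1, 35.0), (2, 15.0)],
)

# LEFT JOIN keeps customers with no orders; COALESCE turns their NULL total into 0.
rows = conn.execute("""
SELECT c.name, COUNT(o.id) AS n_orders, COALESCE(SUM(o.amount), 0) AS total
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY c.id
""").fetchall()

for name, n_orders, total in rows:
    print(name, n_orders, total)
```

Swapping LEFT JOIN for JOIN drops the customer with no orders, which is a quick way to see the difference between the two.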

R

Learn R by Codecademy: This is a good introduction to data cleaning and basic analysis in R. Everything in the course runs in a console in the web browser, but I would encourage you to replicate it in RStudio on your own computer, as that will teach you some of the nuances of file management in R.

Introduction to Statistical Learning with Applications in R: This is the OG textbook for learning R in the context of data science. If you can make it through the lab exercises at the end of every chapter in the book, you should have a better grasp of R and Data Science than 95% of people in the field.

Datasets

The best way to learn a language is by applying it to data that you’re interested in finding insights from. Here are a few places to find them:

Kaggle’s Public Datasets: This is a good starting option, as it has categories and top-rated datasets on almost any topic you might be interested in.

UCI Machine Learning Repository: Each dataset comes with a scientific paper that uses it, so you can try to replicate or improve the analysis done in the paper. The datasets are also categorized by what methods were used.

Google’s Dataset Search: Still in beta, but it turns up very high-quality datasets. It is likely to get a lot better in the future.