Here are some resources I have found helpful for learning how to do data science. They include some books, some excellent repositories, and some pieces I’ve written (in the spirit of “if you can’t explain it, you don’t understand it”).
Doing Data Science
Python
I started my data science journey with R and SAS, but it’s all about Python now. I like:
- Introduction To Computer Science with Python, MIT’s edX course
- Python for Data Analysis, by Wes McKinney (book and code repository)
- Python Data Science Handbook, by Jake VanderPlas (book and code repository)
Machine Learning
For a deeper understanding of what is going on under the hood:
- The book Introduction to Statistical Learning is just excellent. Clear explanations of how ML works and practical examples in R.
- Statistical Learning in Python repo. Has redone the Introduction to Statistical Learning labs and exercises in Python.
- Some articles I’ve written:
  - How Regression Works - Using Matrix Algebra. A Python notebook explaining how regression works using matrix algebra.
  - How K-means Clustering Works
  - How Decision Tree Classification Works
  - How Random Forest Regression Works
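To give a flavour of the matrix-algebra view of regression, here is a minimal sketch. It is not the code from the notebook listed above; the toy data and variable names are made up for illustration, and it assumes NumPy is installed. It solves the normal equations (X'X)b = X'y for the coefficients.

```python
import numpy as np

# Toy data, made up for illustration: y = 1 + 2*x exactly.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

# Design matrix: a column of ones for the intercept, then x.
X = np.column_stack([np.ones_like(x), x])

# Normal equations: (X'X) beta = X'y.
# Solving the linear system avoids forming the inverse of X'X explicitly.
beta = np.linalg.solve(X.T @ X, X.T @ y)

# Fitted values and residuals.
y_hat = X @ beta
residuals = y - y_hat

print("intercept, slope:", beta)
```

For real data with many or nearly collinear columns, `np.linalg.lstsq` is usually the numerically safer choice over solving the normal equations directly.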
Getting Things Done in Python - Code Snippets
Data Types and Structures
- Data Types
- Lists (Arrays and Dataframes)
- Dictionaries
- Tuples
- Sets
- Vectors
- Indices
- Matrices
- Comprehensions in Python
Control Flow
- If-else
- For Loops
- While Loops
- Loop control statements
Wrangling Data
- Reading and writing Data
- Missing Data
- Outliers
- Subsetting Data
- Grouping Data
- Transforming columns
- Joining data sets
- Preprocessing categorical data
- Transforming the shape of data
- Normalizing Data
- Discretizing data
- One Hot Encoding
Time Series
- Pandas Date-Times
- Create time lags
- Subsetting date-times
- Autocorrelation
- Partial Autocorrelation
- Cross Correlation
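As a rough taste of a few of the time series topics above (time lags, subsetting date-times, autocorrelation), here is a small sketch. The series is made up and the column names are hypothetical; it assumes pandas is installed and is not taken from the linked snippets.

```python
import pandas as pd

# Made-up daily series for illustration.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
df = pd.DataFrame({"value": [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]}, index=idx)

# shift(k) moves values down k rows, so lag_1 holds the previous day's value.
# The first k rows have no earlier value and become NaN.
df["lag_1"] = df["value"].shift(1)
df["lag_2"] = df["value"].shift(2)

# Subset by date using label-based slicing on the DatetimeIndex.
# Both endpoints are included.
subset = df.loc["2024-01-03":"2024-01-05"]

# Lag-1 autocorrelation: correlation of the series with its own lag.
acf_1 = df["value"].autocorr(lag=1)

print(df)
print(subset)
print("lag-1 autocorrelation:", acf_1)
```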
Visualizing Data
- Matplotlib basics
- Panel of plots
- Seaborn basics
- Set styles
Machine Learning
- Set up training and test sets
- Classification in scikit-learn
- Regression methods
- Clustering