mattconners.github.io

Doing data science - some technical learning resources

Here are some resources I have found helpful for learning how to do data science. They include some books, some excellent repositories, and some pieces I’ve written (in the spirit of “if you can’t explain it, you don’t understand it”).

Doing Data Science

Python

I started my data science journey with R and SAS, but it’s all about Python now. I like:

Machine Learning

For a deeper understanding of what is going on under the hood:

Getting Things Done in Python - Code Snippets

Data Types and Structures


  • Data Types
  • Lists (Arrays and Dataframes)
  • Dictionaries
  • Tuples
  • Sets
  • Vectors
  • Indices
  • Matrices
  • Comprehensions in Python
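
As a quick taste of the last item, here is a minimal sketch of list, dict, and set comprehensions. The `values` list is made-up sample data, and the snippet is not taken from the linked pages.

```python
# Hypothetical sample data, for illustration only
values = [1, 2, 3, 4, 5, 6]

# List comprehension: squares of the even numbers only
even_squares = [v * v for v in values if v % 2 == 0]

# Dict comprehension: map each value to its square
square_by_value = {v: v * v for v in values}

# Set comprehension: distinct remainders after dividing by 3
remainders = {v % 3 for v in values}

print(even_squares)
print(square_by_value)
print(remainders)
```

Each form replaces an explicit loop that builds a collection one element at a time, which usually reads more clearly when the body is a single expression.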

Control Flow


  • If-else
  • For Loops
  • While Loops
  • Loop control statements
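
The loop control statements can be sketched in a few lines. This is an illustrative example only: the function name `first_negative` and its inputs are hypothetical, chosen to show `continue`, `break`, and the `else` clause of a `for` loop together.

```python
def first_negative(numbers):
    """Return the first negative number, ignoring None entries.

    Returns None when no negative number is present.
    """
    for n in numbers:
        if n is None:
            continue  # skip missing entries and move on to the next item
        if n < 0:
            result = n
            break  # stop looping at the first match
    else:
        # This branch runs only when the loop ends without hitting break
        result = None
    return result


print(first_negative([3, None, -2, -5]))
print(first_negative([1, 2, 3]))
```

The `for ... else` form avoids a separate "found" flag, at the cost of being less familiar to readers coming from other languages.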

Wrangling Data


  • Preprocessing categorical data
  • Transforming the shape of data
  • Normalizing Data
  • Discretizing data
  • One Hot Encoding
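
To make two of these topics concrete, here is a minimal pure-Python sketch of min-max normalization and one-hot encoding. The helper names `min_max_scale` and `one_hot` are hypothetical; in practice libraries such as pandas and scikit-learn offer their own tools for both, and this sketch only shows the underlying idea.

```python
def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to scale over
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(labels):
    """Return the sorted categories and one 0/1 row per label."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows


print(min_max_scale([10, 20, 30]))
print(one_hot(["red", "blue", "red"]))
```

Sorting the categories keeps the column order stable between runs, which matters when the same encoding has to be applied to new data later.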


Time Series


  • Pandas Date-Times
  • Create time lags
  • Subsetting date-times
  • Autocorrelation
  • Partial Autocorrelation
  • Cross Correlation
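
The ideas behind time lags and autocorrelation fit in a short pure-Python sketch. The function names `lag` and `autocorrelation` are hypothetical, the sample series is made up, and both functions assume `0 <= k < len(series)` and a series that is not constant. In pandas, `Series.shift` performs the lagging step.

```python
def lag(series, k):
    """Shift a series forward by k steps, padding the start with None."""
    return [None] * k + series[: len(series) - k]


def autocorrelation(series, k):
    """Sample autocorrelation at lag k, using the overall mean and variance."""
    n = len(series)
    mean = sum(series) / n
    # Total squared deviation from the mean (the lag-0 term)
    denom = sum((x - mean) ** 2 for x in series)
    # Products of deviations for points that are k steps apart
    num = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
    return num / denom


series = [1, 2, 3, 4]  # hypothetical sample data
print(lag(series, 1))
print(autocorrelation(series, 1))
```

At lag 0 the numerator equals the denominator, so the autocorrelation is 1 by construction; larger lags use fewer overlapping points and are correspondingly noisier.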

Visualizing Data


  • Matplotlib basics
  • Panel of plots
  • Seaborn basics
  • Set styles

Machine Learning


  • Set up training and test sets
  • Classification in scikit-learn
  • Regression methods
  • Clustering