I’ve finally gotten around to completing the Microsoft Professional Program in Data Science, which I started nearly a year ago. It’s a pretty comprehensive sequence of courses that gives a solid grounding in (and/or revision of!):
- Probability and Statistics (the heart of all of this)
- Programming in Python and/or R
- Importing and cleansing various types of data from different sources
- Visualising data (including time series and spatial)
- Machine Learning (regression, classification and clustering)
… and shows how they all fit together into a “big picture”. The course is run by Microsoft via edX and does make use of some Microsoft technologies such as Azure ML Studio, but it is not particularly Microsoft-centric. The maths is universal and most of the programming is in open-source languages. For example, I completed the final Capstone project with the free RStudio on my late-2008 MacBook Pro (achieving a final score of 97%).
So I definitely recommend this course. It’s free if you don’t care about getting a cert at the end, and it doesn’t require any high-end hardware; all you need is time and self-discipline. I think there is a lot of data science hype around right now, and a lot of unrealistic expectations from both data scientists and the organisations employing them. I am certainly not planning on any abrupt career changes myself! But when the smoke clears and the dust settles, these kinds of skills will be applicable to all industries and most roles, even if the job title isn’t Official Data Scientist. Data munging/wrangling (or “ETL” to use the fancy term) is something I’ve done my entire career, for example. I haven’t previously done much dimensionality reduction or feature engineering, though, and since I do forecasts of things all the time, I will be looking to apply some of that, perhaps.
Next I think I will do the recently-launched MPP in Artificial Intelligence.
These violent delights have violent ends.