There's an old adage in software development: Garbage In, Garbage Out. This adage certainly applies to data science projects: if you simply throw raw data at models, you will end up with garbage results. In this session, we will build an understanding of what it takes to implement a data science project whose results are not garbage. We will use the Microsoft Team Data Science Process as our model for project implementation and learn what each step of the process entails. To motivate this walkthrough, we will see what we can learn from a survey of data professionals' salaries.


The slides are available as a GitPitch slide deck.

You can also get a version of the slides in HTML5 format. All modern browsers, including those on tablets and phones, should be able to navigate the slides successfully.

The slides are licensed under Creative Commons Attribution-ShareAlike.

Demo Code

The demonstration code is available on my GitHub repository. This includes a Jupyter notebook which walks through our example.

The source code is licensed under the terms of the GPL.

Additional Media

I performed a version of this talk for DataPlatformGeeks. You can get the recording on their YouTube channel.

Links And Further Information

For a more detailed explanation, check out my blog series entitled Launching A Data Science Project, which covers the topics in this talk in greater depth.

Setup Resources

I use the following in this talk: