Catallaxy Services | @feaselkl
Curated SQL
We Speak Linux
Azure Machine Learning is a Software as a Service offering on the Microsoft Azure network. It offers a point-and-click interface for building, training, testing, and using machine learning models.
Once you have a model created, you can easily turn it into a web service.
Azure ML has several features which make it a great choice for data scientists:
Azure ML gives you a straightforward way of loading and cleansing data, building training versus test data sets, training models, scoring models, and publishing models as production-ready web services.
Microsoft provides a sample data set called Flight Delays Data. We want to see if we can predict delays based on the data available.
Create a new Blank Experiment.
Get the Flight Delays Data.
Ignore cancelled flights by using a Split Data component.
Specify columns to exclude; they aren't used in our model.
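Azure ML handles both of these steps in the designer, but the logic is simple to state in code. The sketch below assumes the dataset flags cancellations with a 0/1 `Cancelled` column. It keeps the non-cancelled flights and then drops a set of columns. The rows and the excluded column names are hypothetical placeholders, not the exact columns removed in this demo.

```python
from typing import Dict, List, Set

Row = Dict[str, object]


def drop_cancelled(rows: List[Row], flag: str = "Cancelled") -> List[Row]:
    """Keep only flights whose cancellation flag is 0 (assumes 0/1 encoding)."""
    return [row for row in rows if row.get(flag) == 0]


def exclude_columns(rows: List[Row], excluded: Set[str]) -> List[Row]:
    """Return copies of each row without the excluded columns."""
    return [{k: v for k, v in row.items() if k not in excluded} for row in rows]


# Hypothetical rows; column names are assumptions for illustration.
flights: List[Row] = [
    {"Carrier": "AA", "DepDelay": 5, "ArrDelay": 3, "Cancelled": 0},
    {"Carrier": "DL", "DepDelay": 0, "ArrDelay": 0, "Cancelled": 1},
    {"Carrier": "UA", "DepDelay": 40, "ArrDelay": 35, "Cancelled": 0},
]

active = drop_cancelled(flights)
# Which columns to exclude is an illustrative choice here.
model_input = exclude_columns(active, {"Cancelled", "DepDelay"})
```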
Separate training from test data using another Split Data component.
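A randomized row split is the heart of this step. As a sketch (not Azure ML's own implementation), the function below shuffles with a fixed seed so the split is reproducible, then sends a fraction of the rows to training and the rest to testing. The 0.7 fraction and seed 42 are arbitrary illustrative values.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_rows(rows: Sequence[T], fraction: float = 0.7, seed: int = 42) -> Tuple[List[T], List[T]]:
    """Randomly put `fraction` of the rows in the training set, the rest in the test set."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed => same split every run
    cutoff = round(len(shuffled) * fraction)
    return shuffled[:cutoff], shuffled[cutoff:]


train, test = split_rows(range(10), fraction=0.7)
print(len(train), len(test))  # 7 3
```

Fixing the seed matters when comparing models: every candidate is trained and scored on the same rows.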
Add and train a Linear Regression model.
Score and evaluate the model. Results are... less than good: R^2 is practically 0.
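R^2, the coefficient of determination, compares the model's squared errors with those of a baseline that always predicts the mean: R^2 = 1 - SS_res / SS_tot. A value near 0 means the model does little better than that baseline, and a model worse than the baseline goes negative. A minimal Python version:

```python
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    mean = sum(actual) / len(actual)
    # Squared error of the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared error of always predicting the mean (raises ZeroDivisionError if all actuals are equal).
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


print(r_squared([1, 2, 3, 4], [2.5] * 4))  # 0.0
```

Predicting the mean for every row gives exactly 0, which is essentially where this model landed.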
Using your own data sets is most of the fun in Azure ML. You can learn how the product works using Microsoft-provided data sets, but to provide business value, you need to be able to import your own data sets.
Fortunately, this is very easy to do.
Create a new dataset from a local file.
Fill in the modal dialog options.
Shortly thereafter, we have a new dataset available for use.
SQL Saturday dataset includes:
Pull in SQL Saturday dataset and project specific columns.
Build the model.
Evaluate the model. R^2 = 0.48, not bad for a social science result.
Now that we have a functional model, we want to turn it into a web service. We can then call that service to predict whether I am likely to speak at future events.
Click the "Set Up Web Service" button and pick Predictive Web Service.
Check the predictive experiment tab and then click Deploy Web Service.
Test the web service.
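The Test dialog is the easiest route, but client applications call the service over HTTP. The sketch below only builds the request with Python's standard library. It assumes the JSON layout used by classic Azure ML Studio request-response services: an `Inputs` object whose `input1` entry holds `ColumnNames` and `Values`, plus the API key sent as a bearer token. The URL, key, and column names are hypothetical, so check the API help page generated for your own service before relying on this shape.

```python
import json
import urllib.request
from typing import List, Sequence


def build_request(url: str, api_key: str, columns: List[str],
                  rows: Sequence[Sequence[object]]) -> urllib.request.Request:
    """Build (but do not send) a JSON scoring request.

    The body layout is an assumption modeled on classic Azure ML Studio
    request-response services; verify it against your service's API help page.
    """
    body = {
        "Inputs": {"input1": {"ColumnNames": columns, "Values": [list(r) for r in rows]}},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Bearer " + api_key},
        method="POST",
    )


# Hypothetical endpoint, key, and column names; substitute your service's values.
req = build_request(
    "https://example.invalid/score",
    "YOUR-API-KEY",
    ["City", "Month", "KnowSC"],
    [["Raleigh", "10", "1"]],
)
# urllib.request.urlopen(req) would send it; that call is deliberately not made here.
```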
City | Month | Know SC? | Expected | Actual |
Cleveland | 02 | 1 | ~1 | 0.73 |
Baltimore | 08 | 1 | ~0.5 | 0.52 |
Dallas | 05 | 0 | 0 | 0.009 |
Kansas City | 10 | 1 | ~0.5 | 0.67 |
Berlin | 06 | 0 | 0 | 0.01 |
Raleigh | 10 | 1 | ~1 | 0.91 |
Raleigh | 12 | 1 | ~1 | 0.898 |
The Titanic dataset comes from a Kaggle competition. The training set includes:
Picking the right model is critical to building something with predictive value. Here are our modeling considerations:
Given these, we want a two-class model. A decision tree, forest, or jungle seems to be a good starting point.
Full model:
Pull in the Titanic dataset and visualize the data.
Make certain columns categorical data.
Rename columns to make more sense.
Set missing Age values to the median.
Remove rows without Embarked values (2 rows total).
Remove unnecessary columns from model.
Make Survived categorical and set it as the label.
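The cleanup steps above can also be sketched in plain Python, assuming rows are dictionaries keyed by Kaggle's Titanic column names (`Age`, `Embarked`, and so on) with `None` marking a missing value. It follows the slide order: compute the median of the known ages, fill gaps with it, drop rows lacking `Embarked`, and remove unused columns. The passenger rows are made up for illustration.

```python
from statistics import median
from typing import Dict, List, Optional, Set

# Rows are dicts keyed by Titanic column names; None marks a missing value (assumption).
Row = Dict[str, Optional[object]]


def clean_titanic(rows: List[Row], unused: Set[str]) -> List[Row]:
    """Impute missing Age with the median, drop rows without Embarked, remove unused columns."""
    # Median of the known ages, computed before any rows are dropped.
    known_ages = [r["Age"] for r in rows if r.get("Age") is not None]
    age_median = median(known_ages)
    cleaned: List[Row] = []
    for r in rows:
        if not r.get("Embarked"):  # missing or empty Embarked: skip the row
            continue
        row = dict(r)
        if row.get("Age") is None:
            row["Age"] = age_median
        cleaned.append({k: v for k, v in row.items() if k not in unused})
    return cleaned


# Made-up passengers for illustration only.
passengers: List[Row] = [
    {"Survived": 1, "Pclass": 1, "Name": "A", "Age": 38.0, "Embarked": "C"},
    {"Survived": 0, "Pclass": 3, "Name": "B", "Age": None, "Embarked": "S"},
    {"Survived": 1, "Pclass": 2, "Name": "C", "Age": 22.0, "Embarked": None},
    {"Survived": 0, "Pclass": 3, "Name": "D", "Age": 30.0, "Embarked": "Q"},
]
cleaned = clean_titanic(passengers, unused={"Name"})
```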
Compare decision forest options, e.g., a few deep trees.
Compare the models. The best model had an AUC of 0.855, which is pretty decent.
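AUC is the area under the ROC curve, and it has a handy interpretation: the probability that a randomly chosen positive example (a survivor) gets a higher score than a randomly chosen negative one, with ties counting half. An AUC of 0.855 therefore means the model ranks a survivor above a non-survivor about 85.5% of the time, where 0.5 would be coin-flipping. A direct, quadratic-time sketch of that definition:

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    # Compare every positive/negative pair.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

Library implementations compute the same quantity by sorting the scores, which scales far better than checking every pair.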
Bonus work: integrate with R!
R scripting is easy (though there is no debugger).
This lets us include multiple visualizations.
Azure ML won't make you a data scientist, but it does offer a suite of powerful tools for data specialists.
To learn more, go here: http://CSmore.info/on/azureml
And for help, contact me: feasel@catallaxyservices.com | @feaselkl