R is a language focused on statistical analysis, predictive modeling, and data cleansing.
R is an offshoot of the S language, and its interpreter is written largely in C.
As a data analysis Domain Specific Language (DSL), R helps you go well beyond simple Excel analysis and pivot tables.
My goals in this talk:
Note that R is not the only data analysis language you could learn. Julia and Python are also great languages, and there are very good closed-source, commercial tools like SAS.
There are two major branches of R of interest to us: base R and Microsoft R. "Base" R is maintained by the R Core Team, with support from the R Foundation, and is entirely open source. Microsoft takes base R and adds its own libraries and support.
There is one big IDE available: RStudio. RStudio is a standalone installation and provides a nice development interface for R.
Microsoft also released R Tools for Visual Studio (RTVS), a Visual Studio plug-in. It offers some interesting features, like easier SQL Server R Services integration, and it integrates with other Visual Studio projects. It was built into Visual Studio 2017 but removed from Visual Studio 2019.
We will also install Jupyter Notebooks and use them during this talk. Installing Jupyter takes a few steps, but the links for this talk include a step-by-step walkthrough. The easiest way to install Jupyter is to use Anaconda, a data science suite for Python.
Jupyter (whose name derives from a combination of the languages Julia, Python, and R) is a great framework because it supports dozens of languages. Microsoft uses Jupyter Notebooks for its Azure Machine Learning products.
Notebooks are a way of mixing Markdown-enabled text and language snippets to make your thoughts clear to others. You can create and share notebooks, allowing others to easily test your process and follow along. Notebooks are also an excellent teaching mechanism.
Connecting to a SQL Server database (or any other relational database) is easy with R. The first step is to install a package that gives your R code ODBC support: either RODBC, or DBI together with the odbc package. From there, you can connect to a system data source that you've defined in your ODBC Data Sources.
You could also pass in a connection string if you don't want to set up a DSN.
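Both connection styles can be sketched with RODBC as follows. This is a minimal illustration rather than the exact code from the notebook: the helper function names are made up for this example, the driver name and Windows authentication setting are assumptions about your environment, and any DSN, server, database, or table name you pass in is your own.

```r
# Sketch only: the helper names below are hypothetical, and the default driver
# name and Trusted_Connection setting are assumptions about your setup.
# Requires the RODBC package: install.packages("RODBC")

# Run a query through a system DSN defined in ODBC Data Sources.
query_with_dsn <- function(dsn, sql) {
  channel <- RODBC::odbcConnect(dsn)
  # odbcConnect() returns -1 (with a warning) on failure instead of raising an error.
  if (!inherits(channel, "RODBC")) stop("Could not connect to DSN: ", dsn)
  # Close the connection when the function exits, even if the query fails.
  on.exit(RODBC::odbcClose(channel))
  RODBC::sqlQuery(channel, sql, stringsAsFactors = FALSE)
}

# Build a DSN-less connection string for SQL Server.
# Assumes Windows authentication and that the named ODBC driver is installed.
build_connection_string <- function(server, database,
                                    driver = "ODBC Driver 17 for SQL Server") {
  paste0("Driver={", driver, "};",
         "Server=", server, ";",
         "Database=", database, ";",
         "Trusted_Connection=yes;")
}

# Run a query using a connection string instead of a DSN.
query_with_connection_string <- function(connection_string, sql) {
  channel <- RODBC::odbcDriverConnect(connection_string)
  if (!inherits(channel, "RODBC")) stop("Could not connect with the given connection string")
  on.exit(RODBC::odbcClose(channel))
  RODBC::sqlQuery(channel, sql, stringsAsFactors = FALSE)
}
```

With a DSN in place, a call like `query_with_dsn("YourDsn", "SELECT TOP 10 * FROM dbo.YourTable")` should hand back a data frame on success, or a character vector of error messages if the query fails. The DSN-less route works the same way once you pass the output of `build_connection_string()` to `query_with_connection_string()`.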
No single talk will expose the full gamut of what you can do with R, but this next section will try to hit a few of the highlights. If this feels a bit overwhelming, don't fret: you can grab the notebook and try it out yourself.
This notebook will cover the analysis of restaurant data for Wake County, North Carolina over a multi-year period.
R is a powerful language for performing analysis. We've seen just a few of the many valuable uses of R.
To learn more, go here:
Catallaxy Services consulting: