Abstract

If you're interested in Hadoop but don't know where to begin, this session will give you an idea of what you can do with the open-source platform. We will walk through an overview of the Hadoop architecture and become familiar with the overall platform and its solutions for warehousing, ETL, streaming data ingest, in-memory processing, and more. We will also compare Hadoop to SQL Server to help you understand when to deploy which technology.


Slides

The slides are available in HTML5 format. All modern browsers (including those on tablets and phones) should be able to navigate the slides successfully.

The slides are licensed under Creative Commons Attribution-ShareAlike.


Additional Media

I have a version of this talk on YouTube. You can get the recording on my YouTube channel.


Links And Further Information

Hadoop Distributions

If you want to get started with Hadoop, there are a number of options available to you. The local sandboxes tend to be available as Azure or AWS virtual machines as well, so if you don't have a beefy machine at home, you can still get started pretty easily.

Local sandboxes:

Platform-as-a-Service offerings:

Interesting Links

Learning Resources

Books are hard to recommend because the source material changes so frequently: a book written in 2017 can be out of date by the time it's published in 2018. These are a few books that I have on my to-read list:

Some of the foundational papers do hold up well, as they provide information on the underpinnings of these technologies. Examples include:

I have a few other talks in which I cover elements of Hadoop in detail.

I learned a good deal from the Hortonworks tutorials, which include both written and video tutorials. They are a good place to start.