Microsoft SQL Server 2012 with Hadoop

More Information
  • Use the Native SQOOP Connector for data movement between SQL Server 2012 and Hadoop
  • Configure and use the Hive ODBC driver to enable any ODBC-compliant client to consume Hadoop data
  • Create ETL solutions and automate data movement jobs between SQL Server 2012 and Hadoop using SQL Server Integration Services
  • Provide powerful reporting on the integrated data in just a few clicks using Microsoft self-service BI tools
  • Merge structured and unstructured data in a common warehouse for analysis

With the explosion of data, the open source Apache Hadoop project is gaining traction, thanks to the large ecosystem that has grown up around its core components, the Hadoop Distributed File System (HDFS) and MapReduce. Getting SQL Server to talk to Hadoop has become increasingly important because the two are complementary: Hadoop can store petabytes of unstructured data that take hours to query, while SQL Server 2012 can store terabytes of structured data that can be queried in seconds. This creates a need to transfer and integrate data between Hadoop and SQL Server.

Microsoft SQL Server 2012 with Hadoop is aimed at SQL Server developers. It will quickly show you how to get Hadoop activated on SQL Server 2012 (it ships with this version). Once this is done, the book focuses on how to manage big data with Hadoop and how to use Hadoop Hive to query the data. It also covers topics such as SQL Server's in-memory functions and BI tools for big data.

Microsoft SQL Server 2012 with Hadoop focuses on data integration techniques between relational (SQL Server 2012) and non-relational (Hadoop) worlds. It will walk you through different tools for the bi-directional movement of data with practical examples.

You will learn to use open source connectors like SQOOP to import and export data between SQL Server 2012 and Hadoop, and to work with leading in-memory BI tools and the Hive ODBC driver to build ETL solutions for your data movement projects. Finally, this book will give you a glimpse of present-day self-service BI tools such as Excel and Power View, which you can use to consume Hadoop data and gain powerful insights into it.

  • Integrate data from unstructured (Hadoop) and structured (SQL Server 2012) sources
  • Configure and install connectors for a bi-directional transfer of data
  • Full of illustrations, diagrams, and tips with clear, step-by-step instructions and practical examples
Page Count: 96
Course Length: 2 hours 52 minutes
ISBN: 9781782177982
Date Of Publication: 25 Aug 2013


Debarchan Sarkar

Debarchan Sarkar is a Microsoft Data Platform engineer. He specializes in the Microsoft SQL Server Business Intelligence stack. Debarchan is a subject matter expert in SQL Server Integration Services and delves deep into the open source world, specifically the Apache Hadoop framework. He is currently working on a technology called HDInsight, which is Microsoft's distribution of Hadoop on Windows. He has authored various books on SQL Server and Big Data, including Microsoft SQL Server 2012 with Hadoop (Packt Publishing) and Pro Microsoft HDInsight: Hadoop on Windows (Apress). His Twitter handle is @debarchans.