Reader small image

You're reading from  SQL Server 2017 Machine Learning Services with R.

Product typeBook
Published inFeb 2018
Reading LevelIntermediate
PublisherPackt
ISBN-139781787283572
Edition1st Edition
Languages
Right arrow
Authors (2):
Julie Koesmarno
Julie Koesmarno
author image
Julie Koesmarno

Julie Koesmarno is a senior program manager in the Database Systems Business Analytics team, at Microsoft. Currently, she leads big data analytics initiatives, driving business growth and customer success for SQL Server and Azure Data businesses. She has over 10 years of experience in data management, data warehousing, and analytics for multimillion-dollar businesses as a SQL Server developer, a system analyst, and a consultant prior to joining Microsoft. She is passionate about empowering data professionals to drive impacts for customer success and business through insights.
Read more about Julie Koesmarno

Tomaž Kaštrun
Tomaž Kaštrun
author image
Tomaž Kaštrun

Toma Katrun is a SQL Server developer and data scientist with more than 15 years of experience in the fields of business warehousing, development, ETL, database administration, and query tuning. He holds over 15 years of experience in data analysis, data mining, statistical research, and machine learning. He is a Microsoft SQL Server MVP for data platform and has been working with Microsoft SQL Server since version 2000. He is a blogger, author of many articles, a frequent speaker at the community and Microsoft events. He is an avid coffee drinker who is passionate about fixed gear bikes.
Read more about Tomaž Kaštrun

View More author details
Right arrow

Data Exploration and Data Visualization

Data Exploration and Data Visualization techniques are essential to understanding data before one can implement predictive modeling. With existing open source R packages for statistical and mathematical algorithms, data professionals can easily explore their data and discover patterns/trends that are otherwise challenging to do in a relational database. Using SQL Server Machine Learning Services (ML Services) with R means that data exploration and data visualization are no longer siloed work, leading to faster and easier paths to predictive modeling.

This chapter outlines essential tips and tricks developers must know for data exploration and data visualization using R. You will learn how to integrate R for data exploration and data visualization in T-SQL and then stitch these techniques in SSRS and Power BI. If you are already familiar...

Understanding SQL and R data types

Before we dive into exploring data using R in T-SQL, let's get started with understanding data types to store data in R. The first and most important data type to be familiar with when working with R in T-SQL is data frame. The input and output parameters of sp_execute_external_script in SQL Server received and sent from R are data frames. Other data types that are important to know for data munging, and that are very similar to data frame, are matrix and data table, which are beyond the scope of this chapter.

Aside from data frame, R supports a limited number of scalar data types such as character, complex, date/time, integer, logical, numeric, and raw. Thus, when you provide data from SQL Server in R Scripts, when necessary the data will be implicitly converted to a compatible data type in R. When a conversion cannot be performed automatically...

Data exploration and data munging

Data munging in R can be done independently without using SQL Server. The following diagram illustrates a recommended high-level process that developers can follow when using SQL Server and R. If you have worked in R before, you are likely familiar with steps 2, 3, and 5 performed in R directly.

Please note that Steps 3 and 4 are optional and will be discussed more in Chapter 6, Predictive Modeling and Chapter 7, Operationalizing R Code:

Figure 4.2 - High-Level Development Process for SQL Server Machine Learning Services with R

Let's get started with Data Munging in R. Specifically, in this section, we will be working with the R environment so that we know how it works in R before we stitch it together with T-SQL. If you are familiar with R, you may skip this section. Following are prerequisites for this section:

  1. An R IDE, for example RTVS...

Data visualization in R

Good data visualization draws insights from a large amount of data and serves as a medium to communicate to the audience. Fortunately, R has powerful built-in functions as well as packages that can help you to create good data visualization. In this section, we will go through a number of built-in graphical functions and R libraries to show their capabilities. Then we'll walk through an example on how to stitch it together with T-SQL. You will also learn how to display graphics from R in SQL Operations Studio. Similar to the previous section, we will be using the orders dataset and will create a data frame d to narrow down the analysis for sales persons Amy, Jack, and Hudson.

Plot

The plot() function...

Integrating R code in reports and visualizations

In this section, we will delve into familiar reports and visualization Tools that are available in the Microsoft BI stack, such as SQL Server Reporting Services (SSRS), Power BI, and Mobile Reports.

There are three main use cases for integrating R graphics with SQL Server.

  1. Get a dataset output representing data / statistical analysis, training model, or predictive model:
Figure 4-13 SQL Server Machine Learning Services process for data analysis in R

Execute sp_execute_external_script to run R to produce a dataset output as illustrated in (1) + (2) + (3). The data set output (3) could be from data/statistical analysis, a training model, predictive output, and so on. In SQL Server, we can optionally process the output further (4), for example, saving it into a table or passing it on to another stored procedure.

  1. Get a dataset output...

Summary

In this chapter, you have learned the importance of data preparation in predictive modeling, which involves both data exploration and data visualization exercises. R has a number of open-source packages that are useful for data munging, for example dplyr, reshape, and many more. The challenge is to hit the right balance between having data munging activities in SQL Server VS in R. The beauty of SQL Server Machine Learning Services is that it allows easy integration with SQL Server Reporting Services. In addition, Power BI also supports interactive data exploration with R visualizations. In the next chapter, you will learn more about the RevoScaleR library for portable, scalable, and distributable R functions.

lock icon
The rest of the chapter is locked
You have been reading a chapter from
SQL Server 2017 Machine Learning Services with R.
Published in: Feb 2018Publisher: PacktISBN-13: 9781787283572
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Authors (2)

author image
Julie Koesmarno

Julie Koesmarno is a senior program manager in the Database Systems Business Analytics team, at Microsoft. Currently, she leads big data analytics initiatives, driving business growth and customer success for SQL Server and Azure Data businesses. She has over 10 years of experience in data management, data warehousing, and analytics for multimillion-dollar businesses as a SQL Server developer, a system analyst, and a consultant prior to joining Microsoft. She is passionate about empowering data professionals to drive impacts for customer success and business through insights.
Read more about Julie Koesmarno

author image
Tomaž Kaštrun

Toma Katrun is a SQL Server developer and data scientist with more than 15 years of experience in the fields of business warehousing, development, ETL, database administration, and query tuning. He holds over 15 years of experience in data analysis, data mining, statistical research, and machine learning. He is a Microsoft SQL Server MVP for data platform and has been working with Microsoft SQL Server since version 2000. He is a blogger, author of many articles, a frequent speaker at the community and Microsoft events. He is an avid coffee drinker who is passionate about fixed gear bikes.
Read more about Tomaž Kaštrun