SQL Server 2017 Machine Learning Services with R.

SQL Server 2017 Machine Learning Services with R.: Data exploration, modeling, and advanced analytics

By Julie Koesmarno , Tomaž Kaštrun
Book Feb 2018 338 pages 1st Edition
What do you get with eBook?

  • Instant access to your digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM free: read whenever, wherever, and however you want

Product Details


Publication date : Feb 27, 2018
Length : 338 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781787283572
Vendor :
Oracle

Introduction to R and SQL Server

SQL Server 2016 introduced a number of new features, among them the integration of R into SQL Server, which brought both advanced analytics and new programmability capabilities. Microsoft R Services for SQL Server is part of a family of new extensibility features for highly scalable, parallel advanced analytics. R Services allows you to perform advanced analytics (statistical analysis, multivariate statistics, predictive analytics, machine learning, and deep learning) on large quantities of data stored in the database. Microsoft published R Services as part of Microsoft R Server (MRS), which was specially designed to read data directly from the SQL Server database within the same SQL Server compute context.

We will cover the following aspects in this chapter:

  • Using R prior to SQL Server 2016
  • Microsoft's commitment to the open source R language
  • Boosting analytics with SQL Server R integration
  • Outline of the book

Using R prior to SQL Server 2016

The R language has been in the community since the 1990s (even though it was developed a decade before). With its open source GNU license, R gained popularity for its no-fuss installation and the ability to load any available package for additional statistical learning functions. This was a clear advantage for R, as there were not many statistical programs on the market in the 1980s and 1990s and most of them were not free. The steady stream of new packages extending the core R engine gave the broader community more and more ways to use the R language for multiple purposes, in addition to its strong statistical analysis and predictive modeling capabilities.

SQL Server 2005 introduced SQL Server Analysis Services (SSAS) data mining features that could be applied to a customer's existing rich data stored in SQL Server and SSAS OLAP cubes. This feature allows users to write predictive queries with Data Mining Extensions (DMX). Over the following years, many questions, requests, and ideas emerged on SQL forums, blogs, and community websites regarding additional statistical and predictive methods and approaches.

Back in 2011, I started working on the idea of extending the statistical analysis capabilities of SQL Server 2008 R2 with the help of the open source R language. One reason for that decision was the flexibility of running statistical analysis (from data provisioning to multivariate analysis) without feeding the data into an OLAP cube first. The other reason was more business oriented: the need of everyone involved in data preparation, data munging, and data cleaning to get statistical insights faster.

I started working on a framework based on a combination of a T-SQL stored procedure and the R package RODBC (https://cran.r-project.org/web/packages/RODBC). The idea was simple: get the transactional or OLAP data, select the columns you want to analyze, and choose the analysis itself (from simple statistics to predictive analytics that stretched beyond SSAS, T-SQL, or CLR capabilities):

Figure 1: Process flow of a framework

The framework itself was far from simple: calling the procedure involved passing a mixture of R code, T-SQL SELECT statements, and configuration settings for your R engine.

The stored procedure with all its parameters looked like this:

EXECUTE AdventureWorks2012.dbo.sp_getStatistics
             @TargetTable = '[vStoreWithAddresses]'
            ,@Variables = 'Name'
            ,@Statistics = '8'
            ,@ServerName = 'WORKSTATION-31'
            ,@DatabaseName = 'AdventureWorks2012'
            ,@WorkingDirectory = 'C:\DataTK'
            ,@RPath = 'C:\Program Files\R\R-3.0.3\bin'; 

The nuts and bolts explanation is outside the scope of this book and is well-documented at: http://www.sqlservercentral.com/articles/R+Language/106760/.

Looking back on this framework and the feedback from the community and people on forums, it was accepted positively and many commented that they needed something similar for their daily business.

Besides pioneering the idea and bringing the R engine one step closer to SQL Server, the framework had many flaws. The major one was security. Because it needed access to a local working directory for generating the R files to be run by the vanilla R engine, it needed xp_cmdshell enabled. The following reconfiguration was mandatory, and many sysadmins would not approve of it:

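-- Both settings are advanced options, so expose them first
EXECUTE sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO

-- Allow T-SQL to spawn operating system commands
EXECUTE sp_configure 'xp_cmdshell', 1;
GO
RECONFIGURE;
GO

-- Allow T-SQL to instantiate OLE Automation objects
EXECUTE sp_configure 'Ole Automation Procedures', 1;
GO
RECONFIGURE;
GO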

In addition, the framework needed access to the R engine installation, together with the R packages required to execute the desired code. Installing open source programs and granting read/write access was again a drawback in terms of security and corporate software policies. One of the bigger issues, which surfaced later when everything was installed and put into production, was performance and memory consumption. R is memory-based, meaning that all computations are done in memory. So, if your dataset is bigger than the available memory, the only result you will get is an error message. Speed was another aspect of the performance problem. With no parallel or distributed computation, the framework was only as fast as the package author had made the package. For example, a package implemented in C or C++ generally performed better than one implemented in Fortran.

The great part of this framework was its ability to deliver results from statistical analysis or predictive modeling much faster, because it could work directly on OLTP or any other data that needed statistical analysis. Furthermore, statisticians and data scientists could prepare R code that was stored in a table and later run by data wranglers, data analysts, or data stewards. A single version of the truth was therefore maintained, because there was no need for data movement or data copying and all users were reading the same data source. In terms of predictive modeling, the framework also enabled users to take advantage of various additional algorithms (for example, decision forest, glm, CNN, SVM, and word cloud) that were not part of SSAS Data Mining at that time.

Pros and cons aside, the framework was a successful initial attempt to get more data insights that were easily distributable among different business units by pushing visualizations to SQL Server Reporting Services. In the years prior to the release of SQL Server 2016, I met people from the SQL Server community who had developed similar frameworks in order to push predictions to the SQL Server database to support business applications and solutions. With SQL Server 2016, many such solutions were internalized and brought closer to the SQL Server engine to achieve better performance and to address many of these issues.
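
For contrast with the xp_cmdshell-based approach, the following is a minimal sketch of how R code can be run inside the engine from SQL Server 2016 onward. It assumes an instance with R Services (In-Database) or Machine Learning Services installed; the inline sample values and the column names x and mean_x are illustrative assumptions and are not part of the original framework.

```sql
-- One-time setup (requires ALTER SETTINGS permission; on SQL Server 2016/2017
-- the instance usually has to be restarted afterwards)
EXECUTE sp_configure 'external scripts enabled', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO

-- Run R inside the engine: no xp_cmdshell, no working directory, no R path.
-- InputDataSet and OutputDataSet are the default data frame names.
-- The sample values and the names x and mean_x are hypothetical.
EXECUTE sp_execute_external_script
      @language = N'R'
     ,@script = N'OutputDataSet <- data.frame(mean_x = mean(InputDataSet$x));'
     ,@input_data_1 = N'SELECT CAST(v AS float) AS x FROM (VALUES (1), (2), (3)) AS t(v);'
WITH RESULT SETS ((mean_x float));
GO
```

In this sketch, the query passed in @input_data_1 takes the place of the RODBC round trip, and the R script runs under the SQL Server Launchpad service, so neither xp_cmdshell nor Ole Automation Procedures needs to be enabled.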

Microsoft's commitment to the open source R language

With its growing popularity and community, R has become, and continues to be, a big player in the field of advanced analytics and data visualization. R and machine learning servers (or services) are not just buzzwords that will be forgotten in the next cycle of SQL Server; they are working their way into more and more layers of open source and corporate software. In the past five years, many big analytical players have introduced R integration, interpreters, and wrappers for the R language because of the language's practicality, usability, interdisciplinary nature, and open source orientation. As Microsoft makes a bold and strategic move toward being open source friendly, the use cases for integrating R in SQL Server are growing, which makes this move even more natural and well timed. The move has been very well received by the SQL community and by businesses as well.

Compared with other big analytical tools, Microsoft took the integration very seriously. It addressed many of the issues and limitations of the language itself and created a complete integration of R with SQL Server in order to give the best user experience. Many competitors (such as SAS, IBM, SAP, and Oracle) have done similar integrations but failed to take into account many aspects that contribute to a holistic user experience. Microsoft has announced that joining the R Consortium will give it the ability to help develop the R language and to support its future development. In addition, Microsoft has created its own package repository called MRAN (from CRAN, where M stands for Microsoft) and provides support and an SLA for R as well, even though the language and engine are based on Open R (a free, open source version). All these steps show how dedicated Microsoft is to bringing an open source statistical programming language into the SQL Server environment.

We can only expect more R integration into other services. For example, Power BI has supported the R language since December 2015 and native R visuals (https://powerbi.microsoft.com/en-us/blog/r-powered-custom-visuals) since October 2016. Therefore, I strongly believe that R will soon be a native part of the whole SQL Server ecosystem, including SSAS, SSIS, and SSRS. With Azure Analysis Services, R is again one step closer to Analysis Services.

Boosting analytics with SQL Server R integration

Data science is at the forefront of the SQL Server and R integration. Every task performed by a DBA, sysadmin, analyst, wrangler, or anyone else working with SQL Server can be supported with some kind of statistics, data correlation, data analysis, or data prediction. R integration should not be restricted to the field of data science; it should be explored and used in all tasks. A DBA can gain from R integration by switching from monitoring tasks to understanding and predicting what might or will happen next. Likewise, this idea can be applied to sysadmins, data wranglers, and so on. R integration also helps people in different roles understand statistics, metrics, and measures, and learn how to improve them by using statistical analysis and predictions.

Besides turning siloed individual work into more coherent and cohesive teamwork, R integration also reduces data movement, because different users can now, with the help of R code, execute, drill down, and get a feel for the data instead of waiting for it to be prepared, exported, and imported again. Smoother workflows bring a faster time to deployment, whether for a simple report, a predictive model, or an analysis. This allows the boundaries of data ownership to shift toward insights ownership, which supports faster reactions to business needs.

In the past year, we have also seen much more interest in data science within the Microsoft stack. With R integration, Azure Machine Learning, and Power BI, users who want to learn new skills have great starting points among the available products.

Summary

Starting with SQL Server 2016, R integration became a very important part of the SQL Server platform. Between the public release of SQL Server 2016 and February 2018 (the time of writing), the community has embraced R as well as Python very well, making data exploration and data analysis part of general database work. Microsoft addressed many of the issues and broadened SQL Server as a product. With SQL Server 2017, Python was added as a second analytical language, reaching an even broader community as well as businesses while, at the same time, taking care of data scalability, performance, and security.

In the next chapter, we will cover different R distributions and IDE tools for using R as a standalone or within the SQL Server, and what the differences among them are when deciding which one to choose.


Key benefits

  • Learn how you can combine the power of R and SQL Server 2017 to build efficient, cost-effective data science solutions
  • Leverage the capabilities of R Services to perform advanced analytics—from data exploration to predictive modeling
  • A quick primer with practical examples to help you get up and running with SQL Server 2017 Machine Learning Services with R, as part of database solutions with continuous integration/continuous delivery

Description

R Services was one of the most anticipated features in SQL Server 2016; it was improved significantly and rebranded as Machine Learning Services in SQL Server 2017. Prior to SQL Server 2016, many developers and data scientists were already using R to connect to SQL Server in siloed environments that left a lot to be desired, in order to do additional data analysis beyond what SSAS Data Mining or additional CLR programming functions offered. With R integrated within SQL Server 2017, these developers and data scientists can now benefit from an integrated, effective, efficient, and more streamlined analytics environment. This book gives you foundational knowledge and insights to help you understand SQL Server 2017 Machine Learning Services with R. First and foremost, the book provides practical examples of how to implement, use, and understand SQL Server and R integration in corporate environments, and also provides explanations and underlying motivations. It covers installing Machine Learning Services; maintaining, deploying, and managing code; and monitoring your services. Delving more deeply into predictive modeling and the RevoScaleR package, this book also provides insights into operationalizing code and exploring and visualizing data. To complete the journey, this book covers the new features in SQL Server 2017 and how they are compatible with R, amplifying their combined power.

What you will learn

  • Get an overview of SQL Server 2017 Machine Learning Services with R
  • Manage SQL Server Machine Learning Services from installation to configuration and maintenance
  • Handle and operationalize R code
  • Explore RevoScaleR R algorithms and create predictive models
  • Deploy, manage, and monitor database solutions with R
  • Extend R with SQL Server 2017 features
  • Explore the power of R for database administrators

Table of Contents

12 Chapters
Preface
Introduction to R and SQL Server
Overview of Microsoft Machine Learning Server and SQL Server
Managing Machine Learning Services for SQL Server 2017 and R
Data Exploration and Data Visualization
RevoScaleR Package
Predictive Modeling
Operationalizing R Code
Deploying, Managing, and Monitoring Database Solutions containing R Code
Machine Learning Services with R for DBAs
R and SQL Server 2016/2017 Features Extended
Other Books You May Enjoy

