You're reading from SQL Server 2017 Machine Learning Services with R.

Product typeBook

Published inFeb 2018

Reading LevelIntermediate

PublisherPackt

ISBN-139781787283572

Edition1st Edition

Languages

SQL

Tools

R Services SQL Server

Concepts

Data Analysis

Authors (2):

Julie Koesmarno

Toma≈æ Ka≈°trun

View More author details

Introduction to R and SQL Server

SQL Server 2016 came with great new features, and among them was R integration into SQL Server, partly with advanced analytics and partly with new programmability capabilities. Microsoft R Services for SQL Server is part of the family of new extensibilities for highly scalable and parallel advanced analytics. R Services allows you to perform advanced analytics (statistical, multivariate statistics, predictive analytics, machine learning, and deep learning) on large quantities of data stored in the database. Microsoft published R Services as part of Microsoft R Server (MRS), which was specially designed for reading data directly from the SQL Server database within the same SQL Server computational context.

We will cover the following aspects in this chapter:

Using R prior to SQL Server 2016
Microsoft's commitment on open source R language
Boosting analytics with SQL Server R integration
Outline of the book

Using R prior to SQL Server 2016

The R language has been in the community since the 90's (even though it was developed a decade before). With its open source GNU license, R gained popularity for its no-fuss installation and ability to evoke any available package for additional statistical learning functions. This was a clear advantage to R as there were not that many statistical programs available on the market in the '80s and '90s; in addition, most of them were not free. The extensibility with emerging new packages for the core R engine gave a broader community and users more and more abilities to use the R language for multiple purposes, in addition to its strong statistical analysis and predictive modeling capabilities.

SQL Server 2005 introduced SQL Server Analysis Services (SSAS) data mining features to be applied against the customer's existing rich data stored in SQL Server and SSAS OLAP cubes. This feature allows users to use Data Mining eXpression (DMX) for creating predictive queries. In the next couple of years, several questions, requests, and ideas emerged on SQL forums, blogs, and community websites regarding additional statistical and predictive methods and approaches.

Back in 2011, I started working on the idea of extending the capabilities of statistical analysis in SQL Server 2008 R2 with the help of open source R language. One reason for that decision was to have flexibility of running statistical analysis (from data provisioning to multivariate analysis) without feeding the data into OLAP cube first, and another reason was more business orientated, with the need to get faster, statistical insights from all the people involved in data preparing, data munging, and data cleaning.

I kicked in and started working on a framework that was based on a combination of T-SQL stored procedure and R package RODBC (https://cran.r-project.org/web/packages/RODBC). The idea was simple; get the transactional or OLAP data, select the columns you want to perform analysis against, and the analysis itself (from simple to predictive analytics, which would stretch beyond SSAS, T-SQL, or CLR capabilities):

Figure 1: Process flow of a framework

The framework was far from simple, and calling the procedure considered calling a mixture of R code, T-SQL select statements, and configurations to your R engine.

The stored procedure with all its parameters looked like this:

EXECUTE AdventureWorks2012.dbo.sp_getStatistics
             @TargetTable = '[vStoreWithAddresses]'
            ,@Variables = 'Name'
            ,@Statistics = '8'
            ,@ServerName = 'WORKSTATION-31'
            ,@DatabaseName = 'AdventureWorks2012'
            ,@WorkingDirectory = 'C:\DataTK'
            ,@RPath = 'C:\Program Files\R\R-3.0.3\bin';

The nuts and bolts explanation is outside the scope of this book and is well-documented at: http://www.sqlservercentral.com/articles/R+Language/106760/.

Looking back on this framework and the feedback from the community and people on forums, it was accepted positively and many commented that they needed something similar for their daily business.

The framework in general had, besides pioneering the idea and bringing R engine one step closer to SQL Server, many flaws. The major one was security. Because it needed access to a working local directory for generating R files to be run by the vanilla R engine, it needed xp_cmdshell enabled. The following reconfiguration was mandatory and many sysadmins would not approve of it:

EXECUTE SP_CONFIGURE 'xp_cmdshell', 1;
GO
RECONFIGURE;
GO
    
EXECUTE SP_CONFIGURE 'Ole Automation Procedures', 1;
GO
RECONFIGURE;
GO

In addition, the framework needed to have access to R engine installation, together with R packages to execute the desired code. Installing open source programs and providing read/write access was again a drawback in terms of security and corporate software decisions. Nevertheless, one of the bigger issues—later when everything was installed and put into production—was performance and memory issues. R is memory-based, meaning all the computations are done in the memory. So, if your dataset is bigger than the size of the available memory, the only result you will get will be error messages. Another aspect of performance issues was also the speed. With no parallel and distributive computations, the framework was bound to dexterity of an author of the package. For example, if the package was written in C or C++, rather than in Fortran, the framework performed better, respectively.

The great part of this framework was the ability to deliver results from statistical analysis or predictive modeling much faster, because it could take OLTP or any other data that needed statistical analysis. Furthermore, statisticians and data scientists could prepare the R code that was stored in the table, which was later run by data wranglers, data analysts, or data stewards. Therefore, one version of truth is maintained, because there was no need for data movement or data copying and all users were reading the same data source. In terms of predictive modeling, the framework also enabled users to take advantage of various additional predictive algorithms (for example, decision forest, glm, CNN, SVM, and word cloud) that were not part of SSAS Data Mining at that time.

Besides the pros and cons, the framework was a successful initial attempt to get more data insights that were easily distributable among different business units through pushing visualizations in SQL Server Reporting Services. In the years prior to the release of SQL Server 2016, I had met people from the SQL Server community that developed similar frameworks, in order to push predictions to the SQL Server database to support business applications and solutions. With SQL Server 2016, many such similar solutions were internalized and brought closer to the SQL Server engine to achieve better performance and to address many of the issues and cons.

Microsoft's commitment to the open source R language

With a growing popularity and community, R has become and continues to be a big player in the field of advanced analytics and data visualization. R and machine learning servers (or services) are not just buzzword that will be forgotten in the next cycle of SQL Server, but it is infiltrating more and more into different layers of open source and corporate software. In the past five years, many big analytical players have introduced R integration, interpreters, and wrappers for the R language, because of the language's practicality, usability, and inter-disciplinarily and open source orientation. As Microsoft's making a bold and strategic move toward being open source friendly, the use cases for integrating R in SQL Server are growing, making this move even more natural and at the right point in time. This move had been very well appreciated in the SQL community and the business as well.

In comparison to other big analytical tools, Microsoft took integration very seriously. It addressed many of the issues and limitations of the language itself, and created complete integration of R with the SQL Server in order to give the best user experience. Many competitors (such as SAS, IBM, SAP, and Oracle) have done similar integration, but failed to take into account many aspects that contribute to a holistic user experience. Microsoft has announced that joining the R consortium will give them the ability to help the development of the R language and to support future development. In addition, Microsoft has created its own package repository called MRAN (from CRAN, where M stands for Microsoft) and is giving support and SLA agreement for R as well, even though the language and engine is based on Open R (a free, open-sourced version). All these steps tell us how dedicated Microsoft is in bringing an open source, statistical and programming language into the SQL Server environment.

We can only expect more R integration into other services. For example, Power BI supports native R visuals (https://powerbi.microsoft.com/en-us/blog/r-powered-custom-visuals) since October 2016, and R language since December 2015. Therefore, I am a strong believer that R will soon be part of the whole SQL Server ecosystem such as SSAS, SSIS, and SSRS natively as well. With Azure Analysis Services, R is again one step closer to analysis services.

Boosting analytics with SQL Server R integration

Data science is in the forefront of the SQL Server and R integration. Every task performed by DBA, sysadmin, the analyst, wrangler, or any other role that is working with SQL server can have these tasks supported with any kind of statistics, data correlation, data analysis, or data prediction. R integration should not be restricted only to the fields of data science. Instead, it should be explored and used in all tasks. DBA can gain from R integration by using switching from monitoring tasks to understanding and predicting what might or will happen next. Likewise, this idea can be applied to sysadmin, data wranglers, and so on. R integration also brings different roles of people closer to understand statistics, metrics, measures, and learn how to improve them by using statistical analysis and predictions.

Besides bringing siloed individual teamwork into more coherent and cohesive teams, R integration also brings less data movement, because different users can now—with the help of R code—execute, drill down, and feel the data, instead of waiting to have data first prepared, exported, and imported again. With smoother workflows comes faster time to deployment, whether it is a simple report, a predictive model, or analysis. This allows the boundaries of data ownership to shift into insights ownership, which is a positive aspect of faster reactions to business needs.

In the past year, we have also seen much more interest in data science in Microsoft stack. With R integration, Azure Machine Learning, and Power BI, all users who want to learn new skills and virtues have great starting points from the available products.

Summary

Starting with SQL Server 2016, R integration became a very important part of the SQL Server platform. Since the public release of SQL server 2016, until February 2018 (the time of writing this), the community had embraced R as well as Python very well, making data exploration and data analysis part of the general database task. Microsoft addressed many of the issues, and broadened the SQL Server as a product. With SQL Server 2017, Python was added as a secondary analytical language, reaching to an even broader community as well as businesses, and at the same time, taking are of data scalability, performance, and security.

In the next chapter, we will cover different R distributions and IDE tools for using R as a standalone or within the SQL Server, and what the differences among them are when deciding which one to choose.

You have been reading a chapter from

SQL Server 2017 Machine Learning Services with R.

Published in: Feb 2018Publisher: PacktISBN-13: 9781787283572

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

undefined

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at $15.99/month. Cancel anytime

Authors (2)

Julie Koesmarno

Julie Koesmarno is a senior program manager in the Database Systems Business Analytics team, at Microsoft. Currently, she leads big data analytics initiatives, driving business growth and customer success for SQL Server and Azure Data businesses. She has over 10 years of experience in data management, data warehousing, and analytics for multimillion-dollar businesses as a SQL Server developer, a system analyst, and a consultant prior to joining Microsoft. She is passionate about empowering data professionals to drive impacts for customer success and business through insights.
Read more about Julie Koesmarno

Toma≈æ Ka≈°trun

Toma Katrun is a SQL Server developer and data scientist with more than 15 years of experience in the fields of business warehousing, development, ETL, database administration, and query tuning. He holds over 15 years of experience in data analysis, data mining, statistical research, and machine learning. He is a Microsoft SQL Server MVP for data platform and has been working with Microsoft SQL Server since version 2000. He is a blogger, author of many articles, a frequent speaker at the community and Microsoft events. He is an avid coffee drinker who is passionate about fixed gear bikes.
Read more about Toma≈æ Ka≈°trun

Other recommended products

Related to this chapter

Data Science with SQL Server Quick Start Guide

SQL Server started to fully support data science only with its last two editions. If you are a professional from both worlds, SQL Server and data science, and interested in using SQL Server and Machine Learning Services for their projects, then this is the ideal book for you.

BookAug 2018206 pages

Modern R Programming Cookbook

R is a powerful tool for statistics, graphics, and statistical programming. It is used by tens of thousands of people daily to perform serious statistical analyses. It is a free, open source system whose implementation is the collective accomplishment of many intelligent, hard-working people. There are more than 2,000 available add-ons, and R is a serious rival to all commercial statistical packages. . The objective of this book is to show how to work with different programming aspects of R. The emerging R developers and data science could have very good programming knowledge but might have limited understanding about R syntax and semantics. Our book will be a platform develop practical solution out of real world problem in scalable fashion and with very good understanding.

BookOct 2017236 pages

Hands-On Data Science with SQL Server 2017

Learn how to utilize Microsoft SQL Server with NoSQL concepts for data science challenges. This book will help enhance your knowledge beyond data querying & processing tasks by implementing a data science pipeline. We will implement data science tasks and show how to use them on a day-to-day basis for efficient smart predictive models.

BookNov 2018506 pages

SQL Server 2017 Developer's Guide

This book is your guide to exploring the various developer capabilities offered by SQL Server 2017. Model your data and the complex relationships within it, and integrate SQL Server with R and Python for efficient analytics. The book also covers the performance and troubleshooting aspects to help you develop efficient database applications.

BookMar 2018816 pages

SQL Server 2016 Developer's Guide

This book is designed to get you up to speed with SQL Server 2016, covering the essential concepts and techniques. By the end of this book, you’ll be able to design efficient, high-performance database applications confidently.

BookMar 2017616 pages

Introducing Microsoft SQL Server 2019

Introducing Microsoft SQL Server 2019 takes you through what’s new in SQL Server 2019 and why it matters. After reading this book, you’ll be well placed to explore exactly how you can make MIcrosoft SQL Server 2019 work best for you.

BookApr 2020488 pages

Learn T-SQL Querying

T-SQL is an extension of the SQL language which allows you to tackle advanced querying and query-tuning challenges in SQL Server and Azure SQL Database. This book will be a perfect reference for you to write more efficient T-SQL code to perform simple-to-advanced tasks for data management and data analysis.

BookMay 2019484 pages

SQL Server 2017 Administrator's Guide

This book will give you all the information you need to become an expert database administrator, and master the administrative aspects of SQL Server 2017. From setting up and configuring your SQL Server instance to fine-tuning your database, this extensive guide will teach you the nitty-gritty of SQL Server 2017 administration.

BookDec 2017434 pages

SQL Server 2019 Administrator's Guide

This book will give you all the information you need to become an expert database administrator and master the administrative aspects of SQL Server 2019. From setting up and configuring your SQL Server instance to fine-tuning your database, this extensive guide will teach you the nitty-gritty of SQL Server 2019 administration.

BookSep 2020522 pages

SQL Server 2017 Integration Services Cookbook

SQL Server Integration Services is a tool that facilitates data extraction, consolidation, and loading options (ETL), SQL Server coding enhancements, data warehousing, and customizations. With the help of this book, you’ll gain complete hands-on experience of SSIS 2017’s new features, and design and development improvements including SCD, Profiling, Tuning, and Customizations.

BookJun 2017558 pages

Hands-On Machine Learning with Azure

This book will teach you how advanced machine learning can be performed in the cloud in a very cheap way. You will learn more about Azure ML processes as an enterprise-ready methodology. By the end of this book, you will implement machine learning and artificial intelligence concepts in your model to solve real-world problems.

BookOct 2018340 pages

SQL Server on Linux

Microsoft's launch of SQL Server on Linux has made SQL Server a truly versatile platform across different operating systems and data-types, both on-premise and on-cloud. You will start by understanding how SQL Server can be installed on supported and unsupported Linux distributions. With the help of this book you will be able to setup SQL Server on Linux and understand how SQL Server can be installed and implemented on this open source platform.

BookAug 2017222 pages

Personalised recommendations for you

Based on your interests and search pattern

Et al.

Ever wonder why speech recognition systems don't understand the Scottish accent, or what would happen if an astronaut only ate mac 'n' cheese, or other spurious reflections you'd have at a bar? We did, then collated those deliberations into absurd research articles with fake figures and methodologies inspired by even more fictionally absurd studies.

BookAug 2023230 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages4

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages1

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Mastering Tableau 2023

This book is a comprehensive resource to mastering your Tableau skills and becoming a BI expert. As you progress, you will learn how to build advanced dashboards and improve your storytelling to derive key business insight, as well as make you well-versed with advanced functionalities of Tableau in the business intelligence domain.

BookAug 2023684 pages

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages5

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages2

Data Engineering with AWS

Embark on a journey to master data engineering pipelines on AWS! Our book offers a hands-on experience of AWS services for ingesting, transforming, and consuming data. Whether you're an absolute beginner or someone with basic data engineering experience, this guide is an indispensable resource.

BookOct 2023636 pages5

Modern Data Architecture on AWS

Every organization wants an agile, performant, and cost-effective data platform that meets all their current and future business needs. Purpose-built AWS analytics services and their features play a big part in building such a modern data platform. This book brings to you all the design and architectural patterns that’ll help you achieve this goal.

BookAug 2023420 pages5

Practical Guide to Applied Conformal Prediction in Python

Discover the power of Conformal Prediction with the "Practical Guide to Applied Conformal Prediction in Python." Master the latest techniques to quantify uncertainty in machine learning and computer vision models, and seamlessly apply them to your industry applications.

BookDec 2023240 pages

TinyML Cookbook

With over 70 project-based recipes, the TinyML Cookbook is a practical guide that will help you to get the most out of your microcontrollers. It provides a comprehensive understanding of the theoretical foundations while giving you hands-on experience training ML models for deployment on Arduino Nano 33 BLE Sense, Raspberry Pi Pico, and SparkFun RedBoard Artemis Nano microcontrollers.

BookNov 2023664 pages