Reader small image

You're reading from  SQL Server 2017 Machine Learning Services with R.

Product typeBook
Published inFeb 2018
Reading LevelIntermediate
PublisherPackt
ISBN-139781787283572
Edition1st Edition
Languages
Right arrow
Authors (2):
Julie Koesmarno
Julie Koesmarno
author image
Julie Koesmarno

Julie Koesmarno is a senior program manager in the Database Systems Business Analytics team, at Microsoft. Currently, she leads big data analytics initiatives, driving business growth and customer success for SQL Server and Azure Data businesses. She has over 10 years of experience in data management, data warehousing, and analytics for multimillion-dollar businesses as a SQL Server developer, a system analyst, and a consultant prior to joining Microsoft. She is passionate about empowering data professionals to drive impacts for customer success and business through insights.
Read more about Julie Koesmarno

Tomaž Kaštrun
Tomaž Kaštrun
author image
Tomaž Kaštrun

Toma Katrun is a SQL Server developer and data scientist with more than 15 years of experience in the fields of business warehousing, development, ETL, database administration, and query tuning. He holds over 15 years of experience in data analysis, data mining, statistical research, and machine learning. He is a Microsoft SQL Server MVP for data platform and has been working with Microsoft SQL Server since version 2000. He is a blogger, author of many articles, a frequent speaker at the community and Microsoft events. He is an avid coffee drinker who is passionate about fixed gear bikes.
Read more about Tomaž Kaštrun

View More author details
Right arrow

Machine Learning Services with R for DBAs

R integration (along with Python integration in SQL Server 2017) offered a wide range of possibilities that one can use. And the targeted group of people has just increased in terms of people (job roles or departments) using R Services. DBAs (and also SysAdmins) will for sure gain a lot from this. Not only do R and statistics give them some additional impetus for discovering and gaining insights on their captured data, but also they might help them to find some hidden nuggets that they might have missed before. The mixture of different languages-and I am not solely talking about R, but also other languages-for sure bring new abilities to track, capture, and analyze captured data.

One thing is clear, if you have R (any Python) so close to the database, several people can switch from monitoring tasks to predicting tasks. This literally means...

Gathering relevant data

Gathering data - simple as it might be - is a task that needs to be well crafted. There are a few reasons for that. The first and most important is that we want to gather data in a way that will have minimum or zero impact on the production environment. This means that the process of collecting and storing data should not disturb any on-going process. The second important thing is storage. Where and how do you want to store the data and the retention policy of the stored data? At the beginning, this might seem a very trivial case, but over time, storage itself will play an important role. The third and also utterly important thing is which data you want to gather. Of course, we all want to have smart data present, that is, having all the data relevant for solving or improving our business processes. But in reality, gathering smart is neither that difficult...

Exploring and analyzing data

In a similar way, gathering data using event features can give you a rich way to a lot of system information data. Deriving from the previous sample, with the following demo, we will see how measures of a server can be used for advanced statistical analyses and how to help reduce the amount of different information, and pin-point the relevant measures. A specific database and a stage table will be created:

CREATE DATABASE ServerInfo;
GO
    
USE [ServerInfo]
GO
    
DROP TABLE IF EXISTS server_info;
GO
    
CREATE TABLE [dbo].[server_info]([XE01] [tinyint] NULL, [XE02] [tinyint] NULL,
      [XE03] [tinyint] NULL, [XE04] [tinyint] NULL, [XE05] [tinyint] NULL,
      [XE06] [tinyint] NULL, [XE07] [tinyint] NULL, [XE08] [tinyint] NULL,
      [XE09] [tinyint] NULL, [XE10] [tinyint] NULL, [XE11] [tinyint] NULL,
      [XE12] [tinyint] NULL, [XE13] [tinyint...

Creating a baseline and workloads, and replaying

Given the ability to reduce and create new measures that are tailored and adapted to your particular server or environment, now we want to understand how the system is behaving with all the other parameters unchanged (in Latin, ceteris paribus). This is the baseline. And with the baseline, we establish what is normal, or in other words, what the performance is under normal conditions. A baseline is used for comparing what might be or seem abnormal or out of the ordinary. It can also serve as a control group for any future tests (this works well especially when new patches are rolled out an upgrade of a particular environment/server needs to be performed).

A typical corporate baseline would be described as follows over a period of one day (24 hours) in the form of the number of database requests from users or machines:

When all...

Creating predictions with R - disk usage

Predictions involve spotting any unplanned and unwanted activities or unusual system behavior, especially when compared it to the baseline. In this manner, raising a red flag would result in fewer false positive states.

In addition, we always come across disk-size problems. Based on this problem, we will demo database growth, store the data, and then run predictions against the collected data to be able at the end to predict when a DBA can expect disk space problems.

To illustrate this scenario, I will create a small database of 8 MB and no possibility of growth. I will create two tables. One will serve as a baseline, DataPack_Info_SMALL, and the other will serve as a so-called everyday log, where everything will be stored for unexpected cases or undesired behavior. This will persist in the DataPack_Info_LARGE table.

First, create a database...

Summary

Using SQL Server R for any kind of DBA task, as we have seen here, it is not always hardcore statistics or predictive analytics; we might also be some simple statistical understanding underlying the connection and relationships between the attribute's queries, gathered statistics, and indexes. Prognosing and predicting, for example, information from execution plans in order to prepare a better understanding of the query of cover missing index, is a crucial point. Parameter sniffing or a cardinality estimator would also be a great task to tackle along the usual statistics.

But we have seen that predicting events that are usually only monitored can be a huge advantage for a DBA and a very welcome feature for core systems.

With R integration into SQL Server, such daily, weekly, or monthly tasks can be automated to different, before not uses yet, extent. And as such,...

lock icon
The rest of the chapter is locked
You have been reading a chapter from
SQL Server 2017 Machine Learning Services with R.
Published in: Feb 2018Publisher: PacktISBN-13: 9781787283572
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Authors (2)

author image
Julie Koesmarno

Julie Koesmarno is a senior program manager in the Database Systems Business Analytics team, at Microsoft. Currently, she leads big data analytics initiatives, driving business growth and customer success for SQL Server and Azure Data businesses. She has over 10 years of experience in data management, data warehousing, and analytics for multimillion-dollar businesses as a SQL Server developer, a system analyst, and a consultant prior to joining Microsoft. She is passionate about empowering data professionals to drive impacts for customer success and business through insights.
Read more about Julie Koesmarno

author image
Tomaž Kaštrun

Toma Katrun is a SQL Server developer and data scientist with more than 15 years of experience in the fields of business warehousing, development, ETL, database administration, and query tuning. He holds over 15 years of experience in data analysis, data mining, statistical research, and machine learning. He is a Microsoft SQL Server MVP for data platform and has been working with Microsoft SQL Server since version 2000. He is a blogger, author of many articles, a frequent speaker at the community and Microsoft events. He is an avid coffee drinker who is passionate about fixed gear bikes.
Read more about Tomaž Kaštrun