You're reading from Hands-On Machine Learning with Azure

Product type: Book
Published in: Oct 2018
Publisher: Packt
ISBN-13: 9781789131956
Edition: 1st Edition
Authors (5):
Thomas K Abraham

Dr. Thomas K Abraham is a cloud solution architect (advanced analytics and AI) at Microsoft in the South Central Region of the USA. Since January 2016, he's been assisting organizations in leveraging technologies such as SQL, Spark, Hadoop, NoSQL, BI, and AI on Azure. Prior to that, Thomas spent 10 years at Ecolab, where he designed algorithms for IoT devices and built solutions for anomaly detection. In the oil and gas division, he designed and built customer-facing analytics solutions for multiple super majors. His work was focused on preventing equipment failure by modeling corrosion, scale, and other stresses. He received his PhD in Chemical Engineering from The Ohio State University in 2005. His thesis focused on the use of nonlinear optimization with reaction models.

Parashar Shah

Parashar Shah is a Senior Program Manager in the Azure Machine Learning platform team. Currently, he works on making Azure Machine Learning services the best place to do end-to-end machine learning for building custom AI solutions using big data. Previously at Microsoft, he has been a Data Scientist and a Data Solutions Architect in various Cloud and AI teams. Prior to joining Microsoft, Parashar worked at Nokia Networks as a Solutions Architect & Product Manager, building customer experience analytics solutions for global telcos. He also co-founded a carpooling startup, which helped employees carpool safely. He has 10+ years of global work experience. He is an alumnus of the Indian Institute of Management, Bangalore and Gujarat University.

Jen Stirrup

Jen Stirrup is a data strategist and technologist, a Microsoft Most Valuable Professional (MVP), a Microsoft Regional Director, a tech community advocate, a public speaker and blogger, a published author, and a keynote speaker. Jen is the founder of a boutique consultancy based in the UK, Data Relish, which focuses on delivering successful business intelligence and artificial intelligence solutions that add real value to customers worldwide. She has featured on the BBC as a guest expert on topics relating to data.

Lauri Lehman

Lauri Lehman is a data scientist who is focused on machine learning tools in Azure. He helps customers to design and implement machine learning solutions in the cloud. He works for the software consultancy company, Zure, based in Helsinki, Finland. For the past 4 years, Lauri has specialized in data and machine learning in Azure. He has worked on many machine learning projects, developing solutions for demand estimation, text analytics, and image recognition, for example. Lauri has previously worked as an academic researcher in theoretical physics, after obtaining his PhD on topological quantum walks. He still likes to follow the progress of modern physics and is eagerly awaiting the era of quantum machine learning!

Anindita Basak

Anindita Basak is a cloud architect with 15+ years of experience, the last 12 of which she has spent working extensively on Azure. She has delivered various real-time implementations on Azure data analytics, and cloud-native and real-time event-driven architecture for Fortune 500 enterprises, ranging from banking, financial services, and insurance (BFSI) to retail sectors. She is also a cloud and DataOps trainer and consultant, and an author of cloud AI and DevOps books.

Data Science Process

Over the past decade, organizations have seen rapid growth in data. Harnessing insight from that data is crucial to the growth and survival of these organizations. Yet groups chartered with extracting value from data fail for various reasons. In this chapter, we will cover how organizations can avoid the potential pitfalls of data science.

There is a larger discussion about the quality and governance of data, which we will not be covering here. Experienced data scientists recognize the challenges with data and account for them in their processes. In general, some of these challenges include the following:

  • Poor data quality and consistency
  • Silos of data driven by individual business teams
  • Technologies that are hard to integrate with other data sources
  • The inability to deal with the Vs of big data: volume, velocity, variety, and veracity
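As a rough illustration of the first challenge in the list above, the following minimal Python sketch counts missing values and flags inconsistently written labels in a small set of records. It is not taken from the book or from any TDSP utility; the function names (`profile_records`, `inconsistent_labels`) and the sample records are hypothetical and exist only for this example.

```python
from collections import Counter


def profile_records(records, required_fields):
    """Count how often each required field is missing or blank."""
    missing = Counter()
    for record in records:
        for field in required_fields:
            value = record.get(field)
            # Treat None and empty/whitespace-only strings as missing.
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[field] += 1
    return dict(missing)


def inconsistent_labels(records, field):
    """Find values that differ only by case or surrounding whitespace."""
    variants = {}
    for record in records:
        value = record.get(field)
        if isinstance(value, str) and value.strip():
            variants.setdefault(value.strip().lower(), set()).add(value)
    # Keep only normalized labels that appear in more than one raw spelling.
    return {key: sorted(raw) for key, raw in variants.items() if len(raw) > 1}


# Hypothetical sample data, invented for this sketch.
records = [
    {"customer": "Contoso", "region": "South Central", "revenue": 1200},
    {"customer": "contoso ", "region": "", "revenue": None},
    {"customer": "Fabrikam", "region": "North", "revenue": 800},
]

missing_counts = profile_records(records, ["customer", "region", "revenue"])
label_variants = inconsistent_labels(records, "customer")
print(missing_counts)
print(label_variants)
```

Checks like these only surface symptoms; deciding how to repair or govern the data remains a process question, which is where a methodology comes in.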

In some cases...

TDSP stages

The Team Data Science Process (TDSP) is a methodology created by Microsoft to guide the full life cycle of data science projects in organizations. It is not meant to be a complete solution, but simply a framework by which teams can add structure to their processes and achieve the full business value of their analytics.

Besides TDSP, the other prevalent methodology that organizations have been adopting is called CRISP-DM (short for Cross-Industry Standard Process for Data Mining). This methodology has been around since the mid-1990s. There were several attempts to update it in the 2000s, but they were abandoned. The primary focus of CRISP-DM was data mining, but its principles can be extended to data science as well. The major steps listed in CRISP-DM are as follows: business understanding, data understanding, data preparation, modeling, evaluation, and deployment....

Tools for TDSP

Microsoft has released a set of tools that make it easier for organizations to follow the TDSP. One of those tools is the Interactive Data Exploration, Analysis, and Reporting (IDEAR) utility, released for CRAN-R, Microsoft R, and Python. Another tool is the Automated Modeling and Reporting (AMAR) utility. In this section, we will look into how we can leverage these tools in the TDSP.

IDEAR tool for R

Summary

In this chapter, we introduced the TDSP and covered each of its steps in detail. This process is meant to augment existing processes rather than replace them. We also looked at the TDSP utilities that Microsoft provides to make it easier to build structure into the data science life cycle. In the next few chapters, we will look at each of the options available within Azure to build AI solutions for your business needs.

