You're reading from  Smarter Decisions - The Intersection of Internet of Things and Decision Science

Product type: Book
Published in: Jul 2016
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781785884191
Edition: 1st Edition
Author (1)
Jojo Moolayil

Jojo Moolayil is a data scientist living in Bengaluru, the Silicon Valley of India. With over 4 years of industrial experience in Decision Science and IoT, he has worked with industry leaders on high-impact and critical projects across multiple verticals. He is currently associated with GE, the pioneer and leader in data science for Industrial IoT. Jojo was born and raised in Pune, India, and graduated from the University of Pune with a major in information technology engineering. With a vision to solve problems at scale, Jojo found solace in decision science and learned to solve a variety of problems across multiple industry verticals early in his career. He started his career with Mu Sigma Inc., the world's largest pure-play analytics provider, where he worked with the leaders of many Fortune 50 clients. With a passion for solving increasingly complex problems, Jojo came across the Internet of Things and developed a deep interest in the very promising area of consumer and industrial IoT. One of the early enthusiasts to venture into IoT analytics, Jojo brought the problem-solving frameworks and his learnings from data and decision science to IoT. To cement his foundations in industrial IoT and scale the impact of his problem-solving experiments, he joined a fast-growing IoT analytics startup called Flutura, based in Bangalore and headquartered in the valley. Flutura focuses exclusively on Industrial IoT and specializes in analytics for M2M data. It was at Flutura that Jojo reinforced his problem-solving skills for M2M and Industrial IoT while working for the world's leading manufacturing giant and lighting solutions providers. His quest for solving problems at scale naturally brought out the 'product' dimension in him, and he soon ventured into developing data science products and platforms. After a short stint with Flutura, Jojo moved on to work with the leader in Industrial IoT, GE, in Bangalore, where he focused on solving decision science problems for Industrial IoT use cases. As a part of his role at GE, Jojo also focuses on developing data science and decision science products and platforms for Industrial IoT.

Chapter 3. The What and Why - Using Exploratory Decision Science for IoT

Problems in any given scenario keep evolving, and so do their solutions. The hypotheses that we define while solving a problem are refined by new findings, which can change the approach partially or completely. Hence, we need to keep our problem-solving approach very agile. The problems we solve are often interconnected; a big problem is often a network of multiple smaller problems. These smaller problems can originate in completely disparate domains, so we need to accommodate diversity in our approach. Also, the right approach to a solution depends on the problem's scenario. It could be top-down, bottom-up, or hybrid; therefore, our solutions need to be flexible. Lastly, a problem can grow to a mammoth size, so our solutions need to be scalable.

In this chapter, we will solve the business problem that we defined in Chapter 2, Studying the IoT Problem...

Identifying gold mines in data for decision making


As a first step, before we dig deeper into the data exploration and analysis phase, we need to identify the gold mines in the data. In the previous chapter, we designed the heuristic-driven hypotheses (HDH) while defining the problem. We now need to revisit that list and explore it to understand whether we are in a position to solve the problem using the data. We can do this by examining and validating the data sources for the identified hypotheses. If we do not have data to prove or disprove the majority of our important hypotheses, proceeding any further with the current approach would not add any value. If the data is available, we can get our hands dirty with code for the solution.
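The availability check described above can be sketched in a few lines. The chapter itself works in R; this is a Python illustration only, and the hypothesis names, the column names other than Detergent_Quality and Product_ID, and the helper `hypothesis_coverage` are all hypothetical stand-ins rather than the book's actual list.

```python
def hypothesis_coverage(hypotheses, available_columns):
    """Split hypotheses into those whose required columns all exist and those that do not."""
    available = set(available_columns)
    testable, untestable = [], []
    for name, required in hypotheses.items():
        if set(required) <= available:
            testable.append(name)
        else:
            untestable.append(name)
    return testable, untestable


# Hypothetical mapping from each hypothesis to the columns needed to test it.
hypotheses = {
    "Assembly line affects quality": ["Assembly_Line", "Detergent_Quality"],
    "Quantity deviation affects quality": [
        "Order_Quantity", "Produced_Quantity", "Detergent_Quality",
    ],
    "Operator experience affects quality": ["Operator_Experience", "Detergent_Quality"],
}

# Hypothetical list of columns present in the data.
available_columns = [
    "Product_ID", "Assembly_Line", "Order_Quantity",
    "Produced_Quantity", "Detergent_Quality",
]

testable, untestable = hypothesis_coverage(hypotheses, available_columns)
coverage = len(testable) / len(hypotheses)
print(f"Testable: {testable}")
print(f"Untestable: {untestable}")
print(f"Share of hypotheses we can test: {coverage:.0%}")
```

If the share of testable hypotheses is low, that is the signal to revisit the approach before writing any analysis code.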

Examining data sources for the hypotheses

If we take a look at the Prioritize and structure hypotheses based on the availability of data section in the previous chapter, we can see that we have listed a couple of hypotheses that could be potential...

Exploring each dimension of the IoT Ecosystem through data (Univariates)


Let's dig deeper into each dimension of the IoT use case to get a more realistic understanding of what the data shows. We will perform extensive univariate analysis to study and visualize the entire data landscape.

What does the data say?

We visited the data dimensions while exploring the gold mines in data (in the previous section) and understood that Product_Qty_Unit, Product_ID, Material_ID, and Product_Name each contain a single value. Therefore, we conclude that the data in the use case is provided for a specific product and that its output is measured in Kgs. Let's start exploring Order Quantity and Produced Quantity in depth. We initially studied the data dimensions using summary commands that gave us the percentile distribution. Let's take this one step further.
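The two checks mentioned above, spotting single-valued columns and summarizing a continuous dimension by its percentiles, can be sketched as follows. The chapter uses R's summary commands; this Python version is only an illustration, the rows are made-up toy values, and the column spellings Order_Quantity and Produced_Quantity are assumed.

```python
import statistics


def single_valued_columns(rows):
    """Return the names of columns that hold exactly one distinct value."""
    return [col for col in rows[0] if len({row[col] for row in rows}) == 1]


def percentile_summary(values):
    """Five-number summary: minimum, quartiles, and maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


# Toy rows for illustration only; not the book's data.
rows = [
    {"Product_ID": "P1", "Product_Qty_Unit": "Kgs", "Order_Quantity": 1000, "Produced_Quantity": 990},
    {"Product_ID": "P1", "Product_Qty_Unit": "Kgs", "Order_Quantity": 2000, "Produced_Quantity": 2050},
    {"Product_ID": "P1", "Product_Qty_Unit": "Kgs", "Order_Quantity": 3000, "Produced_Quantity": 2900},
    {"Product_ID": "P1", "Product_Qty_Unit": "Kgs", "Order_Quantity": 4000, "Produced_Quantity": 4100},
    {"Product_ID": "P1", "Product_Qty_Unit": "Kgs", "Order_Quantity": 5000, "Produced_Quantity": 4950},
]

constant_columns = single_valued_columns(rows)
order_summary = percentile_summary([row["Order_Quantity"] for row in rows])
print(constant_columns)
print(order_summary)
```

Columns reported as single-valued carry no information for the analysis and can be set aside, which is how the chapter arrives at the conclusion that the data covers one product.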

Order Quantity and Produced Quantity are both continuous variables, that is, variables that can take an infinite number of possible values (say...

Studying relationships


The end result of a manufacturing run is whether the product can be accepted as good quality or must be discarded due to bad quality. This status is recorded for each manufacturing exercise in the 'Detergent_Quality' dimension, which is calculated by a weighted algorithm that takes into account the four output quality parameters of the detergent produced. Our end goal is to find out why the final product was not accepted, which means that we need to study why the output quality was bad. The reasons could be many, but how do we identify them? This is where the task of studying relationships falls to the decision scientist. We have plenty of independent variables that are either continuous or categorical. Studying relationships means understanding how these independent dimensions eventually contribute to the end output. The entire exercise can be simply defined as bivariate...
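One simple way to relate a categorical independent dimension to the output is to compare the share of bad-quality runs across its levels. The sketch below is a Python illustration rather than the chapter's R code; the Assembly_Line column name, the "Good"/"Bad" labels, and the toy rows are assumptions made for the example.

```python
from collections import defaultdict


def bad_rate_by_group(rows, group_col, outcome_col="Detergent_Quality", bad_label="Bad"):
    """Return the fraction of bad-quality rows for each level of group_col."""
    totals = defaultdict(int)
    bad = defaultdict(int)
    for row in rows:
        totals[row[group_col]] += 1
        if row[outcome_col] == bad_label:
            bad[row[group_col]] += 1
    return {group: bad[group] / totals[group] for group in totals}


# Toy rows for illustration only; not the book's data.
rows = [
    {"Assembly_Line": "Line 1", "Detergent_Quality": "Good"},
    {"Assembly_Line": "Line 1", "Detergent_Quality": "Bad"},
    {"Assembly_Line": "Line 1", "Detergent_Quality": "Good"},
    {"Assembly_Line": "Line 1", "Detergent_Quality": "Good"},
    {"Assembly_Line": "Line 2", "Detergent_Quality": "Bad"},
    {"Assembly_Line": "Line 2", "Detergent_Quality": "Bad"},
    {"Assembly_Line": "Line 2", "Detergent_Quality": "Good"},
    {"Assembly_Line": "Line 2", "Detergent_Quality": "Good"},
]

rates = bad_rate_by_group(rows, "Assembly_Line")
print(rates)
```

A visible gap between the rates is only a pattern worth noting at this stage; whether it is real still has to be confirmed with a statistical test.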

Exploratory data analysis


This part of the problem-solving stack is also called "Confirmatory Data Analysis". Many resources on the Internet and elsewhere describe a stack called "ECR", which expands to Exploratory Data Analysis + Confirmatory Data Analysis + Root Cause Analysis: Exploratory Data Analysis (EDA), where we understand "What" happened; Confirmatory Data Analysis (CDA), where we cement the results from our exercises using statistical tests; and finally Root Cause Analysis, where we answer the "Why" question. We follow the same approach here, but with a slightly different naming convention. We have broken down the steps into more granular ones:

We have now reached the EDA phase, that is, we will now validate the insights and patterns that we observed in the data. Let's start with understanding how we are going to approach this. If we look back at the journey...

Root Cause Analysis


We now begin answering the "why" question using all the insights we have gathered so far. Let's assimilate all the results that we validated in our EDA exercise. Once we have all the results, let's simplify them into a simple story that helps us answer the questions more lucidly.

The following figure is an extended version of the DDH matrix we designed in the previous section along with the results we found during our exercise:

Summary


In this chapter, we moved one step ahead in solving a real-life IoT business use case. Using the blueprint of the problem that we defined in the previous chapter, we attempted to solve the problem in a structured way, guided by the problem-solving framework. With the business problem well defined, we got our hands dirty by solving it using R. We started by identifying gold mines in the data for decision making, where we examined the data sources to understand which hypotheses we can prove to solve our problem. We then validated that we have a good amount of data to solve the problem and studied the data further to understand how it can be used in our use case. After gathering a fair amount of data and domain context, we explored each dimension in the IoT ecosystem and studied what the data has to say. We performed univariate analysis and also transformed the dimensions to create more powerful and valuable dimensions. We then...


| Hypothesis | Result | Insight |
| --- | --- | --- |
| Line 1 has an overall higher chance of manufacturing more bad quality detergent products | FALSE | Assembly line has no impact on the end quality of the detergent |
| Line 1 has an overall higher chance of deteriorating the Output Quality Parameters in the detergent | TRUE | Assembly line has an impact on Output Quality Parameters 2, 3, and 4 |
| As the deviation between Order Quantity and actual Produced Quantity increases, the chance of the bad quality detergent being... | | |