To solve a business problem, we need the knowledge and expertise to find its solution. We also need relevant data that helps in identifying that solution. This chapter shows how new technologies allow us to build powerful machines that learn from data to support business decisions.
The topics that will be covered in this chapter are as follows:
A general idea for approaching business problems
The new challenges relating to digital technologies
How the new tools help in using information
How the tools identify the information that is not evident
How the tools can estimate the outcome of future events
The general idea for approaching business problems hasn't changed over the years, and it combines knowledge and information. Before digital technologies, knowledge came from expertise gained through previous experience and from other people. With regard to information, it was about analyzing the current situation and comparing it with past events.
A simple example is that of a fruit monger who wants to set the prices of their goods. The price of a product should maximize the profit, which depends on the sales volume and on the price itself. The dealer started their job working with their father who provided them with all their knowledge. Therefore, they already know the price of the different fruits. In addition, at the end of each day, they can observe the amount of each fruit that has been sold. Based on that, they can raise the price of fruits that sold very well and decrease the price of fruits that they didn't sell. This simple example shows how the fruit monger combines domain knowledge and information to solve their problem, as described in the following figure:
Although the general idea for approaching business problems hasn't changed, digital technologies are providing us with new powerful tools.
The Internet allows people to connect with each other and share their expertise in such a way that everyone has access to a huge set of information. Before the Internet, knowledge came from trusted people and books. Now, the spread of information makes it possible to find books and articles written by different people from every part of the world. In addition, websites and forums allow their users to connect with each other in order to share expertise and find quick answers.
Digital technologies keep track of different activities and produce a lot of related data. By data, we mean sets of information, quantitative or qualitative, that machines can process. Therefore, when facing a business problem, we can use lots of data from different sources. Some information might not be very relevant, but even after removing it, we often have a huge amount of data. This leaves a lot of potential for improving the results.
The changes derived from digital technologies involve the process of acquiring expertise and the nature of data. Therefore, the approach to problem solving presents new challenges.
A simple example of a company that faces a business problem is a car dealer who sells different used cars and wants to set the most appropriate prices. The car dealer should determine the prices based on the car model, age, and other features. This example is meant to illustrate a possible situation and is not necessarily related to a real problem.
The car dealer needs to identify the best price for each car in order to maximize the profit. Similar to the fruit monger, if the price of a car is too high, the car dealer won't sell it in a short time, so there will be an extra storage cost and the car will lose value, which decreases the profit and damages the business. On the other hand, if the price is too low, the company will sell the car immediately. Although the storage cost is lower, the company won't have made the best possible profit. In order to sell cars and maximize profit, the company wants to figure out the optimal prices.
Let's take a look at the expertise and information that help in finding the solution. The company can use:
The knowledge of agents who have already sold different cars
Information from the Internet
The data about previous sales
The agents can use their past experience, so their knowledge helps in identifying the best prices. However, past experience alone is not enough to set the prices when the market changes quickly.
The Internet gives us a lot of information since there are many online shopping websites displaying the prices of used cars. Online shopping is different from the physical market, but an expert agent can take a look at the websites and compare the prices. In this way, the agent can combine their expertise with the online information and identify suitable prices.
This approach leads to good results, but is still not optimal. Looking at different websites is time-consuming, especially if there are many categories of cars, so it is hard or even impossible to check the prices on a daily basis. Another issue is that there might be many websites, making it impossible for a single person to process all the information. By automating our web research and using data more systematically, we can acquire information much faster.
To acquire information, the data sources are the company sales and the online market, and a good solution for car pricing should take into account all these sources. The company sales data shows how the customers reacted to their prices in the past. For instance, we know how long it took to sell each car in the past. If it took too long, the price might have been too high. This criterion is objective and an expert agent can use this information to identify the current wrong prices.
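The time-to-sell criterion can be sketched in a few lines of Python. This is only an illustration: the record fields (`model`, `price`, `days_to_sell`), the 60-day threshold, and the sample history are all assumptions, not data from a real dealership.

```python
from dataclasses import dataclass


@dataclass
class Sale:
    model: str         # hypothetical car model label
    price: float       # price at which the car was listed
    days_to_sell: int  # days between listing and sale


def slow_sellers(sales, max_days=60):
    """Return the sales that took longer than max_days.

    A long time to sell hints that the price might have been too high.
    The 60-day default is an arbitrary, illustrative threshold.
    """
    return [s for s in sales if s.days_to_sell > max_days]


# Made-up sales history, for illustration only.
history = [
    Sale("hatchback", 7500.0, 12),
    Sale("sedan", 11000.0, 95),
    Sale("wagon", 9000.0, 41),
]

flagged = slow_sellers(history)
print([s.model for s in flagged])
```

An expert agent would still need to review the flagged cars, since a slow sale can have causes other than the price.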
The data derived from online shopping websites displays the car prices, and we can use tools that store a price and sales history. Although this information is less relevant to the problem, it can be processed in a similar way to the company sales data, thereby improving the accuracy of the results, as described in the following figure:
This example shows the potential of having more information and expertise. The challenge here is to use information in the most effective way to improve the solution. As a general rule, the more information we use, the more accurate the results can potentially be. Even in the worst case, where much of the information is irrelevant, we can still identify and use the small relevant part of it.
A single person can solve a business problem by combining data and expertise as long as the data is understandable by the human mind. The growth of data volumes due to digital technologies has changed the way of approaching problems since more data requires new tools in order to be used. In addition, new devices allow us to perform data analysis that would have been impossible on personal computers 10 years ago.
This fact not only changed the way of dealing with data, but also the overall process of making business decisions.
There are several ways to use the information contained in the data. For instance, the Internet movie streaming provider Netflix uses a tool that produces personalized movie recommendations based on your interests. Machine learning refers to the tools that learn from data to provide insights and actions, and it is a subfield of artificial intelligence. Machine learning techniques don't just process data, but rather connect data and the business. This interaction between information and knowledge is crucial and affects almost every step of building solutions.
Knowledge still plays an important role in building the tool that identifies the solution. Since there are many machine learning tools that deal with the same problem, your expertise can be used to choose the most relevant tool. In addition, most of the tools have some parameters, so it's necessary to know the problem to set them up, as described in the following figure:
After the machine learning technique has identified a result, we can validate its performance using information and expertise. For instance, in the car dealer example, we can build a tool that automatically identifies the best prices and predicts the time necessary to sell each car. Starting from the previous data, we can use the tool to estimate how long it would have taken to sell cars and compare the estimated time with the real time. In addition, we can identify the current prices and use knowledge and expertise to see if they are reasonable. In this way, we measure how close the machine learning approach is to reality.
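One possible way to compare estimated and real selling times is the mean absolute error, that is, the average gap between the two. The text does not prescribe a specific measure, so this choice is an assumption, and the numbers below are invented for illustration.

```python
def mean_absolute_error(estimated, actual):
    """Average absolute gap between estimated and observed values."""
    if len(estimated) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)


# Made-up numbers: the days each past car actually took to sell, and
# the days a hypothetical pricing tool would have estimated for it.
actual_days = [12, 95, 41]
estimated_days = [15, 80, 41]

error = mean_absolute_error(estimated_days, actual_days)
print(f"Mean absolute error: {error} days")
```

A lower error means that the estimates are closer to what really happened, which gives a simple number for comparing two techniques or two setups of the same technique.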
Validation helps in comparing different techniques and choosing the one that performs best. In addition, techniques usually need a setup with different options, and validation helps in choosing the most suitable option, as described in the following figure:
In conclusion, the interaction between machine learning and business is extremely important, and it takes place in each part of the process of building the solution.
Data displays some information that is evident and it contains a lot of other information that is more implicit. Sometimes, the solution to a business problem requires some information that is less evident and which may be partly subjective. This section shows how some machine learning techniques discover hidden structures and patterns from the data.
Data that tracks an activity contains the information recorded by a technology device. For instance, in a supermarket, the checkout machines track the purchases. Therefore, it's possible to have some information about the sales of each item in the past. The available information is the Point of Sale (POS) data, and it describes the transactions through the following attributes:
Number of units that have been sold
Price of the item
Date and time of the purchase
The checkout machine's ID
Customer ID (for customers that use a Nectar card)
Some information is manifest and is easily accessible by analyzing the data, whereas other information is hidden. Starting from the transactions, it's easy to determine the total amount of sales in the past. For instance, we can count how many units of a product have been sold in a day. It is very easy to do so:
Select the transactions based on the product ID and the day.
Sum the number of units.
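The two steps above can be sketched as follows. The transaction layout (a list of dictionaries with `product_id`, `units`, and `timestamp` keys) and the sample rows are assumptions made for the example.

```python
from datetime import date, datetime

# Made-up POS transactions; the field names are illustrative.
transactions = [
    {"product_id": "P1", "units": 2, "timestamp": datetime(2015, 3, 2, 9, 15)},
    {"product_id": "P1", "units": 3, "timestamp": datetime(2015, 3, 2, 18, 30)},
    {"product_id": "P2", "units": 1, "timestamp": datetime(2015, 3, 2, 18, 31)},
    {"product_id": "P1", "units": 4, "timestamp": datetime(2015, 3, 3, 10, 0)},
]


def units_sold(transactions, product_id, day):
    """Total units of one product sold on one calendar day."""
    # Step 1: select the transactions for the product and the day.
    selected = [
        t for t in transactions
        if t["product_id"] == product_id and t["timestamp"].date() == day
    ]
    # Step 2: sum the number of units.
    return sum(t["units"] for t in selected)


total = units_sold(transactions, "P1", date(2015, 3, 2))
print(total)
```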
It's still easy to obtain some slightly more elaborate information. We can divide the items into departments and, to determine the total units that have been sold in each department in the previous year, we can:
Generate a list of product IDs for each department.
For each department, select the transactions of the previous year and of the product IDs of the department.
Sum the number of units.
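A minimal sketch of the department totals, assuming that each department is described by a set of product IDs. The department names, product IDs, and transactions are invented for the example.

```python
from datetime import datetime

# Step 1: a list of product IDs for each department (made-up values).
departments = {
    "bakery": {"P1", "P2"},
    "dairy": {"P3"},
}

# Made-up transactions; the field names are illustrative.
transactions = [
    {"product_id": "P1", "units": 2, "timestamp": datetime(2014, 5, 1, 9, 0)},
    {"product_id": "P2", "units": 1, "timestamp": datetime(2014, 7, 10, 17, 0)},
    {"product_id": "P3", "units": 6, "timestamp": datetime(2014, 11, 20, 12, 0)},
    {"product_id": "P1", "units": 5, "timestamp": datetime(2015, 1, 5, 9, 0)},
]


def units_by_department(transactions, departments, year):
    """Total units sold in each department during the given year."""
    totals = {}
    for name, product_ids in departments.items():
        # Step 2: select the transactions of the year and of the department.
        selected = [
            t for t in transactions
            if t["timestamp"].year == year and t["product_id"] in product_ids
        ]
        # Step 3: sum the number of units.
        totals[name] = sum(t["units"] for t in selected)
    return totals


totals_2014 = units_by_department(transactions, departments, 2014)
print(totals_2014)
```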
We can use the customer ID in order to track the purchases of each customer. For instance, given a single customer ID, we can determine the total number of units that they purchased. This data is still easy to obtain, so we can't talk about hidden patterns. However, there is still a lot of information about the customers that cannot be directly displayed.
Some customers have similar purchase habits. Examples of customer categories are students, moms, and elderly people.
Each group of people displays specific purchase habits, which differ in the following aspects:
Available money to spend
Products that the customers are interested in
Date and time of the purchase
For instance, students have, on average, less money to spend than other people. Moms are keener to buy groceries and products for the house. Students are more likely to go to the supermarket after school; elderly people will go at almost any time of day.
The data doesn't display which customer IDs are associated with each category of customers, even if it contains some information about their behavior. It's therefore hard to identify which customers are similar by performing a simple analysis operation. In addition, in order to identify the groups, we need to have an initial guess about the categories of customers.
The supermarket wants to target its customers with a marketing campaign. The options of the marketing campaign determine the following:
Which items are advertised
Which items are discounted
Which weekdays are affected by the promotion
If the supermarket were very small, it would be possible to extract the data about each customer and consequently address them with a specific campaign. However, the supermarket is big and there are many customers, so it is impossible to take each one of them into account separately without the use of some data processing.
A possibility is to define a method that automatically reads the data about each customer and consequently chooses the marketing campaign. This approach requires the following:
Organizing the data and selecting the relevant information
Modeling the data
Defining the action
This approach works, although it has some drawbacks. The decision about a marketing campaign requires a general picture of the customer base. Only after understanding the patterns in the customer behavior is it possible to define a method that chooses the marketing campaign starting from that behavior. Therefore, this method requires some previous analysis.
Another solution is to identify groups of customers that have similar habits. Once the groups are defined, it's possible to analyze each group separately in order to understand its common purchase behavior.
The following chart shows some customers represented by small circles, where the big circles represent the homogeneous groups of customers:
In this way, the supermarket has some information about each group that helps them identify the right marketing campaign by combining the following:
Some aggregated information about the customers of the group
Some business knowledge that allows them to define a proper marketing campaign
Assuming that each customer will have the same habits in the future, at least in the short term, it's possible to identify the purchase behavior and interests of each group of customers and consequently target them with the same campaign.
Starting from the POS data, we want to model the purchase habits of the supermarket customers in order to identify homogeneous groups. Although the POS data doesn't display the customer behavior directly, it contains the customer ID. The behavior of each customer can be modeled by measuring their habits. For instance, we can measure the total number of units that they have purchased over the last few years. Similarly, we can define some other Key Performance Indicators (KPIs) that are values describing different aspects of the behavior. After extracting all the transactions related to a customer, we can define KPIs as follows:
The total number of units that they purchased in the previous year
The total amount of money that they spent in the last year
The percentage of units that they purchased between 6 p.m. and 7 p.m.
The total money spent in a specific item department
The percentage of money that they spent in summer
Some KPIs that are relevant to the problem are as follows:
The total money spent in the last year, in order to identify the maximum amount of money that a customer can spend
The percentage of money spent in different item departments, in order to identify what the customer is interested in
The percentage of purchases in the morning and in the early afternoon, in order to identify housewives and pensioners
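As a rough sketch, a few of these KPIs can be computed from the transactions of a single customer. The field names, the product-to-department mapping, and the assumption that `price` is the price per unit are all illustrative.

```python
from datetime import datetime

# Hypothetical mapping from product ID to item department.
product_department = {"P1": "bakery", "P3": "dairy"}

# Made-up transactions of one customer; "price" is assumed to be per unit.
customer_transactions = [
    {"product_id": "P1", "units": 2, "price": 1.0,
     "timestamp": datetime(2015, 3, 2, 18, 30)},
    {"product_id": "P3", "units": 2, "price": 3.0,
     "timestamp": datetime(2015, 3, 4, 10, 0)},
]


def customer_kpis(transactions, product_department):
    """Compute a small set of KPIs describing one customer's behavior."""
    total_units = sum(t["units"] for t in transactions)
    total_money = sum(t["units"] * t["price"] for t in transactions)
    # Units bought between 6 p.m. and 7 p.m., that is, during hour 18.
    evening_units = sum(
        t["units"] for t in transactions if t["timestamp"].hour == 18
    )
    money_by_department = {}
    for t in transactions:
        department = product_department[t["product_id"]]
        spent = t["units"] * t["price"]
        money_by_department[department] = (
            money_by_department.get(department, 0.0) + spent
        )
    return {
        "total_units": total_units,
        "total_money": total_money,
        "pct_units_6pm_7pm": (
            100.0 * evening_units / total_units if total_units else 0.0
        ),
        "pct_money_by_department": {
            d: 100.0 * m / total_money for d, m in money_by_department.items()
        } if total_money else {},
    }


kpis = customer_kpis(customer_transactions, product_department)
print(kpis)
```

Repeating this computation for every customer ID gives one row of KPIs per customer, which is the form of input that the grouping step needs.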
Given a small set of customers, it's easy to identify homogeneous groups by observing the data. However, if we have many customers and/or KPIs, we need computing tools to uncover the hidden patterns in the data.
There are some machine learning algorithms that identify hidden structures, and this branch of techniques is called "unsupervised learning". Starting from the data, the unsupervised learning algorithms identify patterns and labels that are not directly displayed.
In our example, we model the customers using a proper set of KPIs that describe their purchase behavior. Our target is to identify groups that have similar values for the KPIs.
In order to associate the customers, the first step is to measure how similar they are. Observing the data of two customers, we can see that they are similar if the values of their KPIs are similar. Since there are many customers, we can't observe data manually, so we need to define a criterion. The criterion is a function that takes as an input the KPIs of two customers and computes a distance, which is a number that expresses the dissimilarity between the values. In this way, there is an objective way to state how similar two customers are.
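One common choice for such a criterion is the Euclidean distance between the KPI values. The text does not name a specific distance, so this is an assumption, and the KPI names below are hypothetical.

```python
import math


def kpi_distance(a, b):
    """Euclidean distance between two customers described by the same KPIs.

    a and b map KPI names to numeric values. A larger result means that
    the two customers are more dissimilar.
    """
    if a.keys() != b.keys():
        raise ValueError("both customers must have the same KPIs")
    return math.sqrt(sum((a[name] - b[name]) ** 2 for name in a))


# Two made-up customers described by two percentage KPIs.
customer_1 = {"pct_morning": 10.0, "pct_groceries": 40.0}
customer_2 = {"pct_morning": 13.0, "pct_groceries": 44.0}

print(kpi_distance(customer_1, customer_2))
```

In practice, KPIs measured on very different scales (for example, total money spent versus a percentage) would usually be rescaled first, otherwise the largest values dominate the distance.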
We have modeled the customers through objects whose similarity can be measured. There are several machine learning algorithms that group similar objects, and they're called clustering techniques. The techniques group together similar customers and consequently identify homogeneous groups.
There are different options to group the customers, depending on:
The number of desired clusters
The relevance of each KPI
The way to identify clusters
There are different options for clustering, and most of the algorithms contain some parameters. In order to choose the proper technique and setup, we need to explore the data to understand the business problem.
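To make the idea concrete, here is a minimal sketch of k-means, one widely used clustering technique. The text does not commit to a specific algorithm, so this choice, the naive initialization, and the two made-up KPIs per customer are assumptions for illustration.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def mean_point(points):
    """Coordinate-wise average of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, iterations=20):
    """Group points into k clusters; returns (labels, centroids)."""
    centroids = list(points[:k])  # naive initialization: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the average of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centroids.append(mean_point(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # the centroids stopped moving
        centroids = new_centroids
    return labels, centroids


# Made-up customers: (percentage of purchases in the morning,
# percentage of money spent on groceries).
customers = [
    (10.0, 80.0), (12.0, 75.0), (70.0, 20.0),
    (75.0, 15.0), (11.0, 78.0), (72.0, 18.0),
]

labels, centroids = k_means(customers, k=2)
print(labels)
```

Here the number of desired clusters is the parameter `k`, and the relevance of each KPI could be expressed by rescaling the corresponding coordinate before clustering.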
Clustering techniques allow us to identify homogeneous groups of customers. For each cluster, the supermarket has to define a marketing campaign targeting its customers using promotions and discounts.
For each cluster, it's possible to define a summary table showing the average customer's behavior. Combining this information with some business expertise, the supermarket can maximize the positive impact of the campaign.
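A summary table of this kind can be sketched by averaging each KPI within each cluster. The KPI names, values, and cluster labels below are invented for the example.

```python
from collections import defaultdict


def cluster_summary(kpi_rows, labels):
    """Average every KPI within each cluster.

    kpi_rows: one dict of KPI values per customer.
    labels: the cluster label of each customer, in the same order.
    """
    grouped = defaultdict(list)
    for row, label in zip(kpi_rows, labels):
        grouped[label].append(row)
    return {
        label: {
            name: sum(row[name] for row in rows) / len(rows)
            for name in rows[0]
        }
        for label, rows in grouped.items()
    }


# Made-up KPIs for three customers and their cluster labels.
kpi_rows = [
    {"total_money": 100.0, "pct_morning": 10.0},
    {"total_money": 300.0, "pct_morning": 30.0},
    {"total_money": 1000.0, "pct_morning": 80.0},
]
labels = [0, 0, 1]

summary = cluster_summary(kpi_rows, labels)
for label, averages in sorted(summary.items()):
    print(label, averages)
```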
In conclusion, clustering allows us to convert a massive volume of data into a small set of relevant information. Then, a business expert can read and understand the clustering results to make the best decisions.
This example showed how data and expertise are strongly linked. The machine learning algorithms required the KPIs that are defined using business expertise. After the algorithm has processed the data, business expertise is necessary to identify the right action.
When a business decision consists of choosing between different options, the solution requires estimating the impact of each of them. This section shows how machine learning techniques predict future events depending on the options, and how we can measure the accuracy.
If we have to choose between different options, we estimate the impact of each alternative and choose the best. To illustrate this, consider a big supermarket that plans to start selling a new item, where the business decision consists of choosing its price.
In order to choose the best price, the company needs to know:
The price options
The impact of each price option on the item sales
The impact of each price option on the sales of other items
The ideal solution is to maximize the overall revenue in the short and long term. Regarding the item itself, if its price is too high, the company won't sell it, missing a potential profit. On the other hand, if the price is too low, the company will sell many units without making good revenue.
In addition, the price of a new item will have an impact on the sales of similar items. For instance, if the supermarket is selling a new cereal, the sales of all the other cereal products will be affected. If the new price is too low, some of the customers purchasing the other items will want to save money and consequently purchase the new item. In this way, some customers will spend less money and the overall revenue will decrease. Conversely, if the new item is overpriced, the customers might perceive that the other items are too cheap and consequently that their quality is lower.
There are different effects, and one option is to define a minimum and maximum price of the item as a first step, in order to avoid negative effects on the sales of the related items. Then, we can choose the new price, maximizing the revenue of the item itself.
Let's assume that we have already defined the minimum and maximum price for the new item. The goal is to use the data in order to discover the information that allows us to maximize the revenue of the item. The revenue depends on:
The price of the item
The units that will be sold in the next month
In order to maximize the revenue, we want to estimate it depending on the price, and pick the price that maximizes it. If we can estimate the sales volume depending on the price, we can consequently estimate the revenue.
The data displays information of past transactions, which include:
Number of units that have been sold during the day
Price of the item
There is also other data that describes each item through some features. In order to simplify the problem, the features are all categorical, so they display categories instead of numbers. Examples of features are as follows:
Other categorical features that define the item
We don't have any data about the sales of the new item, so we need to estimate the customer behavior using data about some similar items. We're assuming that:
The future customers' behavior is similar to the past
The customers' behavior is similar across similar items
The sales of the new item are not affected by the fact that it's new to the market
As we want to estimate the sales volume of the new item, the starting point is the sales volume of similar items. For each item, we extract its transactions in the last month and we compute:
In addition, for each item, we have the data defining its features. The data about each item is the starting point to estimate the revenue.
In our problem, the target of machine learning algorithms is to forecast the sales volume of an item depending on its price. The branch of techniques that learn from the data to forecast a future event is called supervised learning. The starting point of the algorithms is a training set of data that consists of objects whose event is already known. The algorithms identify a relationship between the data that describes the object and the event. Then, they build a model that defines this relationship and use the model for forecasting the event on other objects. The difference between supervised and unsupervised learning is that supervised learning techniques use a training set with known events, whereas unsupervised learning techniques identify patterns that are hidden in the data.
For instance, we have a new item whose price can either be $2, $3, or $4. In order to have the optimal price, we need to estimate the future revenues.
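Once a sales volume has been estimated for each candidate price, choosing the price is a small computation: the revenue is the price multiplied by the units sold. The estimated volumes below are made-up placeholders, not the output of a real model.

```python
def best_price(estimated_units):
    """Return (price, revenue) for the price with the highest estimated revenue.

    estimated_units maps each candidate price to the units that are
    expected to be sold at that price.
    """
    revenues = {price: price * units for price, units in estimated_units.items()}
    price = max(revenues, key=revenues.get)
    return price, revenues[price]


# Hypothetical monthly volume estimates for the three candidate prices.
estimated_units = {2.0: 500, 3.0: 400, 4.0: 250}

price, revenue = best_price(estimated_units)
print(price, revenue)
```

With these invented figures, the middle price wins because the cheapest price does not sell enough extra units to compensate, and the highest price loses too many sales.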
The data displays the sales volume of any item depending on its price and features. The approach for estimating the sales volume of a new item, depending on its price, is to use the sales volume of a defined number (k) of items that are the most similar. For each price, the steps are as follows:
Define which are the k most similar items, given the features and the price of the new item.
Define how to use the data of the similar items to estimate the sales volume of the new item.
In order to identify the most similar items, we have to decide what is similar and how similar it is. In order to do that, we can define a way to measure the similarity between any two items depending on the features and on the price. The similarity can be measured through a distance function, taking into account these features:
Other similar features
An easy way is to measure the distance as the sum of dissimilarities between the features. For instance, a very simple dissimilarity can be the price difference plus the number of categorical features that display different values. A slightly more advanced way is to give a different weight to each feature on the basis of its relevance. For instance, two items that do not belong to the same department are very dissimilar, whereas two items that are the same product from different brands are very similar.
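Both variants of the distance can be sketched with one function. The item layout, the feature names `department` and `brand`, and the weight values are assumptions chosen to mirror the example in the text.

```python
def item_distance(a, b, weights=None, price_weight=1.0):
    """Dissimilarity between two items.

    The distance is the weighted price difference plus the weight of every
    categorical feature whose value differs. A feature without an explicit
    weight counts as 1, which gives the simple unweighted version.
    """
    weights = weights or {}
    distance = price_weight * abs(a["price"] - b["price"])
    for name, value in a["features"].items():
        if b["features"].get(name) != value:
            distance += weights.get(name, 1.0)
    return distance


# Made-up items for illustration.
item_a = {"price": 3.0, "features": {"department": "cereal", "brand": "A"}}
item_b = {"price": 2.5, "features": {"department": "cereal", "brand": "B"}}
item_c = {"price": 3.0, "features": {"department": "dairy", "brand": "A"}}

# Simple version: price difference plus the number of differing features.
print(item_distance(item_a, item_b))

# Weighted version: a different department matters far more than a brand.
relevance = {"department": 10.0, "brand": 0.5}
print(item_distance(item_a, item_b, relevance))
print(item_distance(item_a, item_c, relevance))
```

With the weights, the same product from another brand stays close to the item, while an item from another department ends up far away.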
After defining the distance function, we want to identify the k most similar objects depending on the price. For each price point, we define an item with the features of the new item and the chosen price. Then, for each item in the supermarket, we compute the distance between the item and the new item. In this way, we can pick the k items whose distance is the lowest.
After having identified the k most similar items, we need to determine how to use this information in order to estimate the new volume. A simple method is to compute the average between the sales volumes of the k items. A more advanced approach is to give more importance to more similar items.
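Putting the pieces together, the estimate for one candidate price can be sketched as follows. The distance function, the `1 / (1 + distance)` weighting used for the "more advanced" variant, and all the item data are illustrative assumptions.

```python
def item_distance(a, b):
    """Price difference plus the number of differing categorical features."""
    mismatches = sum(
        1 for name, value in a["features"].items()
        if b["features"].get(name) != value
    )
    return abs(a["price"] - b["price"]) + mismatches


def knn_volume(new_item, items, k, weighted=False):
    """Estimate the sales volume of new_item from its k most similar items."""
    if k < 1 or not items:
        raise ValueError("need at least one item and k >= 1")
    # Pick the k items whose distance from the new item is the lowest.
    ranked = sorted(items, key=lambda item: item_distance(new_item, item))
    neighbors = ranked[:k]
    if not weighted:
        # Simple method: plain average of the neighbors' volumes.
        return sum(n["volume"] for n in neighbors) / len(neighbors)
    # Weighted method: closer items get more importance.
    weights = [1.0 / (1.0 + item_distance(new_item, n)) for n in neighbors]
    total = sum(w * n["volume"] for w, n in zip(weights, neighbors))
    return total / sum(weights)


# Made-up existing items with last month's sales volume.
items = [
    {"price": 2.0, "features": {"department": "cereal", "brand": "A"}, "volume": 500},
    {"price": 3.0, "features": {"department": "cereal", "brand": "B"}, "volume": 400},
    {"price": 4.5, "features": {"department": "cereal", "brand": "C"}, "volume": 200},
    {"price": 5.0, "features": {"department": "dairy", "brand": "D"}, "volume": 900},
]
new_features = {"department": "cereal", "brand": "N"}

# For each price point, define the new item with that price and estimate it.
estimates = {
    price: knn_volume({"price": price, "features": new_features}, items, k=2)
    for price in (2.0, 3.0, 4.0)
}
print(estimates)
```

Multiplying each estimated volume by its price then gives the estimated revenue for that price point.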
The techniques that estimate a future event depending on the past data are called supervised learning techniques. The algorithm that has been illustrated is the k-nearest neighbors (KNN) algorithm, and it's one of the most basic supervised learning techniques.
This chapter showed how business problems are solved by combining expertise and information. You saw how digital technologies led to an increase in the volume of information and provided us with new techniques to face challenges. You had an overview of the two most important branches of machine learning techniques: unsupervised and supervised learning. Unsupervised learning techniques identify structures that are hidden in the data, and supervised learning techniques use the data for estimating an unknown situation.
The next chapter shows the challenges related to machine learning problems and defines the requirements of software that identifies their solution. Then, the chapter introduces the software that we will be using in this book and provides you with a brief tutorial.