In the previous chapter, Chapter 2, Let's Help Machines Learn, you learned how to make machines learn from observations and data points so that they can find interesting patterns and trends and make predictions. In this chapter, we will deal with one of the complex problems faced by retailers, stores, and e-commerce marketplaces today. With the advent of modern technology, shopping has become a relatively pleasant experience which we can enjoy from the comfort of our homes, without even venturing into an actual store, using the web or dedicated shopping apps. With a huge number of retailers, stores, marketplaces, and sellers, competition is stiff, and to attract customers, they have to use all the data they can gather from consumers about their personal traits and shopping patterns, and use machine learning techniques to try and make shopping experiences...
You're reading from R Machine Learning By Example
In this section, we will discuss what exactly we mean by trends and how retailers detect and predict them. In the retail context, a trend can be defined as a specific pattern or behavior which occurs over a period of time. This may involve a product or a combination of products selling out in a very short period of time, or the reverse. A simple example would be a best-selling smartphone being prebooked and going out of stock before even hitting the shelves on any e-commerce marketplace, or a combination of products, like the classic beer and diapers combination, which is frequently found in customers' shopping baskets or carts!
How can we even start analyzing shopping carts or start to detect and predict shopping trends? As mentioned earlier, we can achieve this with a combination of the right data and algorithms. Let's assume that we are heading a large retail chain. First we will have to keep track of each and every transaction...
Market basket analysis consists of modeling techniques which are typically used by retailers and e-commerce marketplaces to analyze shopping carts and transactions to find out what customers buy the most, what kinds of items they buy, what the peak season is for specific items, and so on. In this chapter, we will focus on item-based transactional patterns for detecting and predicting what items people are buying and are most likely to buy. Let us first look at the formal definition of market basket analysis, and then we will look at the core concepts, metrics, and techniques tied to it. Finally, we will conclude with how to actually use these results to make data-driven decisions.
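As a rough illustration of the three metrics this chapter relies on (support, confidence, and lift), the following sketch computes them in base R using their standard definitions. The transactions and the helper function names are made up for illustration and are not the book's data or code.

```r
# Hypothetical toy transactions (made up for illustration; not the book's data)
transactions <- list(
  c("beer", "diapers", "milk"),
  c("beer", "diapers"),
  c("milk", "bread"),
  c("beer", "bread", "diapers"),
  c("milk", "diapers")
)

# Support: fraction of transactions that contain every item in `items`
support <- function(items, txns) {
  mean(vapply(txns, function(txn) all(items %in% txn), logical(1)))
}

# Confidence of the rule lhs => rhs: support(lhs and rhs) / support(lhs)
confidence <- function(lhs, rhs, txns) {
  support(c(lhs, rhs), txns) / support(lhs, txns)
}

# Lift of lhs => rhs: confidence relative to the baseline support of rhs
lift <- function(lhs, rhs, txns) {
  confidence(lhs, rhs, txns) / support(rhs, txns)
}

supp_bd <- support(c("beer", "diapers"), transactions)
conf_bd <- confidence("beer", "diapers", transactions)
lift_bd <- lift("beer", "diapers", transactions)
```

A lift above 1 suggests that the two items appear together more often than they would if they were bought independently, which is what makes a rule interesting for recommendations.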
We will be doing a couple of things here. First, we will analyze a small toy dataset belonging to a supermarket, by using a product contingency matrix of product pair purchases based on their frequency. Then we will move on to contingency matrices based on other metrics such as support, lift, and so on by using another dataset.
The data for our first matrix consists of the six most popular products sold at the supermarket and also the number of times each product was sold by itself and in combination with the other products. We have the data in the form of a data table captured in a csv file, as you can see in the following figure:
To analyze this data, we first need to understand what it depicts. Basically, each cell value denotes the number of times that product combination was sold. Thus, the cell combination (1, A) denotes the product combination (milk, milk), which is basically the number of times milk was bought. Another example is the cell combination...
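To make the idea of a frequency-based contingency matrix concrete, here is a minimal sketch that builds one from raw baskets in base R. The baskets and product names are hypothetical and much smaller than the supermarket data described above; the book's csv file is not reproduced here.

```r
# Hypothetical toy baskets (made up; the book's csv file is not reproduced here)
baskets <- list(
  c("milk", "bread"),
  c("milk", "butter", "bread"),
  c("bread"),
  c("milk", "butter")
)
products <- sort(unique(unlist(baskets)))

# Binary incidence matrix: one row per basket, one column per product
incidence <- t(vapply(baskets,
                      function(b) as.integer(products %in% b),
                      integer(length(products))))
colnames(incidence) <- products

# crossprod(X) is t(X) %*% X: cell (i, j) counts the baskets holding both
# product i and product j, so the diagonal counts how often each product
# was bought at all
contingency <- crossprod(incidence)
print(contingency)
```

The resulting matrix is symmetric, so the (milk, bread) and (bread, milk) cells carry the same count, and the diagonal plays the role of the (milk, milk) style cells described above.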
We will now look at a better technique to find patterns and detect frequently bought products: frequent itemset generation. We will implement this algorithm from scratch. When we solve a machine learning or optimization problem, we usually use ready-made algorithms which are optimized and available out of the box in various R packages. However, one of the main objectives of this book is to make sure we understand what exactly goes on behind the scenes of a machine learning algorithm. Thus, we will see how we can build some of these algorithms ourselves using the principles of mathematics, statistics, and logic.
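The following is a minimal brute-force sketch of frequent itemset generation in base R, not the book's implementation. It enumerates every candidate itemset up to a given size and keeps those whose support meets a threshold. The function name frequent_itemsets, its parameters, and the toy transactions are all assumptions made for illustration.

```r
# Hypothetical toy transactions (made up for illustration)
transactions <- list(
  c("beer", "diapers", "milk"),
  c("beer", "diapers"),
  c("milk", "bread"),
  c("beer", "bread", "diapers"),
  c("milk", "diapers")
)

# Return every itemset of up to `max_size` items whose support is at least
# `min_support`. Brute force: fine for a handful of items, not for real data.
frequent_itemsets <- function(txns, min_support, max_size = 3) {
  items <- sort(unique(unlist(txns)))
  result <- data.frame(itemset = character(0), support = numeric(0),
                       stringsAsFactors = FALSE)
  for (k in seq_len(min(max_size, length(items)))) {
    # All candidate itemsets of size k
    for (cand in combn(items, k, simplify = FALSE)) {
      # Support: fraction of transactions containing every item in `cand`
      supp <- mean(vapply(txns, function(txn) all(cand %in% txn), logical(1)))
      if (supp >= min_support) {
        result <- rbind(result,
                        data.frame(itemset = paste(cand, collapse = ", "),
                                   support = supp,
                                   stringsAsFactors = FALSE))
      }
    }
  }
  result
}

freq <- frequent_itemsets(transactions, min_support = 0.5)
print(freq)
```

Enumerating all combinations grows exponentially with the number of items. Apriori-style algorithms avoid this by only extending itemsets that are already frequent, since any superset of an infrequent itemset must also be infrequent.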
We will now implement the final technique in market basket analysis: finding association rules between itemsets to detect and predict product purchase patterns, which can be used for product recommendations and suggestions. We will notably use the Apriori algorithm from the arules package, which first generates frequent itemsets, as discussed earlier. Once it has the frequent itemsets, the algorithm generates the necessary rules based on parameters such as support, confidence, and lift. We will also show how you can visualize and interact with these rules using the arulesViz package. The code for this implementation is in the ch3_association rule mining.R file, which you can load directly to follow along with the book.
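As a small, hedged sketch of the workflow just described, the snippet below mines rules with apriori from arules on a handful of made-up baskets. It assumes the arules package is installed; the baskets and threshold values are illustrative only and are not taken from the book's code file.

```r
# Assumes the arules package is installed; the toy baskets are made up
library(arules)

baskets <- list(
  c("beer", "diapers", "milk"),
  c("beer", "diapers"),
  c("milk", "bread"),
  c("beer", "bread", "diapers"),
  c("milk", "diapers")
)

# Convert the list of baskets into the transactions class arules works with
trans <- as(baskets, "transactions")

# Mine rules with at least 50% support and 80% confidence; minlen = 2 drops
# rules with an empty left-hand side
rules <- apriori(trans,
                 parameter = list(support = 0.5, confidence = 0.8, minlen = 2),
                 control = list(verbose = FALSE))

# Show the rules ordered by lift
inspect(sort(rules, by = "lift"))
```

On real transactional data the support threshold is usually set much lower than in this toy example, because individual item combinations are rare among thousands of baskets.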
In this chapter, we covered a lot of ground! We started with a discussion about how trends are detected and predicted in the retail vertical. Then we dived into what market basket analysis really means, along with its core concepts, the mathematical formulae underlying the algorithms, and the critical metrics used to evaluate the results obtained from the algorithms, notably support, confidence, and lift. We also discussed the most popular techniques used for analysis, including contingency matrix evaluation, frequent itemset generation, and association rule mining. Next, we talked about how to make data-driven decisions using market basket analysis. Finally, we implemented our own algorithms and also used some of the popular libraries in R, such as arules, to apply these techniques to some real-world transactional data for detecting, predicting, and visualizing trends. Do note that these machine learning techniques only talk about product based recommendations purely based on purchase...