Reader small image

You're reading from  Hands-On Machine Learning with Microsoft Excel 2019

Product typeBook
Published inApr 2019
PublisherPackt
ISBN-139781789345377
Edition1st Edition
Tools
Right arrow
Author (1)
Julio Cesar Rodriguez Martino
Julio Cesar Rodriguez Martino
author image
Julio Cesar Rodriguez Martino

Julio Cesar Rodriguez Martino is a machine learning (ML) and artificial intelligence (AI) platform architect, focusing on applying the latest techniques and models in these fields to optimize, automate, and improve the work of tax and accounting consultants. The main tool used in this practice is the MS Office platform, which Azure services complement perfectly by adding intelligence to the different tasks. Julio's background is in experimental physics, where he learned and applied advanced statistical and data analysis methods. He also teaches university courses and provides in-company training on machine learning and analytics, and has a lot of experience leading data science teams.
Read more about Julio Cesar Rodriguez Martino

Right arrow

Data Mining Models in Excel Hands-On Examples

Data mining is about finding hidden patterns and associations in data. A large number of analyses that can only be performed by a human in a reasonable time if the amount of data is small, can be done by a computer in a very short time. Before Excel 2016, it was possible to install an add-in (called Data Mining) that was packed with different methods and models that could be used, mostly as black boxes, to get insights and discover information in any dataset. Microsoft then changed its policy and started migrating this capability to the cloud, first to Office 365 and, most recently, to Azure. It is also possible to calculate many things in Excel, with built-in functions and even with the help of macros programmed in Visual Basic Applications (VBA). Here, we will show two cases where data mining can help us find important information...

Technical requirements

Learning by example – Market Basket Analysis

We have all read the sentence in almost every online store: People who bought this product also bought.... It all started with Amazon in the 1990s, and it is widespread today. This same principle is even being tested in physical stores, where customers can get personalized suggestions on which items to buy based on their shopping history and similarity with other products. These techniques are generally known as cross-selling, and they are useful since it is often easier to sell an additional product to an existing customer than to acquire a new one.

In this section, we will show a simplified example of Market Basket Analysis, which is the type of data mining technique behind these recommender systems. The results might not be as accurate as those obtained with more advanced methods, but are still useful to explain the method...

Learning by example – Customer Cohort Analysis

An excellent way of gaining insights about a company's customers and their behavior is to perform a segmented analysis. These segments are groups of customers that share the same characteristics and are usually called cohorts. Their definition depends very much on the type of business we are dealing with.

The dataset we will analyze contains a list of customer IDs, the date when they first purchased something from us, the day they left us, and the mean amount of money spent per month. What does "left us" mean in this context? It is mainly a matter of definition. For example, if we refer to a credit card, we could consider that the customer left when they cancelled the card, or maybe we want to go further and say that if the customer did not spend any money for two or three months, we consider that they have left...

Summary

In this chapter, we learned about two data mining techniques: Market Basket Analysis and Customer Cohort Analysis. The first one tells us about hidden relations between store departments or products based on the customers' behavior. The second shows the time evolution of the number of customers, revealing differences between different customer segments or cohorts.

There are a large number of data mining analyses that can help us dig into our datasets and find hidden information, which is extremely important in order for us to understand our business and make the right decisions.

In the next chapter, we will study a special kind of dataset in detail: time series. We will see that it needs a different kind of modeling and analysis.

Questions

  1. How can the unexplainable, random associations be avoided in a Market Basket Analysis?
  2. If we find a correlation between two products or two departments in a given store, can this be generalized to other places?
  3. Some columns in the cohort matrix show zeroes starting at a given date. What does this mean in the context of the current dataset?
  4. The last rows in the cohort matrix show zero consistently. What does this mean in the context of the current dataset?
  5. How would you maximize the amount of money spent by customers as time passes?

Further reading

  • Database Marketing: Analyzing and Managing Customers by Robert C. Blattberg, Byung-Do Kim, Scott A Neslin
  • Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking, 1st Edition, by Foster Provost, Tom Fawcett
lock icon
The rest of the chapter is locked
You have been reading a chapter from
Hands-On Machine Learning with Microsoft Excel 2019
Published in: Apr 2019Publisher: PacktISBN-13: 9781789345377
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Julio Cesar Rodriguez Martino

Julio Cesar Rodriguez Martino is a machine learning (ML) and artificial intelligence (AI) platform architect, focusing on applying the latest techniques and models in these fields to optimize, automate, and improve the work of tax and accounting consultants. The main tool used in this practice is the MS Office platform, which Azure services complement perfectly by adding intelligence to the different tasks. Julio's background is in experimental physics, where he learned and applied advanced statistical and data analysis methods. He also teaches university courses and provides in-company training on machine learning and analytics, and has a lot of experience leading data science teams.
Read more about Julio Cesar Rodriguez Martino