Chapter 7. Using Market Basket Analysis as a Recommender Engine

"It's not wise to violate the rules until you know how to observe them."

- T.S. Eliot

In this chapter, we will cover the following topics:

  • Market basket analysis using the arules package
  • Data transformation and cleaning techniques for semi-structured market basket transaction data
  • Transforming transaction objects into dataframes
  • Using cluster analysis for prediction with the flexclust package
  • Applying text mining techniques with the RTextTools and tm packages

What is market basket analysis?


If you have survived the last chapter, you will now be introduced to the world of market basket analysis (MBA). Market basket analysis (also sometimes called affinity analysis) is a predictive analytics technique used heavily in the retail industry to identify baskets of items that are purchased together. The typical use case is the supermarket shopping cart, in which a shopper purchases an assortment of items such as milk, bread, cheese, and so on, and the algorithm predicts how purchasing certain items together affects the purchase of other items. It is one of the methods retailers use to decide when to start sending you coupons and emails for things that you didn't know you needed!

One often quoted example of MBA is the relationship between diapers and beer:

"One super market chain discovered in its analysis that customers that bought diapers often bought beer as well, have put the diapers close to beer coolers...

Examining the groceries transaction file


Critical to the understanding of MBA are the concepts of support, confidence, and lift. These are the measures that evaluate the goodness of fit for a set of association rules. You will also learn some specific terms that are used in MBA, such as consequence, antecedent, and itemset.

To introduce these concepts, we will first illustrate these terms through a very simplistic example. We will use only the first 10 transactions contained in the Groceries transaction file, which is contained in the arules package:

library(arules) 

After the arules library is loaded, you can see a short description of the Groceries dataset by entering ?Groceries at the command line. The following description appears in the help window:

"The Groceries data set contains 1 month (30 days) of real-world point-of-sale transaction data from a typical local grocery outlet. The data set contains 9835 transactions and the items are aggregated to 169 categories".

For more information...
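If you would like to reproduce the 10-transaction listing that the next section refers to, here is a minimal sketch using standard arules calls (not necessarily the book's verbatim code):

data(Groceries)              # load the built-in Groceries transactions object
inspect(Groceries[1:10])     # list the first 10 market baskets (itemsets)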

The sample market basket


Each transaction numbered 1-10 listed previously represents a basket of items purchased by a shopper. These are typically all items that are associated with a particular transaction or invoice. Each basket is enclosed within braces {}, and is referred to as an itemset. An itemset is a group of items that occur together.

Market basket algorithms construct rules in the form of:

Itemset{x1,x2,x3 ...} --> Itemset{y1,y2,y3...}. 

This notation states that buyers who have purchased the items on the left-hand side of the formula (lhs) have a propensity to purchase the items on the right-hand side (rhs). The association is stated using the --> symbol, which can be interpreted as implies.

Note

The lhs of the notation is also known as the antecedent, and the rhs is known as the consequence. If nothing appears on either the left-hand side or the right-hand side, there is no specific association rule for those items; however, it still means that those items have appeared in a basket.

Association rule algorithms


Without an association rule algorithm, you are left with the computationally very expensive task of generating all possible itemsets, and then trying to mine the data yourself in order to identify the best ones. Association rule algorithms help with filtering this.

The most popular algorithm for MBA is the apriori algorithm, which is contained within the arules package (the other popular algorithm is the eclat algorithm).

Running apriori is fairly simple. We will demonstrate this using the 10-transaction sample that we just printed.

The apriori algorithm is based upon the principle that if a particular itemset is frequent, then all of its subsets must also be frequent. The contrapositive is what makes the algorithm efficient: once a smaller itemset is found to be infrequent, every larger itemset that contains it can be discarded without ever being counted:

  • First, some housekeeping. Fix the number of printable digits to 2:
         options(digits = 2)
  • Next...
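For orientation, here is a hedged sketch of the apriori call on the small sample (the trans10 name and the threshold values are illustrative assumptions, not the book's elided settings):

# run apriori on the first 10 Groceries transactions; thresholds are illustrative
trans10 <- Groceries[1:10]
rules <- apriori(trans10, parameter = list(supp = 0.2, conf = 0.5, minlen = 2))
inspect(head(sort(rules, by = "lift")))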

Antecedents and consequences


The rules shown previously are expressed as an implication between the antecedent (left-hand side) and the consequence (right-hand side).

The first rule describes customers who buy bottled water also tending to buy tropical fruit. The third rule states that customers who buy cereals have a tendency to also buy whole milk.

Evaluating the accuracy of a rule


Three main metrics have been developed that measure the importance, or accuracy of an association rule: support, confidence, and lift.

Support

Support measures how frequently the items occur together. Imagine having a shopping cart in which there can be a very large number of combinations of items. Items that occur rarely could be excluded from the analysis. When an itemset occurs frequently, you will have more confidence in the association among its items, since the association is based on more transactions. Often your analysis will be centered on items with high support.

Calculating support

Calculating support is simple. You calculate a proportion by counting the number of transactions in which the items in the rule appear together, and then dividing by the total number of transactions:

Examples

  • We can see that for the first rule (index #63), {bottled water} and {tropical fruit} appear together in two different transactions (2 and 3), therefore...
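As a quick worked check of that definition (the numbers simply restate the example above):

# support of {bottled water} -> {tropical fruit} in the 10-transaction sample
n_together <- 2            # transactions containing both items (2 and 3)
n_total    <- 10           # total transactions in the sample
n_together / n_total       # support = 0.2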

Preparing the raw data file for analysis


Now that we have had a short introduction to the association rules algorithm, we will illustrate applying association rules to a more meaningful example.

We will be using the online retail dataset, which can be obtained from the UCI machine learning repository at:

https://archive.ics.uci.edu/ml/datasets/Online+Retail.

As described by the source, the data is:

"A transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail. The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers".

For more information about how the dataset was created, please refer to the original journal article (Daqing Chen, 2012).

Reading the transaction file

We will input the online retail data using the read.csv() function.

We can use the file.show() function to examine the input file directly. This is sometimes needed if you find that there are...
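A hedged sketch of the read step (the file name is an assumption; the UCI file is distributed as an Excel workbook, so it is assumed here to have been exported to CSV beforehand):

# read the exported online retail transaction file into a dataframe
OnlineRetail <- read.csv("OnlineRetail.csv", stringsAsFactors = FALSE)
str(OnlineRetail)    # check the columns and types that were read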

Analyzing the input file


After reading in the file, the nrow() function shows that the transaction file contains 541909 rows:

nrow(OnlineRetail)

The following output appears:

> [1] 541909 

We can use our handy View() function to peruse the contents. Alternatively, you can use the kable() function from the knitr library to display a simple table of the dataframe in the console, as shown below.

Look at the first few records. The kable() function will attempt to fit a simple table in the space provided, and will also truncate any long strings:

kable(head(OnlineRetail)) 

We can still see the last column is truncated (United Kingdom), but all of the columns fit without wrapping to the next line:

Note

Using an R Notebook with the kable() function: note that when using the rmarkdown package, or an R Notebook in RStudio, the output from the kable() function can be formatted to appear as an HTML table in the markdown file. Otherwise, it will appear as plain ASCII text. For example, you may...

Scrubbing and cleaning the data


Here comes the cleaning part!

Print some of the groceries contained within the description field of OnlineRetail:

kable(OnlineRetail$Description[1:5],col.names=c("Grocery Item Descriptions")) 
|Grocery Item Descriptions                 |  
|:-----------------------------------------| 
|WHITE HANGING HEART T-LIGHT HOLDER        | 
|WHITE METAL LANTERN                       | 
|CREAM CUPID HEARTS COAT HANGER            | 
|KNITTED UNION FLAG HOT WATER BOTTLE       | 
|RED WOOLLY HOTTIE WHITE HEART.            | 

Although each line contains a separate grocery item, the items are not in a uniform format: the number of words describing each item can vary, and some words are adjectives while others are nouns. Additionally, the retailer may deem certain words to be irrelevant to a particular marketing campaign (such as colors or sizes, which may be standard across all products). This type of data can be referred to as semi-structured data, since it incorporates certain...

Removing colors automatically


If you did not want to bother specifying colors, and you wanted to remove colors automatically, you could accomplish that as well.

The colors() function

The colors() function returns a character vector of the color names that R knows about. We can then perform a little code manipulation, in conjunction with the gsub() function that we just used, to replace all of the specified colors in OnlineRetail$Description with blanks.

We will also use the kable() function, which is contained within the knitr package, in order to produce simple HTML tables of the results:

# compute the length of the field before changes
before <- sum(nchar(OnlineRetail$Description))

# get the unique colors returned from the colors() function, and remove any
# digits found at the end of each color name
col2 <- unique(gsub("[0-9]+", "", colors(TRUE)))

# Now we will filter out any colors with a length > 7. This number is somewhat
# arbitrary but it is just done for illustration...
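A hedged sketch of how the remaining steps might look (the book's exact code is elided here; the regular expression, the toupper() conversion to match the uppercase descriptions, and the whitespace cleanup are assumptions):

# keep only the short color names, as described above
col2 <- col2[nchar(col2) <= 7]

# build one alternation pattern and blank those colors out of the descriptions
pattern <- paste0("\\b(", paste(toupper(col2), collapse = "|"), ")\\b")
OnlineRetail$Description <- gsub(pattern, "", OnlineRetail$Description)
OnlineRetail$Description <- trimws(gsub(" +", " ", OnlineRetail$Description))

# compare the field length after the changes with the 'before' value
after <- sum(nchar(OnlineRetail$Description))
before - after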

Filtering out single item transactions


Since we want baskets of items to run association rules on, we will filter out the transactions that have only one item per invoice. Those might be useful for a separate analysis of customers who purchased a single item, but they do not help with finding associations between multiple items, which is the goal of this exercise.

  • Let's use sqldf to find all of the single item transactions, and then we will create a separate dataframe consisting of the number of items per customer invoice:
        library(sqldf) 
  • First construct a query: How many distinct invoices were there? We see that there were 25900 separate invoices:
        sqldf("select count(distinct InvoiceNo) from   
        OnlineRetail") 
        > Loading required package: tcltk 
        >   count(distinct InvoiceNo)
        > 1                     25900 
  • How many invoices contain only a single item? First, extract the single-item invoices:
        single...
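A hedged sketch of the elided queries (the names singles and x2 are assumptions; x2 is chosen to match the merge() call in the next section, and it keeps only multi-item invoices, which is consistent with the lower invoice count reported after the merge):

# invoices that contain only a single item
singles <- sqldf("select InvoiceNo, count(*) as itemcount
                  from OnlineRetail
                  group by InvoiceNo
                  having count(*) = 1")
nrow(singles)

# item counts for the multi-item invoices we want to keep
x2 <- sqldf("select InvoiceNo, count(*) as itemcount
             from OnlineRetail
             group by InvoiceNo
             having count(*) > 1")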

Merging the results back into the original data


We will want to retain the total number of items for each invoice on the original data frame. That will involve joining the number of items contained in each invoice back to the original transactions, using the merge() function and specifying InvoiceNo as the key.

If you count the number of distinct invoices before and after the merge, you can see that the invoice count is lower than prior to the merge:

#first take a 'before' snapshot 
 
nrow(OnlineRetail) 
> [1] 541909 
 
#count the number of distinct invoices 
 
sqldf("select count(distinct InvoiceNo) from OnlineRetail")  

The output shows a total of 25900 distinct invoices:

>   count(distinct InvoiceNo) 
> 1                     25900  

Now merge the counts back with the original data:

OnlineRetail <- merge(OnlineRetail, x2, by = "InvoiceNo") 

Check the new number of rows and the new count of distinct invoices (20059 versus 25900), and compare these counts to the originals. The reduction...
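A brief sketch of that check (the queries mirror the 'before' snapshot above):

# 'after' snapshot: row count and distinct invoices following the merge
nrow(OnlineRetail)
sqldf("select count(distinct InvoiceNo) from OnlineRetail")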

Compressing descriptions using camelcase


For long descriptions, sometimes it is beneficial to compress them into camelcase to improve readability. This is especially valuable when viewing descriptions that are labels on x or y axes.

Camelcase is a method that some programmers use for writing compound words, in which each word begins with a capital letter and the spaces between words are removed. It is also a way of conserving space.

To accomplish this, we can write a small function called .simpleCap. To illustrate how it works, we will pass it a two-element character vector, c("A certain good book","A very easy book"), and observe the results.

Custom function to map to camelcase

This is a simple example that maps the two-element character vector c("A certain good book", "A very easy book") to camelcase. The vector is mapped to two new elements:

[1] "ACertainGoodBook", and  [2] "AVeryEasyBook" 
 
# change descriptions to camelcase maybe append to itemnumber for uniqueness...
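The body of .simpleCap is elided in this excerpt; a minimal sketch that reproduces the mapping shown above (the implementation details are assumptions) is:

# capitalize the first letter of each word, then drop the spaces (camelcase)
.simpleCap <- function(x) {
  words <- strsplit(tolower(x), " ")
  sapply(words, function(w) {
    paste0(toupper(substring(w, 1, 1)), substring(w, 2), collapse = "")
  })
}

.simpleCap(c("A certain good book", "A very easy book"))
# [1] "ACertainGoodBook" "AVeryEasyBook"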

Creating the test and training datasets


Now that we are finished with our transformations, we will create the training and test data frames. We will perform a 50/50 split between training and test:

# Take a sample of full vector
nrow(OnlineRetail) 
> [1] 536068 
pctx <- round(0.5 * nrow(OnlineRetail))
set.seed(1)

# randomize rows

df <- OnlineRetail[sample(nrow(OnlineRetail)), ]
rows <- nrow(df)
OnlineRetail <- df[1:pctx, ]  #training set
OnlineRetail.test <- df[(pctx + 1):rows, ]  #test set
rm(df)

# Display the number of rows in the training and test datasets.

nrow(OnlineRetail) 
> [1] 268034 
nrow(OnlineRetail.test) 
> [1] 268034 

Saving the results

It is a good idea to periodically save your data frames, so that you can pick up your analysis from various checkpoints.

In this example, I will first sort them both by InvoiceNo, and then save the test and train data sets to disk, where I can always load them back into memory as needed:

 

setwd("C:/PracticalPredictiveAnalytics...

Creating the market basket transaction file


We are almost there! There is an extra step that we need to do in order to prepare our data for market basket analysis.

The association rules package requires that the data be in transaction format. Transactions can be specified in one of two formats:

  1. One line per transaction, with an identifier; this shows the entire basket on one line, just as we saw with the Groceries data.
  2. One item per line, with an identifier on each line.

Additionally, you can create the actual transaction file in two different ways, by either:

  1. Physically writing a transactions file.
  2. Coercing a dataframe to transaction format.

For smaller amounts of data, coercing the dataframe to a transaction object is simpler, but for large transaction files, writing the transaction file first is preferable, since such files can be appended to incrementally as they are fed from large operational transaction systems. We will illustrate both ways.

Method one: Coercing a dataframe to a transaction file

Now we are ready to coerce...
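A hedged sketch of the coercion (the column choices are assumptions; the split()/as() idiom is the same one used later for the cluster-based transactions):

library(arules)
# one list element per invoice, containing that invoice's item descriptions
trans <- as(split(OnlineRetail$Description, OnlineRetail$InvoiceNo), "transactions")
summary(trans)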

Method two: Creating a physical transactions file


Now that you know how to run association rules using the dataframe coercion method, we will illustrate the write-to-file method:

  • In the write-to-file method, each item is written to a separate line, along with the identifying key, which in our case is the InvoiceNo

  • The advantage of the write-to-file method is that very large data files can be accumulated separately, and then combined if needed

  • You can use the file.show() function to display the contents of the file that will be input to the association rules algorithm:

setwd("C:/PracticalPredictiveAnalytics/Data")
load("OnlineRetail.full.Rda")
OnlineRetail <- OnlineRetail[1:100,]
nrow(OnlineRetail)
> [1] 268034 
head(OnlineRetail) 
> InvoiceNo StockCode  Description                Quantity
 > 5   6365     71053  METAL LANTERN                     6
 > 6   536365   21730  GLASS STAR FROSTED T-LIGHT HOLDER 6
 > 2   536365   22752  SET 7 BABUSHKA NESTING BOXES      2...
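A hedged sketch of the two halves of this method (the file name, separator, and write.table()/read.transactions() options are assumptions; descriptions are assumed not to contain the separator):

# write one item per line together with its invoice id
write.table(OnlineRetail[, c("InvoiceNo", "Description")],
            file = "retail_single.txt", sep = ";",
            row.names = FALSE, col.names = FALSE, quote = FALSE)

# read it back as a transactions object in "single" format
trans2 <- read.transactions("retail_single.txt", format = "single",
                            sep = ";", cols = c(1, 2))
summary(trans2)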

Converting to a document term matrix


Once we have a corpus, we can proceed to convert it to a document term matrix (DTM). When building the DTM, care must be given to limiting the amount of data and the number of resulting terms that are processed. If not parameterized correctly, it can take a very long time to run. Parameterization is accomplished via the control options. We will remove any stopwords, punctuation, and numbers. Additionally, we will only include words that are at least four characters long:
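The corpus object corp used in the next code block is created in a step not shown in this excerpt; a minimal sketch (assuming the camelcased Desc2 column is the text source) is:

library(tm)
# build a corpus with one document per cleaned item description
corp <- VCorpus(VectorSource(OnlineRetail$Desc2))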

library(tm)
dtm <- DocumentTermMatrix(corp,
         control = list(removePunctuation = TRUE,
                        wordLengths = c(4, 999),
                        stopwords = TRUE,
                        removeNumbers = TRUE,
                        stemming = FALSE,
                        bounds = list(global = c(5, Inf))))

We can begin to look at the data by using the inspect() function.

This is different from the inspect() function in the arules package; if you have the arules package loaded, you will want to qualify the call as tm::inspect():

inspect(dtm[1:10, 1:10]) 
> <<DocumentTermMatrix (documents: 10, terms: 10)>>
>...

K-means clustering of terms


Now we can cluster the term document matrix using k-means. For illustration purposes, we will specify that five clusters be generated:
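The dtms object passed to kmeans() below is not created in this excerpt; a common construction (an assumption here) removes very sparse terms from dtm and converts the result to an ordinary matrix:

# drop terms that are absent from almost all documents, then densify
dtms <- as.matrix(removeSparseTerms(dtm, sparse = 0.995))
set.seed(1)    # k-means depends on random starting centroids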

kmeans5 <- kmeans(dtms, 5)

Once k-means is done, we will append the cluster number to the original data, and then create five subsets based upon the cluster:

kw_with_cluster <- as.data.frame(cbind(OnlineRetail, Cluster = kmeans5$cluster))

 # subset the five clusters
 cluster1 <- subset(kw_with_cluster, subset = Cluster == 1)
 cluster2 <- subset(kw_with_cluster, subset = Cluster == 2)
 cluster3 <- subset(kw_with_cluster, subset = Cluster == 3)
 cluster4 <- subset(kw_with_cluster, subset = Cluster == 4)
 cluster5 <- subset(kw_with_cluster, subset = Cluster == 5)

Examining cluster 1

Print out a sample of the data:

> head(cluster1[10:13])
                            Desc2 lastword firstword Cluster
50   VintageBillboardLove/hateMug      MUG   VINTAGE       1
86              BagVintagePaisley  PAISLEY       BAG       1
113         ShopperVintagePaisley  PAISLEY   SHOPPER       1
145         ShopperVintagePaisley...

Predicting cluster assignments


The goal in this exercise is to score the test dataset, by assigning clusters based upon the predict method for the training dataset.

Using flexclust to predict cluster assignment

The standard kmeans function does not have a prediction method. However, we can use the flexclust package, which does. Since the prediction method can take a long time to run, we will illustrate it only on a sample of rows and columns. In order to compare the test and training results, they also need to have the same number of columns. For illustration purposes, we will set the number of columns at 10.

To begin, take a sample from the OnlineRetail training data:

set.seed(1)
 sample.size <- 10000
 max.cols <- 10

library("flexclust") OnlineRetail <- OnlineRetail[1:sample.size, ]

Next, create the document term matrix from the description column in the sampled dataset. We will use the create_matrix function from the RTextTools package, which can create a TDM first without having a separate...
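The remaining steps are elided in this excerpt; a hedged sketch of the overall flow (the create_matrix() options, the kcca() family, and the simple take-the-first-10-columns trick are assumptions chosen to match the description above):

library(RTextTools)

# document-term matrix for the sampled training descriptions
dtm.train <- create_matrix(OnlineRetail$Desc2, removeStopwords = TRUE,
                           removeNumbers = TRUE)
m.train <- as.matrix(dtm.train)[, 1:max.cols]

# flexclust's kcca() fits a k-means style partition that supports predict()
kcca5 <- kcca(m.train, k = 5, family = kccaFamily("kmeans"))

# score the test sample: build a matrix with the same number of columns,
# then assign each row to the nearest training centroid
dtm.test <- create_matrix(OnlineRetail.test$Desc2[1:sample.size],
                          removeStopwords = TRUE, removeNumbers = TRUE)
m.test <- as.matrix(dtm.test)[, 1:max.cols]
pred.clusters <- predict(kcca5, newdata = m.test)
table(pred.clusters)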

Running the apriori algorithm on the clusters


Circling back to the apriori algorithm, we can use the predicted clusters that were generated instead of lastword, in order to develop some rules:

  • We will use the dataframe coercion method to generate the transactions object, as shown previously

  • Create a rules_clust object, which builds association rules based upon the itemset of clusters {1,2,3,4,5}

  • Inspect some of the generated rules by lift:

        library(arules)
        colnames(kw_with_cluster2_score)
        kable(head(kw_with_cluster2_score[, c(1, 13)], 5))

        tmp <- data.frame(kw_with_cluster2_score[, 1],
                          kw_with_cluster2_score[, 13])
        names(tmp)[1] <- "TransactionID"
        names(tmp)[2] <- "Items"
        tmp <- unique(tmp)

        trans4 <- as(split(tmp[, 2], tmp[, 1]), "transactions")
        rules_clust <- apriori(trans4,
                               parameter = list(minlen = 2, support = 0.02,
                                                confidence = 0.01))
        summary(rules_clust)
        tmp <...
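A brief sketch of the inspection step mentioned in the last bullet above:

# show the strongest cluster-based rules, ordered by lift
inspect(head(sort(rules_clust, by = "lift"), 5))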

Summarizing the metrics


Running a summary on the rules_clust object indicates an average support of 0.05, and average confidence of 0.43.

This demonstrates that using clustering can be a viable way to develop association rules, and reduce resources and the number of dimensions at the same time:

    support          confidence           lift     
 Min.   :0.02044   Min.   :0.09985   Min.   :0.989 
 1st Qu.:0.02664   1st Qu.:0.19816   1st Qu.:1.006 
 Median :0.03066   Median :0.27143   Median :1.526 
 Mean   :0.05040   Mean   :0.43040   Mean   :1.608 
 3rd Qu.:0.04234   3rd Qu.:0.81954   3rd Qu.:1.891 
 Max.   :0.17080   Max.   :1.00000   Max.   :3.022 

References


  • Daqing Chen, S. L. (2012). Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining. Journal of Database Marketing and Customer Strategy Management, Vol. 19, No. 3.
  • Michael Hahsler, K. H. (2006). Implications of probabilistic data modeling for mining association rules. In M. Spiliopoulou et al. (Eds.), From Data and Information Analysis to Knowledge Engineering, Studies in Classification, Data Analysis, and Knowledge Organization (pp. 598-605). Springer-Verlag.

Summary


In this chapter, we learned about a specific type of recommender engine, under the umbrella term market basket analysis.

We saw that market basket analysis enabled you to mine large quantities of transactions containing semi-structured data to derive association rules among the itemsets contained in each basket.

Some additional data cleaning techniques were used on the market basket data, in order to standardize and consolidate some of the descriptions of the purchased items. We also learned how to isolate the most powerful rules, using plotting techniques, along with metrics such as lift, support, and confidence.

Finally, we showed you how to generate clusters from your market basket training data, and how to predict cluster assignments for a test dataset.
