
You're reading from 10 Machine Learning Blueprints You Should Know for Cybersecurity

Product type: Book
Published in: May 2023
Publisher: Packt
ISBN-13: 9781804619476
Edition: 1st Edition
Author (1)
Rajvardhan Oak

Rajvardhan Oak is a cybersecurity expert, researcher, and scientist with a focus on machine learning solutions to security issues such as fake news, malware, and botnets. He obtained his bachelor's degree from the University of Pune, India, and his master's degree from the University of California, Berkeley. He has served on the editorial committees of multiple technical conferences and journals. His work has been featured by prominent news outlets such as WIRED magazine and the Daily Mail. In 2022, he received the ISC2 Global Achievement Award for Excellence in Cybersecurity. He is based in the Seattle area and works for Microsoft as an applied scientist in the ads fraud division.

Detecting Fake Reviews

Reviews are an important element of online marketplaces because they convey customers' experiences with, and opinions of, products. Customers depend heavily on reviews to judge the quality of a product, verify the claims made in its description, and learn from the experiences of other customers. In recent times, however, the number of fake reviews has increased. Fake reviews are misleading and fraudulent, and they harm consumers. They are prevalent not only on shopping sites but on any site where reputation is built through reviews, such as Google Maps, Yelp, Tripadvisor, and even the Google Play Store.

Fraudulent reviews undermine the integrity of the platform and allow scammers to profit, while genuine users (sellers and customers) are harmed. As data scientists in the security space, we need to understand reputation manipulation, how it presents itself, and the techniques for detecting it. This chapter focuses on examining...

Technical requirements

Reviews and integrity

Let us first look at the importance of online reviews and why fake reviews exist.

Why fake reviews exist

Most e-commerce websites display reviews for their products, and these reviews play an important role in the online world. They allow consumers to share their experiences and facilitate peer-to-peer reputation building. Reviews are important on online platforms for several reasons:

  • Online reviews provide valuable information to potential customers about the quality and performance of a product or service. Customers can read about other people’s experiences with a product or service before deciding whether to buy it or not.
  • Reviews from other customers help build trust between the seller and the buyer. Positive reviews can reassure potential customers that a product or service is worth buying, while negative reviews can warn them about potential problems.
  • Online reviews can provide businesses with valuable feedback about their products and services...

Statistical analysis

In this section, we will explore some review data and check whether genuine and fake reviews differ in measurable ways. We will use the Amazon fake reviews dataset that Amazon has published on Kaggle. It contains around 20,000 reviews, each labeled as real or fake by domain experts at Amazon.
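As a rough illustration of what checking for a difference between two groups can look like, the sketch below computes Welch's t statistic for one numeric feature. The feature (review length in words), the helper name welch_t, and all of the numbers are assumptions made up for this example; they are not taken from the dataset, and the tests used later in the chapter may differ.

```python
import numpy as np


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Variance of each sample mean (unbiased sample variance / sample size)
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    return t, df


# Toy, made-up review lengths (in words); NOT values from the real dataset
genuine_lengths = [52, 61, 48, 75, 66, 58, 70, 63]
fake_lengths = [31, 40, 28, 35, 44, 30, 38, 33]

t_stat, dof = welch_t(genuine_lengths, fake_lengths)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A large absolute t value suggests that the group means differ by more than sampling noise would explain. Welch's variant is used here because it does not assume the two groups share the same variance.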

Exploratory data analysis

We will first load up the data and take a first pass over it to understand the features and their distribution.

We begin by importing the necessary libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

We will then read the reviews data. Although it is stored as a text file, it is tab-separated and can therefore be read with the read_csv function in pandas:

reviews_df = pd.read_csv("amazon_reviews.txt", sep="\t")
reviews_df.head()

This is what the output should look like:

Figure 4.1 – A glimpse of the reviews dataset
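For readers without the file at hand, here is a minimal, self-contained sketch of a first pass over such data. It reads a tiny made-up tab-separated sample from memory instead of amazon_reviews.txt. The column names (LABEL, RATING, REVIEW_TEXT) and every row are assumptions for illustration only and may not match the real dataset.

```python
import io

import pandas as pd

# Made-up, tab-separated sample standing in for the real file.
# Column names and rows are hypothetical, chosen only for illustration.
sample = (
    "LABEL\tRATING\tREVIEW_TEXT\n"
    "fake\t5\tBest product ever buy now\n"
    "real\t4\tWorks well, though the battery drains faster than I expected\n"
    "fake\t5\tAmazing amazing amazing\n"
    "real\t2\tStopped working after two weeks and support was slow to respond\n"
)

# sep="\t" tells pandas that fields are separated by tabs, not commas
toy_df = pd.read_csv(io.StringIO(sample), sep="\t")

# How many reviews carry each label?
label_counts = toy_df["LABEL"].value_counts()

# A simple derived feature: number of whitespace-separated words per review
toy_df["WORD_COUNT"] = toy_df["REVIEW_TEXT"].str.split().str.len()

# Compare average rating and length across the two classes
summary = toy_df.groupby("LABEL")[["RATING", "WORD_COUNT"]].mean()

print(label_counts)
print(summary)
```

The same value_counts and groupby calls could be applied to reviews_df once the real column names are known.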


...

Modeling fake reviews with regression

In this section, we will use the features we examined to attempt to model our data with linear regression.

Ordinary Least Squares regression

Ordinary Least Squares (OLS) linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. The goal of OLS is to find the linear function that best fits the data by minimizing the sum of squared errors between the observed values and the predicted values of the dependent variable.

The linear function is typically expressed as:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \varepsilon$$

where Y is the dependent variable, X1, X2, ..., Xn are the independent variables, β0, β1, β2, ..., βn are the coefficients (or parameters) that measure the effect of each independent variable on the dependent variable, and ε is the error term (or residual...
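To make the idea of estimating the coefficients concrete, the following is a minimal sketch that fits an OLS model to synthetic data using NumPy's least-squares solver. The data, the "true" coefficients, and the variable names are all assumptions invented for this example; the chapter itself may use a different library for the fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two independent variables with a known linear relationship
n = 200
X = rng.normal(size=(n, 2))
true_beta = np.array([1.5, 2.0, -0.5])  # [beta_0, beta_1, beta_2], made up
noise = rng.normal(scale=0.1, size=n)   # plays the role of the error term
y = true_beta[0] + X @ true_beta[1:] + noise

# Prepend a column of ones so that beta_0 is estimated as the intercept
X_design = np.column_stack([np.ones(n), X])

# Least-squares solution: the coefficients minimizing the sum of squared errors
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

residuals = y - X_design @ beta_hat
sse = float(residuals @ residuals)

print("estimated coefficients:", np.round(beta_hat, 3))
print("sum of squared errors:", round(sse, 3))
```

Because the noise is small relative to the signal, the estimated coefficients should land close to the values used to generate the data. With real review features, the fit would usually be much noisier.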

Summary

In this chapter, we examined the problem of fake reviews on e-commerce platforms through the lens of statistical and machine learning models. We began by understanding the review ecosystem and the nature of fake reviews, including their evolution over time. We then explored a dataset of fake reviews and conducted statistical tests to determine whether they show characteristics significantly different from genuine reviews. Finally, we modeled the review integrity using OLS regression and examined how various factors affect the likelihood that a review is fake.

This chapter introduced you to the foundations of data science, including exploratory data analysis, statistics, and the beginnings of machine learning.

In the next chapter, we will discuss techniques for detecting deepfakes, which plague the internet and social media today.
