10 Machine Learning Blueprints You Should Know for Cybersecurity

Product type Book
Published in May 2023
Publisher Packt
ISBN-13 9781804619476
Pages 330 pages
Edition 1st Edition
Author (1):
Rajvardhan Oak

Table of Contents (15) Chapters

Preface
Chapter 1: On Cybersecurity and Machine Learning
Chapter 2: Detecting Suspicious Activity
Chapter 3: Malware Detection Using Transformers and BERT
Chapter 4: Detecting Fake Reviews
Chapter 5: Detecting Deepfakes
Chapter 6: Detecting Machine-Generated Text
Chapter 7: Attributing Authorship and How to Evade It
Chapter 8: Detecting Fake News with Graph Neural Networks
Chapter 9: Attacking Models with Adversarial Machine Learning
Chapter 10: Protecting User Privacy with Differential Privacy
Chapter 11: Protecting User Privacy with Federated Machine Learning
Chapter 12: Breaking into the Sec-ML Industry
Index
Other Books You May Enjoy

Detecting Fake Reviews

Reviews are an important element of online marketplaces because they convey customers' experiences with and opinions of products. Customers depend heavily on reviews to judge the quality of a product, the truth of claims made in its description, and the experiences of fellow customers. However, the number of fake reviews has increased in recent times. Fake reviews are misleading and fraudulent, and they harm consumers. They are prevalent not only on shopping sites but on any site where reputation is built through reviews, such as Google Maps, Yelp, Tripadvisor, and even the Google Play Store.

Fraudulent reviews harm the integrity of the platform and allow scammers to profit at the expense of genuine users (sellers and customers). As data scientists in the security space, we need to understand reputation manipulation, how it presents itself, and the techniques for detecting it. This chapter focuses on examining...

Technical requirements

Reviews and integrity

Let us first look at the importance of online reviews and why fake reviews exist.

Why fake reviews exist

Nearly every e-commerce website displays reviews for its products. Reviews allow consumers to share their experiences and facilitate peer-to-peer reputation building. They are important on online platforms for several reasons:

  • Online reviews provide valuable information to potential customers about the quality and performance of a product or service. Customers can read about other people’s experiences with a product or service before deciding whether to buy it or not.
  • Reviews from other customers help build trust between the seller and the buyer. Positive reviews can reassure potential customers that a product or service is worth buying, while negative reviews can warn them about potential problems.
  • Online reviews can provide businesses with valuable feedback about their products and services...

Statistical analysis

In this section, we will try to understand some review data and check whether there are any differences between genuine and fake reviews. We will use the Amazon fake reviews dataset that Amazon has published on Kaggle. It is a set of around 20,000 reviews with associated labels (real or fake) as labeled by domain experts at Amazon.
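One way to check whether a feature differs between two groups is a permutation test on the difference in means. The sketch below is only an illustration: the text shown here does not say which statistical tests are applied, the helper name permutation_test is hypothetical, and the review lengths are synthetic numbers rather than values from the dataset.

```python
import numpy as np

def permutation_test(a, b, n_permutations=5_000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative helper)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_permutations):
        # Shuffle the pooled values and split them into two pseudo-groups
        shuffled = rng.permutation(pooled)
        diff = shuffled[:a.size].mean() - shuffled[a.size:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # The add-one correction keeps the estimated p-value strictly positive
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value

# Synthetic review lengths in words; NOT taken from the real dataset
rng = np.random.default_rng(42)
genuine_lengths = rng.normal(loc=80, scale=20, size=200)
fake_lengths = rng.normal(loc=60, scale=20, size=200)

observed_diff, p_value = permutation_test(genuine_lengths, fake_lengths)
print(f"difference in means: {observed_diff:.2f}, p-value: {p_value:.4f}")
```

A small p-value suggests that the observed difference would be unlikely if the group labels were interchangeable. On real data, the same helper could be applied to any numeric feature computed per review.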

Exploratory data analysis

We will first load up the data and take a first pass over it to understand the features and their distribution.

We begin by importing the necessary libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

We will then read the reviews data. Although it is stored as a text file, it is tab-separated and can therefore be read with the read_csv function in pandas by passing sep="\t":

reviews_df = pd.read_csv("amazon_reviews.txt", sep="\t")
reviews_df.head()

This is what the output should look like:

Figure 4.1 – A glimpse of the reviews dataset
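A first pass over a freshly loaded DataFrame usually covers its shape, column types, and label balance. The following is a minimal sketch of those checks on a tiny in-memory sample. The column names (LABEL, RATING, REVIEW_TEXT) and the rows are hypothetical stand-ins, since the real columns appear only in the figure.

```python
import io
import pandas as pd

# Tiny tab-separated sample standing in for amazon_reviews.txt.
# Column names and values are hypothetical, not the real dataset schema.
sample_tsv = (
    "LABEL\tRATING\tREVIEW_TEXT\n"
    "real\t5\tWorks as described and arrived on time.\n"
    "fake\t5\tBest product ever, buy it now!\n"
    "real\t2\tStopped working after a week.\n"
    "fake\t4\tAmazing amazing amazing.\n"
)
sample_df = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

# Number of rows and columns
print(sample_df.shape)

# Data type that pandas inferred for each column
print(sample_df.dtypes)

# How many reviews carry each label
label_counts = sample_df["LABEL"].value_counts()
print(label_counts)

# Average rating within each label group
mean_rating_by_label = sample_df.groupby("LABEL")["RATING"].mean()
print(mean_rating_by_label)
```

The same calls should work on reviews_df once the hypothetical column names are swapped for the ones shown by reviews_df.head().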

...

Modeling fake reviews with regression

In this section, we will use the features we examined to attempt to model our data with linear regression.

Ordinary Least Squares regression

Ordinary Least Squares (OLS) linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. The goal of OLS is to find the linear function that best fits the data by minimizing the sum of squared errors between the observed values and the predicted values of the dependent variable.

The linear function is typically expressed as:

Y = β0 + β1 X1 + β2 X2 + ... + βn Xn + ε

where Y is the dependent variable, X1, X2, ..., Xn are the independent variables, β0 is the intercept, β1, β2, ..., βn are the coefficients (or parameters) that measure the effect of each independent variable on the dependent variable, and ε is the error term (or residual...
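To make the least-squares idea concrete, here is a minimal sketch that fits the coefficients with NumPy on synthetic data. The data, the "true" coefficients, and the variable names are all made up for illustration. This is not necessarily the library or the code the chapter itself uses.

```python
import numpy as np

# Synthetic data: two independent variables and a known linear relationship
rng = np.random.default_rng(0)
n_samples = 500
X = rng.normal(size=(n_samples, 2))
true_beta = np.array([1.0, 2.0, -0.5])  # beta0, beta1, beta2 (made-up values)
noise = rng.normal(scale=0.1, size=n_samples)
y = true_beta[0] + X @ true_beta[1:] + noise

# Prepend a column of ones so that the first coefficient is the intercept
design = np.column_stack([np.ones(n_samples), X])

# Solve for the coefficients that minimize the sum of squared errors
beta_hat, _, _, _ = np.linalg.lstsq(design, y, rcond=None)

residuals = y - design @ beta_hat
sse = float(residuals @ residuals)
print("estimated coefficients:", np.round(beta_hat, 3))
print("sum of squared errors:", round(sse, 3))
```

Because the noise is small relative to the signal, the estimated coefficients should land close to the values used to generate the data.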

Summary

In this chapter, we examined the problem of fake reviews on e-commerce platforms through the lens of statistical and machine learning models. We began by understanding the review ecosystem and the nature of fake reviews, including their evolution over time. We then explored a dataset of fake reviews and conducted statistical tests to determine whether they show characteristics significantly different from those of genuine reviews. Finally, we modeled review integrity using OLS regression and examined how various factors affect the likelihood that a review is fake.

This chapter introduced you to the foundations of data science, including exploratory data analysis, statistics, and the beginnings of machine learning.

In the next chapter, we will discuss techniques for detecting deepfakes, which plague the internet and social media today.
