10 Machine Learning Blueprints You Should Know for Cybersecurity

Product type Book
Published in May 2023
Publisher Packt
ISBN-13 9781804619476
Pages 330 pages
Edition 1st Edition
Author: Rajvardhan Oak

Table of Contents

Preface
1. Chapter 1: On Cybersecurity and Machine Learning
2. Chapter 2: Detecting Suspicious Activity
3. Chapter 3: Malware Detection Using Transformers and BERT
4. Chapter 4: Detecting Fake Reviews
5. Chapter 5: Detecting Deepfakes
6. Chapter 6: Detecting Machine-Generated Text
7. Chapter 7: Attributing Authorship and How to Evade It
8. Chapter 8: Detecting Fake News with Graph Neural Networks
9. Chapter 9: Attacking Models with Adversarial Machine Learning
10. Chapter 10: Protecting User Privacy with Differential Privacy
11. Chapter 11: Protecting User Privacy with Federated Machine Learning
12. Chapter 12: Breaking into the Sec-ML Industry
13. Index
14. Other Books You May Enjoy

Statistical analysis

In this section, we will explore some review data and check whether there are any differences between genuine and fake reviews. We will use the Amazon fake reviews dataset that Amazon has published on Kaggle. It contains around 20,000 reviews, each labeled as real or fake by domain experts at Amazon.
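As a rough illustration of what "checking for differences" between the two classes can look like, here is a minimal sketch on a made-up four-row frame. The column names LABEL and REVIEW_TEXT, the label values "real" and "fake", and the review-length feature are all assumptions for illustration; they are not the dataset's actual schema, and the numbers say nothing about the real data.

```python
import pandas as pd

# Toy stand-in for the reviews data. Column names and label values are
# hypothetical; the real dataset's schema may differ.
toy_df = pd.DataFrame({
    "LABEL": ["real", "fake", "real", "fake"],
    "REVIEW_TEXT": [
        "Works as described, battery lasts two days.",
        "Best product ever!!!",
        "Arrived late but the quality is fine.",
        "Amazing amazing amazing, buy it now!",
    ],
})

# Class balance: fraction of reviews carrying each label
balance = toy_df["LABEL"].value_counts(normalize=True)
print(balance)

# Derive a simple numeric feature: review length in whitespace-separated words
toy_df["num_words"] = toy_df["REVIEW_TEXT"].str.split().str.len()

# Compare the feature across the two classes (groupby sorts labels alphabetically)
summary = toy_df.groupby("LABEL")["num_words"].agg(["count", "mean"])
print(summary)
```

On the real data, the same groupby-and-aggregate pattern can be applied to any numeric feature to see whether its distribution shifts between genuine and fake reviews.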

Exploratory data analysis

We will first load the data and make an initial pass over it to understand its features and their distributions.
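An initial pass usually covers the size of the data, the type of each column, the number of missing values, and the distribution of individual features. The sketch below shows these checks on a hypothetical toy frame; the columns LABEL, RATING, and VERIFIED_PURCHASE are invented for illustration and are not claimed to exist in the Amazon dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; column names and values are made up
toy_df = pd.DataFrame({
    "LABEL": ["real", "fake", "real", "fake", "real"],
    "RATING": [5, 5, 3, 4, np.nan],
    "VERIFIED_PURCHASE": ["Y", "N", "Y", "N", "Y"],
})

print(toy_df.shape)   # (number of rows, number of columns)
print(toy_df.dtypes)  # inferred type of each feature (RATING is float because of the NaN)

# Missing values per column
missing = toy_df.isna().sum()
print(missing)

# Distribution of a categorical feature and summary statistics of a numeric one
verified_counts = toy_df["VERIFIED_PURCHASE"].value_counts()
print(verified_counts)
print(toy_df["RATING"].describe())
```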

We begin by importing the necessary libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Next, we read the reviews data. Although it is stored as a text file, it is tab-separated, so it can be read with the pandas read_csv function by passing sep="\t":

reviews_df = pd.read_csv("amazon_reviews.txt", sep="\t")
reviews_df.head()

This is what the output should look like:

Figure 4.1 – A glimpse of the reviews dataset
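The sep="\t" argument in the read_csv call above matters because read_csv splits on commas by default. The minimal sketch below uses a tiny made-up tab-separated string (the header names are illustrative only) to show the difference, and that pd.read_table, whose default separator is a tab, parses it the same way.

```python
import io

import pandas as pd

# Tiny tab-separated sample; header names and rows are illustrative only.
# The review text deliberately contains no commas.
raw = (
    "LABEL\tRATING\tREVIEW_TEXT\n"
    "real\t5\tWorks well for the price\n"
    "fake\t5\tBest product ever\n"
)

# With sep="\t", each tab-delimited field becomes its own column
tab_df = pd.read_csv(io.StringIO(raw), sep="\t")
print(tab_df.shape)  # (2, 3)

# With the default comma separator, each whole line lands in a single column
comma_df = pd.read_csv(io.StringIO(raw))
print(comma_df.shape)  # (2, 1)

# read_table defaults to a tab separator, so it matches the sep="\t" result
table_df = pd.read_table(io.StringIO(raw))
print(table_df.equals(tab_df))  # True
```

If a DataFrame unexpectedly ends up with a single wide column after loading, a mismatched separator is a likely cause.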

...