Reader small image

You're reading from  10 Machine Learning Blueprints You Should Know for Cybersecurity

Product typeBook
Published inMay 2023
PublisherPackt
ISBN-139781804619476
Edition1st Edition
Right arrow
Author (1)
Rajvardhan Oak
Rajvardhan Oak
author image
Rajvardhan Oak

Rajvardhan Oak is a cybersecurity expert, researcher, and scientist with a focus on machine learning solutions to security issues such as fake news, malware, and botnets. He obtained his bachelor's degree from the University of Pune, India, and his master's degree from the University of California, Berkeley. He has served on the editorial committees of multiple technical conferences and journals. His work has been featured by prominent news outlets such as WIRED magazine and the Daily Mail. In 2022, he received the ISC2 Global Achievement Award for Excellence in Cybersecurity. He is based in the Seattle area and works for Microsoft as an applied scientist in the ads fraud division.
Read more about Rajvardhan Oak

Right arrow

Statistical analysis

In this section, we will try to understand some review data and check whether there are any differences between genuine and fake reviews. We will use the Amazon fake reviews dataset that Amazon has published on Kaggle. It is a set of around 20,000 reviews with associated labels (real or fake) as labeled by domain experts at Amazon.

Exploratory data analysis

We will first load up the data and take a first pass over it to understand the features and their distribution.

We begin by importing the necessary libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

We will then read the reviews data. Although it is a text file, it is structured and therefore can be read with the read_csv function in Pandas:

reviews_df = pd.read_csv("amazon_reviews.txt", sep="\t")
reviews_df.head()

This is what the output should look like:

Figure 4.1 – A glimpse of the reviews dataset

Figure 4.1 – A glimpse of the reviews dataset

...
lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
10 Machine Learning Blueprints You Should Know for Cybersecurity
Published in: May 2023Publisher: PacktISBN-13: 9781804619476

Author (1)

author image
Rajvardhan Oak

Rajvardhan Oak is a cybersecurity expert, researcher, and scientist with a focus on machine learning solutions to security issues such as fake news, malware, and botnets. He obtained his bachelor's degree from the University of Pune, India, and his master's degree from the University of California, Berkeley. He has served on the editorial committees of multiple technical conferences and journals. His work has been featured by prominent news outlets such as WIRED magazine and the Daily Mail. In 2022, he received the ISC2 Global Achievement Award for Excellence in Cybersecurity. He is based in the Seattle area and works for Microsoft as an applied scientist in the ads fraud division.
Read more about Rajvardhan Oak