You're reading from  10 Machine Learning Blueprints You Should Know for Cybersecurity

Product type: Book
Published in: May 2023
Publisher: Packt
ISBN-13: 9781804619476
Edition: 1st Edition
Author (1)
Rajvardhan Oak

Rajvardhan Oak is a cybersecurity expert, researcher, and scientist with a focus on machine learning solutions to security issues such as fake news, malware, and botnets. He obtained his bachelor's degree from the University of Pune, India, and his master's degree from the University of California, Berkeley. He has served on the editorial committees of multiple technical conferences and journals. His work has been featured by prominent news outlets such as WIRED magazine and the Daily Mail. In 2022, he received the ISC2 Global Achievement Award for Excellence in Cybersecurity. He is based in the Seattle area and works for Microsoft as an applied scientist in the ads fraud division.

Protecting User Privacy with Federated Machine Learning

User privacy has become a prominent concern in the information technology world. Privacy means that users are in control of their data: they can choose how it is collected, stored, and used. Often, this also implies that the data cannot be shared with other entities. Companies may have further reasons for not sharing data, such as confidentiality, lack of trust, and protection of intellectual property. These restrictions can be a major impediment to machine learning (ML); large models, particularly deep neural networks, cannot be trained properly without adequate data.

In this chapter, we will learn about a privacy-preserving technique for ML known as federated machine learning (FML). Many kinds of fraud data are sensitive; they have user-specific information and also reveal weaknesses in the company’s detection measures. Therefore, companies may not want to share...

Technical requirements

An introduction to federated machine learning

Let us begin with what federated learning is and why it is a valuable tool. We will first look at the privacy challenges that arise when applying machine learning, and then at how and why federated learning addresses them.

Privacy challenges in machine learning

Traditional ML involves a series of steps that we have discussed multiple times so far: data preprocessing, feature extraction, model training, and tuning the model for best performance. All of these steps expose the data to the model and therefore rest on the premise that the data is available. In general, the more representative data we have available, the more accurate the model tends to be.

In the real world, however, data is often scarce. Labels are hard to come by, and there is no centrally aggregated data source. Rather, data is collected and processed by multiple entities that may not want to share it.

This is true more often than not in the security space. Because...

Implementing federated averaging

In this section, we will implement federated averaging with a practical use case in Python. Note that while we are using the MNIST dataset here as an example, this can easily be replicated for any dataset of your choosing.
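The aggregation step at the heart of federated averaging can be sketched in plain NumPy. This is a minimal illustration and not the chapter's own code: the function name `federated_average`, the two toy clients, and their sample counts are all hypothetical. It assumes each client reports its weights as a list of per-layer arrays with identical shapes, and that each client's contribution is weighted by its number of local samples.

```python
import numpy as np


def federated_average(client_weights, client_sizes):
    """Weighted average of per-client model weights (hypothetical helper).

    client_weights: one entry per client; each entry is a list of
                    np.ndarray, one per layer, with matching shapes.
    client_sizes:   number of local training samples held by each client.
    """
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    averaged = []
    for layer in range(n_layers):
        # Scale each client's layer by its share of the data, then sum.
        layer_avg = sum(
            (size / total) * weights[layer]
            for weights, size in zip(client_weights, client_sizes)
        )
        averaged.append(layer_avg)
    return averaged


# Two made-up clients with a one-layer "model" (kernel and bias).
client_a = [np.array([[1.0, 2.0]]), np.array([0.0])]
client_b = [np.array([[3.0, 6.0]]), np.array([4.0])]

global_weights = federated_average(
    [client_a, client_b], client_sizes=[100, 300]
)
```

With a Keras model, lists of this shape are what `model.get_weights()` returns and what `model.set_weights()` accepts, so the averaged result could be loaded back into a global model at the end of each communication round.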

Importing libraries

We begin by importing the necessary libraries. We will need standard Python libraries, some utilities from scikit-learn, and the Keras modules in TensorFlow that will allow us to create our deep learning model. The following code snippet imports these libraries:

import numpy as np
import random
import cv2
from imutils import paths
import os
# SkLearn Libraries
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
from sklearn.utils import shuffle
from sklearn.metrics import accuracy_score
# TensorFlow Libraries
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow...

Reviewing the privacy-utility trade-off in federated learning

In the previous section, we examined the effectiveness of federated learning and looked at the model performance over multiple communication rounds. However, to quantify the effectiveness, we need to compare this against two benchmarks:

  • A model trained on the entire data with no federation involved
  • A local model trained on its own data only

The differences in accuracy among these three cases (federated, global only, and local only) indicate the trade-offs we are making and the gains we achieve. We have already seen the accuracy obtained via federated learning. To understand the utility-privacy trade-off, let us discuss the two extreme cases – a fully global and a fully local model.
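The comparison described above comes down to two subtractions, which can be sketched as follows. The helper name `trade_off_summary` and its dictionary keys are hypothetical, and the accuracy values in the usage lines are made-up placeholders chosen only to show the arithmetic; they are not results from this chapter.

```python
def trade_off_summary(federated_acc, global_acc, local_acc):
    """Summarize the privacy-utility trade-off (hypothetical helper).

    utility_lost_to_privacy: accuracy given up by not pooling all data.
    gain_over_local:         accuracy gained by federating instead of
                             training on one party's data alone.
    """
    return {
        "utility_lost_to_privacy": global_acc - federated_acc,
        "gain_over_local": federated_acc - local_acc,
    }


# Placeholder accuracies for illustration only, not measured results.
summary = trade_off_summary(
    federated_acc=0.90, global_acc=0.95, local_acc=0.80
)
```

A small value for the first entry together with a large value for the second would suggest that federation recovers most of the utility of a global model while keeping the data local.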

Global model (no privacy)

When we train a global model directly, we use all the data to train a single model. Thus, all parties involved would be publicly sharing their data with each other. The...

Summary

In this chapter, we learned about a privacy preservation mechanism for ML known as federated learning. In traditional ML, all data is aggregated and processed in a central location, but in FML, the data remains distributed across multiple devices or locations, and the model is trained in a decentralized manner. In FML, we share the model and not the data.

We discussed the core concepts and working of FML, followed by an implementation in Python. We also benchmarked the performance of federated learning against traditional ML approaches to examine the privacy-utility trade-off. This chapter provided an introduction to an important aspect of ML and one that is gaining rapid traction in today’s privacy-centric technology world.

In the next chapter, we will go a step further and look at the hottest topic in ML privacy today – differential privacy.
