You're reading from 10 Machine Learning Blueprints You Should Know for Cybersecurity

Product type: Book

Published in: May 2023

Publisher: Packt

ISBN-13: 9781804619476

Edition: 1st Edition

Concepts: Machine Learning

Author (1)

Rajvardhan Oak

Protecting User Privacy with Federated Machine Learning

In recent years, user privacy has gained traction in the information technology world. Privacy means that the user is in complete control of their data – they can choose how the data is collected, stored, and used. Often, this also implies that data cannot be shared with other entities. Companies may also be unwilling to share data for other reasons, such as confidentiality, lack of trust, and protection of intellectual property. These restrictions can be a huge impediment to machine learning (ML): large models, particularly deep neural networks, cannot train properly without adequate data.

In this chapter, we will learn about a privacy-preserving technique for ML known as federated machine learning (FML). Many kinds of fraud data are sensitive; they have user-specific information and also reveal weaknesses in the company’s detection measures. Therefore, companies may not want to share...

Technical requirements

You can find the code files for this chapter on GitHub at https://github.com/PacktPublishing/10-Machine-Learning-Blueprints-You-Should-Know-for-Cybersecurity/tree/main/Chapter%2011.

An introduction to federated machine learning

Let us first look at what federated learning is and why it is a valuable tool. We begin with the privacy challenges faced when applying machine learning, and then discuss how and why federated learning is applied.

Privacy challenges in machine learning

Traditional ML involves a series of steps that we have discussed multiple times so far: data preprocessing, feature extraction, model training, and tuning the model for best performance. However, this pipeline requires the data to be exposed to the model and therefore presumes that the data is available. In general, the more data we have available, the more accurate the model tends to be.

However, there is often a scarcity of data in the real world. Labels are hard to come by, and there is no centrally aggregated data source. Rather, data is collected and processed by multiple entities who may not want to share it.

This is true more often than not in the security space. Because...

Implementing federated averaging

In this section, we will implement federated averaging with a practical use case in Python. Note that while we are using the MNIST dataset here as an example, this can easily be replicated for any dataset of your choosing.
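
Before turning to the Keras code, the aggregation step at the heart of federated averaging can be sketched in isolation. The following is a minimal illustration, assuming the common formulation in which each client's weights count in proportion to its share of the training data. The function name federated_average, the two toy clients, and their dataset sizes are hypothetical and are not taken from the chapter's code, which may differ.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Average per-layer weights across clients, weighted by dataset size.

    client_weights: one entry per client, each a list of arrays (one per layer),
                    similar in shape to what a Keras model's get_weights() returns.
    client_sizes:   number of training samples held by each client.
    """
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    averaged = []
    for layer in range(n_layers):
        # Scale each client's layer by its data share, then add the results
        layer_sum = sum(
            (size / total) * weights[layer]
            for weights, size in zip(client_weights, client_sizes)
        )
        averaged.append(layer_sum)
    return averaged

# Two hypothetical clients, each holding a tiny two-layer model
client_a = [np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([0.0, 1.0])]
client_b = [np.array([[3.0, 4.0], [5.0, 6.0]]), np.array([2.0, 3.0])]

# Client B holds three times as much data, so it dominates the average
global_weights = federated_average([client_a, client_b], client_sizes=[100, 300])
print(global_weights)
```

Only the weight arrays cross the boundary between a client and the server in this sketch; the training data itself never leaves the client.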

Importing libraries

We begin by importing the necessary libraries. We will need standard Python libraries, OpenCV, and scikit-learn utilities, along with Keras (through TensorFlow), which will allow us to create our deep learning model. The following code snippet imports these libraries:

import numpy as np
import random
import cv2
from imutils import paths
import os
# SkLearn Libraries
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
from sklearn.utils import shuffle
from sklearn.metrics import accuracy_score
# TensorFlow Libraries
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow...

Reviewing the privacy-utility trade-off in federated learning

In the previous section, we examined the effectiveness of federated learning and looked at the model performance over multiple communication rounds. However, to quantify the effectiveness, we need to compare this against two benchmarks:

A model trained on the entire data with no federation involved
A local model trained on its own data only

The differences in accuracy across these three cases (federated, global only, and local only) will indicate the trade-offs we are making and the gains we achieve. In the previous section, we looked at the accuracy we obtain via federated learning. To understand the privacy-utility trade-off, let us discuss two extreme cases – a fully global and a fully local model.
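
The three-way comparison can be sketched end to end on a toy problem. The example below is an illustration under stated assumptions, not the chapter's experiment: it swaps the Keras model and MNIST for a small NumPy logistic regression on synthetic two-dimensional data, and every name in it (make_data, train, accuracy, TRUE_W, the client count, and the step sizes) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_W = np.array([1.5, -1.0])  # hypothetical ground-truth decision boundary

def make_data(n):
    """Synthetic 2-D points labelled by which side of TRUE_W they fall on."""
    X = rng.normal(size=(n, 2))
    y = (X @ TRUE_W > 0).astype(float)
    return X, y

def train(X, y, w=None, steps=50, lr=0.5):
    """Full-batch gradient descent for logistic regression; w = [w1, w2, bias]."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(3) if w is None else w.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return float(np.mean((Xb @ w > 0) == (y == 1)))

# Five clients with 40 private samples each, plus a shared held-out test set
clients = [make_data(40) for _ in range(5)]
X_test, y_test = make_data(1000)

# Benchmark 1: global model trained on the pooled data (no privacy)
X_all = np.vstack([X for X, _ in clients])
y_all = np.concatenate([y for _, y in clients])
global_w = train(X_all, y_all, steps=200)

# Benchmark 2: local model trained on one client's data only
local_w = train(*clients[0], steps=200)

# Federated model: each round, clients train locally and the server averages
fed_w = np.zeros(3)
for _ in range(20):
    updates = [train(X, y, w=fed_w, steps=10) for X, y in clients]
    fed_w = np.mean(updates, axis=0)  # plain mean, since clients are equal-sized

results = {
    "global": accuracy(global_w, X_test, y_test),
    "local": accuracy(local_w, X_test, y_test),
    "federated": accuracy(fed_w, X_test, y_test),
}
print(results)
```

On a problem this easy the three numbers are likely to be close together; the gap between them is expected to matter more when each client holds little data or when the clients' data distributions differ.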

Global model (no privacy)

When we train a global model directly, we use all the data to train a single model. Thus, all parties involved would be publicly sharing their data with each other. The...

Summary

In this chapter, we learned about a privacy preservation mechanism for ML known as federated learning. In traditional ML, all data is aggregated and processed in a central location, but in FML, the data remains distributed across multiple devices or locations, and the model is trained in a decentralized manner. In FML, we share the model and not the data.

We discussed the core concepts and workings of FML, followed by an implementation in Python. We also benchmarked the performance of federated learning against traditional ML approaches to examine the privacy-utility trade-off. This chapter provided an introduction to an important aspect of ML and one that is gaining rapid traction in today’s privacy-centric technology world.

In the next chapter, we will go a step further and look at the hottest topic in ML privacy today – differential privacy.

Author (1)

Rajvardhan Oak

Rajvardhan Oak is a cybersecurity expert, researcher, and scientist with a focus on machine learning solutions to security issues such as fake news, malware, and botnets. He obtained his bachelor's degree from the University of Pune, India, and his master's degree from the University of California, Berkeley. He has served on the editorial committees of multiple technical conferences and journals. His work has been featured by prominent news outlets such as WIRED magazine and the Daily Mail. In 2022, he received the ISC2 Global Achievement Award for Excellence in Cybersecurity. He is based in the Seattle area and works for Microsoft as an applied scientist in the ads fraud division.