Python Deep Learning

You're reading from  Python Deep Learning

Product type Book
Published in Apr 2017
Publisher Packt
ISBN-13 9781786464453
Pages 406 pages
Edition 1st Edition
Authors (4): Valentino Zocca, Gianmario Spacagna, Daniel Slater, Peter Roelants

Table of Contents (18 chapters)

Python Deep Learning
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Customer Feedback
Preface
Machine Learning – An Introduction
Neural Networks
Deep Learning Fundamentals
Unsupervised Feature Learning
Image Recognition
Recurrent Neural Networks and Language Models
Deep Learning for Board Games
Deep Learning for Computer Games
Anomaly Detection
Building a Production-Ready Intrusion Detection System
Index

Chapter 10. Building a Production-Ready Intrusion Detection System

In the previous chapter, we explained in detail what anomaly detection is and how it can be implemented using auto-encoders. We proposed a semi-supervised approach for novelty detection. We introduced H2O and showed a couple of examples (MNIST digit recognition and ECG pulse signals) implemented on top of the framework and running in local mode. Those examples used small datasets, already cleaned and prepared, as proofs of concept.

Real-world data and enterprise environments work very differently. In this chapter, we will leverage H2O and common general practices to build a scalable distributed system ready for deployment in production.

As an example, we will use an intrusion detection system whose goal is to detect intrusions and attacks in a network environment.

We will raise a few practical and technical issues that you would likely face when building a data product for intrusion detection.

In particular, you...

What is a data product?


The final goal in data science is to solve problems by adopting data-intensive solutions. The focus is not only on answering questions but also on satisfying business requirements.

Just building data-driven solutions is not enough. Nowadays, any app or website is powered by data. Building a web platform for listing items on sale does consume data but is not necessarily a data product.

Mike Loukides gives an excellent definition:

A data application acquires its value from the data itself, and creates more data as a result; it's not just an application with data; it's a data product. Data science enables the creation of data products.

From "What is Data Science" (https://www.oreilly.com/ideas/what-is-data-science)

The fundamental requirement is that the system is able to derive value from data (not just consume it as it is) and generate knowledge (in the form of data or insights) as output. A data product is the automation that lets you extract information from raw data,...

Training


Training a network presupposes that its topology has already been designed. For that purpose, we recommend the Auto-Encoder section in Chapter 4, Unsupervised Feature Learning, which gives design guidelines according to the type of input data and the expected use cases.

Once we have defined the topology of the neural network, we are just at the starting point. The model now needs to be fitted during the training phase. We will see a few techniques for scaling and accelerating our training algorithm that are well suited to production environments with large datasets.

Weights initialization

The final convergence of neural networks can be strongly influenced by the initial weights. Depending on which activation function we have selected, we would like to have a gradient with a steep slope in the first iterations so that the gradient descent algorithm can quickly jump into the optimum area.

For a hidden unit j in the first layer (directly connected to the input layer), the sum of...
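The derivation is cut off at this point, so the sketch below does not reproduce the chapter's own formula. As a hedged illustration, it uses Glorot/Xavier uniform initialization, one widely used scheme that scales the weight range by the layer's fan-in and fan-out, and measures how many tanh units in the first hidden layer start out saturated compared with a naive U(-1, 1) initialization. The standardized Gaussian inputs, the layer sizes, and the 0.99 saturation threshold are illustrative assumptions.

```python
import numpy as np


def glorot_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform initialization: U(-r, r), r = sqrt(6 / (fan_in + fan_out))."""
    r = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-r, r, size=(fan_in, fan_out))


def saturated_fraction(x, w, threshold=0.99):
    """Fraction of tanh hidden-unit outputs with |tanh(a)| above `threshold`,
    i.e. units sitting on the flat part of tanh where the gradient is tiny."""
    # Pre-activation of hidden unit j for each sample: a_j = sum_i x_i * w_ij (no bias)
    a = x @ w
    return float(np.mean(np.abs(np.tanh(a)) > threshold))


rng = np.random.default_rng(0)
n_samples, fan_in, fan_out = 1000, 200, 50
x = rng.standard_normal((n_samples, fan_in))  # assumes standardized inputs

w_glorot = glorot_uniform(fan_in, fan_out, rng)
w_naive = rng.uniform(-1.0, 1.0, size=(fan_in, fan_out))  # range ignores fan-in

print("Glorot saturated fraction:", saturated_fraction(x, w_glorot))
print("Naive  saturated fraction:", saturated_fraction(x, w_naive))
```

Because the pre-activation of unit j sums fan_in terms, its variance grows with fan_in; shrinking the weight range accordingly keeps most units on the steep part of tanh during the first iterations. With these settings, the naive initialization saturates a large majority of the units, while the Glorot one saturates only a small fraction.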

Testing


Before we discuss what testing means in data science, let's summarize a few concepts.

First, and more generally, what is a model in science? We can cite the following definitions:

In science, a model is a representation of an idea, an object or even a process or a system that is used to describe and explain phenomena that cannot be experienced directly.

Scientific Modelling, Science Learning Hub, http://sciencelearn.org.nz/Contexts/The-Noisy-Reef/Science-Ideas-and-Concepts/Scientific-modelling

And this:

A scientific model is a conceptual, mathematical or physical representation of a real-world phenomenon. A model is generally constructed for an object or process when it is at least partially understood, but difficult to observe directly. Examples include sticks and balls representing molecules, mathematical models of planetary movements or conceptual principles like the ideal gas law. Because of the infinite variations actually found in nature, all but the simplest and most vague models...

Model validation


The goal of model validation is to evaluate whether the numerical results quantifying the hypothesized estimations/predictions of the trained model are acceptable descriptions of an independent dataset. An independent dataset is needed because any measure on the training set would be biased and optimistic, since the model has already seen those observations. If we don't have a separate dataset for validation, we can hold one fold of the data out of training and use it as a benchmark. Another common technique is k-fold cross-validation, and its stratified version, where the whole historical dataset is split into multiple folds. For simplicity, we will discuss the hold-out method (a single held-out fold); the same criteria also apply to cross-validation.
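A minimal NumPy sketch of the two splitting strategies mentioned above: a single random hold-out fold, and stratified k-fold, where each validation fold keeps roughly the class proportions of the full dataset. The function names and the synthetic, heavily imbalanced labels are illustrative assumptions; libraries such as scikit-learn provide tested equivalents (for example, `StratifiedKFold`).

```python
import numpy as np


def holdout_split(n_rows, val_fraction, rng):
    """Hold one randomly chosen fold (val_fraction of the rows) out of training."""
    idx = rng.permutation(n_rows)
    n_val = int(round(n_rows * val_fraction))
    return idx[n_val:], idx[:n_val]  # (train_idx, val_idx)


def stratified_kfold(y, k, rng):
    """Yield k (train_idx, val_idx) pairs whose validation folds roughly
    preserve the class proportions of y."""
    y = np.asarray(y)
    folds = [[] for _ in range(k)]
    for label in np.unique(y):
        # Shuffle the rows of this class, then deal them round-robin into the k folds
        members = rng.permutation(np.flatnonzero(y == label))
        for i, row in enumerate(members):
            folds[i % k].append(row)
    all_idx = np.arange(len(y))
    for fold in folds:
        val_idx = np.sort(np.array(fold, dtype=int))
        yield np.setdiff1d(all_idx, val_idx), val_idx


rng = np.random.default_rng(42)
y = np.array([0] * 95 + [1] * 5)  # synthetic: 5% positive (e.g. attack) labels

train_idx, val_idx = holdout_split(len(y), 0.2, rng)
for fold_no, (tr, va) in enumerate(stratified_kfold(y, k=5, rng=rng)):
    print(f"fold {fold_no}: {len(va)} validation rows, {int(y[va].sum())} positives")
```

With 95 negatives and 5 positives, each of the five stratified folds receives 20 validation rows containing exactly one positive, whereas a purely random 20% hold-out can end up with no positives at all, which would make any recall estimate meaningless.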

The split into training and validation sets cannot be purely random. The validation set should represent the future hypothetical scenario in which we will use the model for scoring. It is important not to contaminate the validation set with information...
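One common way to make the validation fold resemble future scoring, assuming each record carries a timestamp, is an out-of-time split: train on everything before a cutoff and validate on everything after it. The sketch also fits the preprocessing statistics (here, standardization) on the training fold only, a general precaution against leaking validation-period information into training. The cutoff, the feature matrix, and the helper names are hypothetical.

```python
import numpy as np


def out_of_time_split(timestamps, cutoff):
    """Train on records strictly before `cutoff`; validate on records at or after it."""
    timestamps = np.asarray(timestamps)
    return np.flatnonzero(timestamps < cutoff), np.flatnonzero(timestamps >= cutoff)


def fit_standardizer(x_train):
    """Learn scaling statistics from the training fold only and return a transform."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return lambda x: (x - mean) / std


rng = np.random.default_rng(0)
timestamps = np.arange(1000)  # synthetic, e.g. seconds since capture start
features = rng.standard_normal((1000, 4))

train_idx, val_idx = out_of_time_split(timestamps, cutoff=800)
scale = fit_standardizer(features[train_idx])  # validation rows are never seen here
x_train, x_val = scale(features[train_idx]), scale(features[val_idx])
print(len(train_idx), "training rows,", len(val_idx), "validation rows")
```

The validation rows are transformed with statistics computed on the past only, which mirrors what happens at scoring time, when future data is not yet available.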

Hyper-parameters tuning


After designing our deep neural network according to the previous sections, we end up with a number of parameters to tune. Some of them have default or recommended values and do not require expensive fine-tuning. Others strongly depend on the underlying data, the specific application domain, and a set of other components. Thus, the only way to find the best values is to perform model selection, validating each candidate on the desired metric computed on the validation data fold.
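The selection loop can be sketched as a grid search: fit one model per candidate value, score each on the held-out validation fold, and keep the best. To stay self-contained and fast, a closed-form ridge regression and its penalty `alpha` stand in for the deep auto-encoder and its hyper-parameters; the candidate grid and the synthetic data are arbitrary assumptions. Random search follows the same pattern, sampling candidates instead of enumerating them.

```python
import numpy as np


def fit_ridge(x, y, alpha):
    """Closed-form ridge regression: w = (X^T X + alpha * I)^-1 X^T y."""
    return np.linalg.solve(x.T @ x + alpha * np.eye(x.shape[1]), x.T @ y)


def validation_mse(w, x_val, y_val):
    """The 'desired metric', computed on the validation fold only."""
    return float(np.mean((x_val @ w - y_val) ** 2))


def grid_search(candidates, x_tr, y_tr, x_val, y_val):
    """Fit one model per candidate value and keep the best validation score."""
    scores = {alpha: validation_mse(fit_ridge(x_tr, y_tr, alpha), x_val, y_val)
              for alpha in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic regression data: few samples relative to features, noisy targets
rng = np.random.default_rng(0)
x = rng.standard_normal((60, 40))
y = x @ rng.standard_normal(40) + 2.0 * rng.standard_normal(60)
x_tr, y_tr, x_val, y_val = x[:40], y[:40], x[40:], y[40:]

candidates = [0.01, 0.1, 1.0, 10.0, 100.0]
best_alpha, scores = grid_search(candidates, x_tr, y_tr, x_val, y_val)
print("validation MSE per alpha:", scores)
print("selected alpha:", best_alpha)
```

The training fold is used only to fit the parameters and the validation fold only to compare candidates, so the selected value is not biased by the training error.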

The following table lists the parameters that we might want to tune. Note that each library or framework may have additional parameters and its own way of setting them. The table is derived from the tuning options available in H2O and summarizes the most common parameters (though not all of them) for building a deep auto-encoder network in production:

End-to-end evaluation


From a business point of view, what really matters is the final end-to-end performance. None of your stakeholders will be interested in your training error, parameter tuning, model selection, and so on. What matters are the KPIs computed on top of the final model. Evaluation can be seen as the ultimate verdict.

Also, as we anticipated, a product cannot be evaluated with a single metric. In general, it is good and effective practice to build an internal dashboard that can report, or measure in real time, a set of performance indicators for our product in the form of aggregated numbers or easy-to-interpret charts. At a single glance, we would like to understand the whole picture and translate it into the value we are generating for the business.
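As a minimal sketch of the aggregated numbers such a dashboard might show for an intrusion detector, the function below turns ground-truth labels and raised alerts into a handful of KPIs. The choice of KPIs (precision, recall, F1, false-positive rate, and overall alert rate) and the 0/1 encoding are assumptions for illustration; the right indicators depend on the business.

```python
from dataclasses import dataclass


@dataclass
class KpiReport:
    precision: float            # share of alerts that were real attacks
    recall: float               # share of real attacks that raised an alert
    f1: float                   # harmonic mean of precision and recall
    false_positive_rate: float  # share of normal events that raised an alert
    alert_rate: float           # share of all events that raised an alert


def _ratio(num, den):
    return num / den if den else 0.0


def kpi_report(y_true, y_alert):
    """Aggregate per-event outcomes (1 = attack / alert, 0 = normal / no alert)."""
    pairs = list(zip(y_true, y_alert))
    tp = sum(1 for t, a in pairs if t == 1 and a == 1)
    fp = sum(1 for t, a in pairs if t == 0 and a == 1)
    fn = sum(1 for t, a in pairs if t == 1 and a == 0)
    tn = sum(1 for t, a in pairs if t == 0 and a == 0)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    f1 = _ratio(2 * precision * recall, precision + recall)
    return KpiReport(precision, recall, f1, _ratio(fp, fp + tn), _ratio(tp + fp, len(pairs)))


y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_alert = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0]
print(kpi_report(y_true, y_alert))
```

In this toy example, 2 of the 3 alerts are real attacks and 2 of the 3 attacks are caught, so precision, recall, and F1 are all 2/3, while 1 of the 7 normal events triggers a false alarm.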

The evaluation phase can, and generally does, include the same methodology as model validation. In previous sections, we saw a few validation techniques for the cases of labeled and unlabeled data...

Deployment


At this stage, we should have done almost all of the analysis and development needed for building an anomaly detector, or in general a data product using deep learning.

We are left with only the final, but no less important, step: deployment.

Deployment is generally very specific to the use case and the enterprise infrastructure. In this section, we will cover some common approaches used in data science production systems.

POJO model export

In the Testing section, we summarized all the different entities in a machine learning pipeline. In particular, we saw the definitions of, and differences between, a model, a fitted model, and the learning algorithm. After we have trained, validated, and selected the final model, we have a final fitted version of it ready to be used. During the testing phase (except in A/B testing), we scored only historical data that was generally already available on the machines where we trained the model.

In enterprise architectures, it is common to have...
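H2O's own mechanism turns a fitted model into generated Java source code, a POJO (in the Python client, `h2o.download_pojo` retrieves it; check the documentation for your H2O version). The Python sketch below does not use H2O; it only illustrates the underlying idea of decoupling scoring from training. The fitted parameters are exported as a standalone artifact, and a small scorer with no dependency on the training framework loads it and flags records whose reconstruction error exceeds a threshold. The JSON layout, the single-hidden-layer tanh auto-encoder, and the threshold value are all hypothetical.

```python
import json
import math
import os
import tempfile


def export_model(params, path):
    """Serialize fitted parameters (plain lists of floats) to a standalone JSON file."""
    with open(path, "w") as f:
        json.dump(params, f)


class StandaloneScorer:
    """Scores records using only the exported artifact and the Python standard
    library: no training framework or cluster is needed at scoring time."""

    def __init__(self, path):
        with open(path) as f:
            p = json.load(f)
        self.w_enc, self.b_enc = p["w_enc"], p["b_enc"]
        self.w_dec, self.b_dec = p["w_dec"], p["b_dec"]
        self.threshold = p["threshold"]

    @staticmethod
    def _affine(w, b, x):
        # Row i of w holds the incoming weights of output unit i
        return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]

    def reconstruction_error(self, x):
        hidden = [math.tanh(v) for v in self._affine(self.w_enc, self.b_enc, x)]
        x_hat = self._affine(self.w_dec, self.b_dec, hidden)  # linear decoder
        return sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat)) / len(x)

    def is_anomaly(self, x):
        return self.reconstruction_error(x) > self.threshold


# Hypothetical fitted parameters: 2 inputs -> 1 hidden unit -> 2 outputs
params = {
    "w_enc": [[0.5, 0.5]], "b_enc": [0.0],
    "w_dec": [[1.0], [1.0]], "b_dec": [0.0, 0.0],
    "threshold": 0.5,
}
path = os.path.join(tempfile.mkdtemp(), "model.json")
export_model(params, path)

scorer = StandaloneScorer(path)  # e.g. loaded later, on a different machine
for record in ([0.0, 0.0], [2.0, -2.0]):
    print(record, scorer.reconstruction_error(record), scorer.is_anomaly(record))
```

The first record is reconstructed perfectly (error 0.0), while the second, which this toy encoder cannot represent, has an error of 4.0 and is flagged. Only the artifact and the scorer need to travel to the production environment.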

Summary


In this chapter, we went through a long journey of optimizations, tweaks, testing strategies, and engineering practices to turn our neural network into an intrusion detection data product.

In particular, we defined a data product as a system that extracts value from raw data and returns actionable knowledge as output.

We saw a few optimizations for making the training of a deep neural network faster, more scalable, and more robust. We addressed the problem of early saturation via weight initialization. We achieved scalability using both a parallel multi-threaded version of SGD and a distributed implementation in Map/Reduce. We saw how the H2O framework can leverage Apache Spark as the computation backend via Sparkling Water.

We highlighted the importance of testing and the difference between model validation and full end-to-end evaluation. Model validation is used to reject or accept a given model, or to select the best-performing one. Likewise, model validation metrics can be used for hyper-parameter tuning...


Parameter | Description | Recommended value(s)
activation | The differentiable activation function. | Depends on the data nature...