Developing Applications in a Distributed Environment

As data volumes and the resource requirements for parallel computation grow, legacy single-machine approaches may not perform well. This is why big data processing has become the approach of choice for enterprises. DL4J supports neural network training, evaluation, and inference on distributed clusters.

Modern approaches to heavy training or output generation tasks distribute the effort across multiple machines, but this also brings additional challenges. We need to ensure that the following constraints are checked before we use Spark to perform distributed training/evaluation/inference:

  • Our dataset should be large enough to justify the need for a distributed cluster. A small network/dataset on Spark doesn't really gain any performance improvements...

Technical requirements

The source code for this chapter can be found at https://github.com/PacktPublishing/Java-Deep-Learning-Cookbook/tree/master/10_Developing_applications_in_distributed_environment/sourceCode/cookbookapp/src/main/java/com/javacookbook/app.

After cloning our GitHub repository, navigate to the Java-Deep-Learning-Cookbook/10_Developing_applications_in_distributed_environment/sourceCode directory. Then, import the cookbookapp project as a Maven project by importing the pom.xml file.

You need to run one of the preprocessor scripts (PreProcessLocal.java or PreProcessSpark.java) before running the actual source code.

Setting up DL4J and the required dependencies

We are revisiting the DL4J setup because we are now dealing with a distributed environment. For demonstration purposes, we will use Spark's local mode, which lets us focus on DL4J rather than on setting up clusters, worker nodes, and so on. In this recipe, we will set up a single-node Spark cluster (Spark local) and configure the DL4J-specific dependencies.

Getting ready

In order to demonstrate the use of a distributed neural network, you will need the following:

  • A distributed filesystem (Hadoop) for file management
  • Distributed computing (Spark) in order to process big data
...
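As a reference, the following is a minimal sketch of how a Spark local context can be created for DL4J training. The class and application names are illustrative and not taken from the book's source code:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkLocalSetup {
    public static void main(String[] args) {
        // local[*] runs Spark in local mode, using all available CPU cores as workers
        SparkConf sparkConf = new SparkConf()
                .setMaster("local[*]")
                .setAppName("DL4JSparkExample");
        JavaSparkContext sc = new JavaSparkContext(sparkConf);
        // The context is later passed to DL4J's Spark wrappers, such as SparkDl4jMultiLayer
        System.out.println("Spark local context started, version " + sc.version());
        sc.stop();
    }
}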

Creating an uber-JAR for training

The training job executed by spark-submit needs to resolve all of its required dependencies at runtime. To manage this, we will create an uber-JAR that contains the application and all of its dependencies, using the Maven configuration in pom.xml. The uber-JAR is then passed to spark-submit to run the training job on Spark.

In this recipe, we will create an uber-JAR using the Maven shade plugin for Spark training.

How to do it...

  1. Create an uber-JAR (shaded JAR) by adding the Maven shade plugin to the pom.xml file, as shown here:

Refer to the pom.xml file...
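For reference, a typical maven-shade-plugin configuration looks like the following minimal sketch. The plugin version and main class shown here are illustrative; the actual pom.xml in the book's repository may add filters and further transformers:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Makes the uber-JAR executable; the main class name is a placeholder -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>com.javacookbook.app.SparkExample</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>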

CPU/GPU-specific configuration for training

Hardware-specific configuration can't be ignored in a distributed environment. DL4J supports GPU-accelerated training on NVIDIA GPUs with CUDA/cuDNN enabled. We can also perform Spark distributed training using GPUs.

In this recipe, we will configure CPU/GPU-specific changes.

How to do it...

  1. Download, install, and set up the CUDA toolkit from https://developer.nvidia.com/cuda-downloads. OS-specific setup instructions are available at the NVIDIA CUDA official website.
  2. Configure the GPU for Spark distributed training by adding a Maven dependency for ND4J's CUDA backend:
<dependency>
  <groupId>org.nd4j</groupId>
  <!-- The artifactId depends on your installed CUDA version; nd4j-cuda-10.1 is one example -->
  <artifactId>nd4j-cuda-10.1</artifactId>
  <!-- Use the same version as your other DL4J/ND4J dependencies -->
  <version>${nd4j.version}</version>
</dependency>

Memory settings and garbage collection for Spark

Memory management is crucial for distributed training with large datasets in production. It directly influences the resource consumption and performance of the neural network, and involves configuring both off-heap and on-heap memory spaces. DL4J/ND4J-specific memory configuration is discussed in detail in Chapter 12, Benchmarking and Neural Network Optimization.

In this recipe, we will focus on memory configuration in the context of Spark.

How to do it...

  1. Add the --executor-memory command-line argument while submitting a job to spark-submit to set on-heap memory for the worker node. For example, we could use --executor-memory 4g to allocate 4 GB of memory...
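Building on this, a complete submission command might combine the on-heap executor memory with ND4J's off-heap limits, which are controlled by the JavaCPP system properties org.bytedeco.javacpp.maxbytes and org.bytedeco.javacpp.maxphysicalbytes. The JAR and class names below are placeholders:

spark-submit \
  --class com.javacookbook.app.SparkExample \
  --executor-memory 4g \
  --conf "spark.executor.extraJavaOptions=-Dorg.bytedeco.javacpp.maxbytes=8G -Dorg.bytedeco.javacpp.maxphysicalbytes=12G" \
  cookbookapp-1.0-SNAPSHOT.jar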

Configuring encoding thresholds

The DL4J Spark implementation uses a threshold encoding scheme to perform parameter updates across nodes, reducing the size of the messages communicated across the network and thereby the cost of traffic. The threshold encoding scheme introduces a new hyperparameter specific to distributed training: the encoding threshold.

In this recipe, we will configure the threshold algorithm in a distributed training implementation.

How to do it...

  1. Configure the threshold algorithm in SharedTrainingMaster (a fuller sketch follows this list):
TrainingMaster tm = new SharedTrainingMaster.Builder(voidConfiguration, minibatchSize)
    .thresholdAlgorithm(new AdaptiveThresholdAlgorithm(gradientThreshold))
    .build();
  2. Configure the...
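Putting these pieces together: the tm instance from step 1 needs a VoidConfiguration describing inter-node communication, and is then wrapped, together with the network configuration, for Spark training. A minimal sketch, where the port is arbitrary and sc/networkConfig are assumed to be an existing JavaSparkContext and MultiLayerConfiguration:

import org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer;
import org.nd4j.parameterserver.distributed.conf.VoidConfiguration;

// Transport settings for the gradient-sharing implementation; the port is arbitrary
VoidConfiguration voidConfiguration = VoidConfiguration.builder()
        .unicastPort(40123)
        .build();

// Wrap the network configuration and the training master (tm, from step 1) for Spark
SparkDl4jMultiLayer sparkModel = new SparkDl4jMultiLayer(sc, networkConfig, tm);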

Performing a distributed test set evaluation

There are challenges involved in distributed neural network training, including managing different hardware dependencies across master and worker nodes, configuring distributed training for good performance, benchmarking memory across the distributed cluster, and more. We discussed some of these concerns in the previous recipes. With such configurations in place, we can move on to the actual distributed training/evaluation. In this recipe, we will perform the following tasks:

  • Perform ETL for DL4J Spark training
  • Create a neural network for Spark training
  • Perform a test set evaluation

How to do it...

  1. Download, extract, and copy the contents of...
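Once the data is prepared, the training and evaluation steps can be sketched as follows, assuming sparkModel is a configured SparkDl4jMultiLayer and trainData/testData are JavaRDD<DataSet> instances produced by the ETL step:

import org.apache.spark.api.java.JavaRDD;
import org.deeplearning4j.eval.Evaluation;
import org.nd4j.linalg.dataset.DataSet;

// One call to fit() trains a single epoch over the distributed training data
sparkModel.fit(trainData);

// Distributed test set evaluation; stats() reports accuracy, precision, recall, and F1
Evaluation evaluation = sparkModel.evaluate(testData);
System.out.println(evaluation.stats());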

Saving and loading trained neural network models

Training a neural network from scratch every time we want to perform an evaluation is not a good idea, since training is a very costly operation. This is why model persistence is important in distributed systems as well.

In this recipe, we will persist the distributed neural network models to disk and load them for further use.

How to do it...

  1. Save the distributed neural network model using ModelSerializer:
MultiLayerNetwork model = sparkModel.getNetwork();
File file = new File("MySparkMultiLayerNetwork.bin");
// saveUpdater = true also persists the updater state so that training can be resumed later
ModelSerializer.writeModel(model, file, saveUpdater);
  2. Alternatively, save the distributed neural network model using save():
MultiLayerNetwork model = sparkModel.getNetwork();
File locationToSave = new File("MySparkMultiLayerNetwork.bin");
model.save(locationToSave, saveUpdater);
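To load the persisted model back for further training or inference, ModelSerializer can be used again. A minimal sketch; pass true as the second argument to also restore the updater state if training will be resumed:

import java.io.File;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.util.ModelSerializer;

// Restore the network; true also loads the updater state (for example, momentum history)
MultiLayerNetwork restored = ModelSerializer.restoreMultiLayerNetwork(
        new File("MySparkMultiLayerNetwork.bin"), true);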

Performing distributed inference

In this chapter, we have discussed how to perform distributed training using DL4J, and we have performed distributed evaluation of the trained model. Now, let's discuss how to utilize the distributed model to solve use cases such as prediction. This is referred to as inference. Let's go over how we can perform distributed inference in a Spark environment.

In this recipe, we will perform distributed inference on Spark using DL4J.

How to do it...

  1. Perform distributed inference for SparkDl4jMultiLayer by calling feedForwardWithKey(), as shown here (a usage sketch follows this list):
SparkDl4jMultiLayer.feedForwardWithKey(JavaPairRDD<K, INDArray> featuresData, int batchSize);
  2. Perform...
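A usage sketch for step 1, assuming sc is the JavaSparkContext, sparkModel is a trained SparkDl4jMultiLayer, and numInputs matches the network's input size; the record keys and random features are placeholders for real data:

import java.util.Arrays;
import org.apache.spark.api.java.JavaPairRDD;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import scala.Tuple2;

// Key each feature vector so that predictions can be matched back to their records
JavaPairRDD<String, INDArray> features = sc.parallelizePairs(Arrays.asList(
        new Tuple2<>("record-1", Nd4j.rand(1, numInputs)),
        new Tuple2<>("record-2", Nd4j.rand(1, numInputs))));

// Distributed forward pass in batches of 32; returns network outputs keyed by record
JavaPairRDD<String, INDArray> predictions = sparkModel.feedForwardWithKey(features, 32);
predictions.collectAsMap().forEach((key, output) ->
        System.out.println(key + " -> " + output));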