Chapter 12. Optimizing Neural Networks for Mobile Devices

Modern convolutional neural networks can be huge. For example, the pre-trained networks of the ResNet family can be from 100 to 1,000 layers deep, and take from 138 MB to 0.5 GB in Torch data format. Deploying them to mobile or embedded devices can be problematic, especially if your app requires several models for different tasks. Also, CNNs are computationally heavy, and in some settings (for example, real-time video analysis) can drain the device's battery in no time; actually, much faster than it took to write this chapter's intro. But why are they so big, and why do they consume so much energy? And how do we fix this without sacrificing accuracy?

As we already discussed speed optimization in the previous chapter, in this chapter we concentrate on memory consumption. We focus specifically on deep neural networks, but we also give several general recommendations applicable to other kinds of machine learning models.

In...

Delivering perfect user experience


According to the iTunes Connect Developer Guide, the total uncompressed size of an app should be less than 4 GB (as of December 15, 2017); however, this applies only to the binary itself, while asset files can take as much space as the disk capacity allows. There is also a limit on app size for cellular downloads, as stated on the Apple Developer site (https://developer.apple.com/news/?id=09192017b):

"We've increased the cellular download limit from 100 MB to 150 MB, letting customers download more apps from the App Store over their cellular network."

The simple conclusion is that you'd better store your model parameters as on-demand resources, or download them from your server after the app is installed; but this is only half of the problem. The other half is that you really don't want your app to take up a lot of space or consume tons of traffic, because this makes for a bad user experience.
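If you choose the on-demand resources route, the loading code can be as small as the following sketch. This is a minimal illustration of the NSBundleResourceRequest API; the tag and file names are our own hypothetical examples, and in a real project you would assign the tags to your model files in Xcode:

import Foundation

// Request the resources tagged "BigModel" (a hypothetical tag) from the App Store.
let request = NSBundleResourceRequest(tags: ["BigModel"])
request.beginAccessingResources { error in
    if let error = error {
        print("Failed to fetch on-demand resources: \(error)")
        return
    }
    // Once downloaded, the tagged files are visible through the main bundle.
    if let url = Bundle.main.url(forResource: "BigModel", withExtension: "mlmodelc") {
        print("Model is available at \(url)")
    }
    // Call request.endAccessingResources() when you no longer need the files.
}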

We can attack the problem from several directions (from the...

Calculating the size of a convolutional neural network


Let's take a well-known CNN, say VGG16, and see in detail how exactly the memory is spent. You can print its summary using Keras:

from keras.applications import VGG16

# Build VGG16 with pre-trained ImageNet weights and print a
# layer-by-layer summary of output shapes and parameter counts.
model = VGG16()
model.summary()

The network consists of 13 2D-convolutional layers (with 3×3 filters, stride 1, and padding 1) and 3 fully connected ("Dense") layers. In addition, there are an input layer, 5 max-pooling layers, and a flatten layer, which hold no parameters (see the VGG16 table below for a layer-by-layer breakdown).
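Where do the parameter counts in the summary come from? For a 2D convolution they are simply kernel height × kernel width × input channels × output channels, plus one bias per output channel. Here is that arithmetic as a small Swift sketch (the helper function is ours, not part of any framework):

// One weight per kernel element per input/output channel pair,
// plus one bias per output channel.
func convParameterCount(kernel: Int, inChannels: Int, outChannels: Int) -> Int {
    return kernel * kernel * inChannels * outChannels + outChannels
}

// First VGG16 convolution: 3×3 kernel, 3 input channels (RGB), 64 filters.
let firstConv = convParameterCount(kernel: 3, inChannels: 3, outChannels: 64)
print(firstConv)        // 1792, matching the Keras summary

// With float32 weights, each parameter costs 4 bytes.
print(firstConv * 4)    // 7168 bytes for this layer's weights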

Lossless compression


A typical neural network contains a significant amount of redundant information. This enables us to apply both lossless and lossy compression to it, often with fairly good results.

Huffman encoding is the type of compression most commonly referred to in research papers on CNN compression. You can also use Apple's Compression or Facebook's zstd libraries, which deliver state-of-the-art compression. Apple's Compression framework contains four compression algorithms (three common ones and one Apple-specific):

  • LZ4 is the fastest of the four.
  • ZLIB is the standard zip archiving algorithm.
  • LZMA is slower, but delivers the best compression.
  • LZFSE is a bit faster and delivers slightly better compression than ZLIB, and is optimized for energy efficiency on Apple hardware.

Here is a code snippet for compressing data using the LZFSE algorithm from the Compression library, and decompressing it back. You can find the full code in Compression.playground:

import Compression 
let data = ... 

sourceSize...
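The playground listing is truncated here, so the following is a minimal, self-contained sketch of the same round trip using the Compression framework's buffer API; the sample payload and buffer sizes are our own assumptions:

import Compression
import Foundation

// A highly repetitive sample payload, so it compresses well.
let source = [UInt8](repeating: 42, count: 4096)

// Compress with LZFSE. compression_encode_buffer returns the number of
// bytes written, or 0 if the destination buffer was too small.
var compressed = [UInt8](repeating: 0, count: source.count)
let compressedSize = compression_encode_buffer(&compressed, compressed.count,
                                               source, source.count,
                                               nil, COMPRESSION_LZFSE)
print("Compressed \(source.count) bytes down to \(compressedSize)")

// Decompress; the destination must be large enough for the original data.
var decoded = [UInt8](repeating: 0, count: source.count)
let decodedSize = compression_decode_buffer(&decoded, decoded.count,
                                            compressed, compressedSize,
                                            nil, COMPRESSION_LZFSE)
assert(decodedSize == source.count && decoded == source)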

Compact CNN architectures


During inference, the whole neural network has to be loaded into memory, so as mobile developers we are especially interested in small architectures that consume as little memory as possible. Small neural networks also reduce bandwidth consumption when models are downloaded over the network.

Several architectures designed to reduce the size of convolutional neural networks have been proposed in recent years. We will briefly discuss the best known of them.

SqueezeNet

The architecture was proposed by Iandola et al. in 2016, with autonomous cars as the target application. As the baseline, the researchers took the AlexNet architecture, which occupies 240 MB of memory: far more than mobile devices can comfortably afford. SqueezeNet has 50x fewer parameters, yet achieves the same level of accuracy on the ImageNet dataset. Using additional compression, its size can be reduced to about 0.5 MB.

SqueezeNet is built from fire modules. The objective was to create a neural network...
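As a rough illustration of why fire modules are economical, here is their parameter arithmetic in Swift. A fire module squeezes the input with 1×1 convolutions and then expands it with parallel 1×1 and 3×3 convolutions whose outputs are concatenated; the squeeze/expand sizes below (16, 64, 64) follow the paper's first fire module, while the 96 input channels are our assumption about the preceding layer:

// Parameters of a single convolution, including biases.
func convParams(kernel: Int, inChannels: Int, outChannels: Int) -> Int {
    return kernel * kernel * inChannels * outChannels + outChannels
}

let inChannels = 96     // assumed depth of the input feature map
let squeeze = 16, expand1x1 = 64, expand3x3 = 64

let fireParams = convParams(kernel: 1, inChannels: inChannels, outChannels: squeeze) +
                 convParams(kernel: 1, inChannels: squeeze, outChannels: expand1x1) +
                 convParams(kernel: 3, inChannels: squeeze, outChannels: expand3x3)

// A plain 3×3 convolution with the same input and output widths, for comparison.
let plainParams = convParams(kernel: 3, inChannels: inChannels,
                             outChannels: expand1x1 + expand3x3)

print(fireParams)   // 11920
print(plainParams)  // 110720, roughly 9x more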

Preventing a neural network from growing big


To leverage cutting-edge deep learning networks on mobile platforms, it is extremely important to tune the training of a network so that it does the most with the least resources. The Google Translate team's implementation of a neural network for OCR is an interesting case study in the rules of thumb that keep a network from growing too big.

The following is an excerpt from Google's blog post, found at: https://translate.googleblog.com/2015/07/how-google-translate-squeezes-deep.html:

"We needed to develop a very small neural net, and put severe limits on how much we tried to teach it-in essence, put an upper bound on the density of information it handles. The challenge here was in creating the most effective training data. Since we're generating our own training data, we put a lot of effort into including just the right data and nothing more. For instance, we want to be able to recognize a letter with...

Lossy compression


All lossy compression methods involve a potential problem: when you discard part of the information in your model, you should check how the model performs afterwards. Retraining the compressed model helps it adapt to the new constraints.

Network optimization techniques include:

  • Weight quantization: Change the computation precision. For example, the model can be trained in full precision (float32) and then compressed to int8. This improves performance significantly; see the sketch after this list.
  • Weight pruning: Remove redundant connections or weights close to zero.
  • Weight decomposition
  • Low-rank approximation: A good approach for CPUs.
  • Knowledge distillation: Train a smaller model to predict the output of the bigger one.
  • Dynamic memory allocation
  • Layer and tensor fusion: The idea is to combine successive layers into one, which reduces the memory needed to store intermediate results.
  • Kernel auto-tuning: Optimizes execution...

At the moment, each of them has its own pros and cons, but no doubt more refined techniques will be proposed in the near future.
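To make the quantization idea concrete, here is a minimal sketch of symmetric linear int8 weight quantization in Swift; the scaling scheme is the generic textbook one, not tied to any particular framework:

// Map float32 weights in [-maxAbs, maxAbs] onto int8 values in
// [-127, 127] using a single scale factor for the whole tensor.
func quantize(_ weights: [Float]) -> (values: [Int8], scale: Float) {
    let maxAbs = weights.map(abs).max() ?? 0
    guard maxAbs > 0 else { return (Array(repeating: 0, count: weights.count), 1) }
    let scale = maxAbs / 127
    let values = weights.map { Int8(($0 / scale).rounded()) }
    return (values, scale)
}

// Dequantization restores approximate float32 weights for inference.
func dequantize(_ values: [Int8], scale: Float) -> [Float] {
    return values.map { Float($0) * scale }
}

let weights: [Float] = [0.12, -0.7, 0.33, 0.01]
let (q, scale) = quantize(weights)
let restored = dequantize(q, scale: scale)
// Each restored weight deviates from the original by at most scale / 2,
// while storage drops from 4 bytes to 1 byte per weight.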

An example of the network compression


You can find a suitable example of network compression at the following address:

https://github.com/caffe2/caffe2/issues/472

Summary


There are several ways to achieve an appropriate size for deep neural network deployment on mobile platforms. So far, the most popular are choosing a compact architecture and applying lossy compression: quantization, pruning, and others. Make sure to check that your network's accuracy hasn't degraded after the compression is applied.

Bibliography


  1. O. Good, How Google Translate squeezes deep learning onto a phone, July 29, 2015: https://research.googleblog.com/2015/07/how-google-translate-squeezes-deep.html
  2. Y. LeCun, J. S. Denker, S. A. Solla, R. E. Howard, and L. D. Jackel. Optimal Brain Damage. In NIPS, volume 2, pages 598–605, 1989

Table: VGG16 layer-by-layer memory and parameter breakdown

Layer      | Output shape | Data memory (activations) | Parameters      | Number of parameters
-----------|--------------|---------------------------|-----------------|---------------------
InputLayer | 224×224×3    | 150528                    | 0               | 0
Conv2D     | 224×224×64   | 3211264                   | 3×3×3×64+64     | 1792
Conv2D     | 224×224×64   | 3211264                   | 3×3×64×64+64    | 36928
MaxPool2D  | 112×112×64   | 802816                    | 0               | 0
Conv2D     | 112×112×128  | 1605632                   | 3×3×64×128+128  | 73856
Conv2D     | 112×112×128  | 1605632                   | 3×3×128×128+128 | 147584
MaxPool2D  | 56×56×128    | 401408                    | 0               | 0
Conv2D     | 56×56×256    | 802816                    | 3×3×128×256+256 | 295168
Conv2D     | 56×56×256    | 802816                    | 3×3×256×256+256 | 590080
Conv2D     | 56×56×256    | 802816                    | 3×3×256×256+256 | 590080
MaxPool2D  | 28×28×256    | 200704                    | 0               | 0
Conv2D     | 28×28×512    | 401408                    | 3×3×256×512+512 | 1180160...