You're reading from Deep Learning with Theano

Product type: Book
Published in: Jul 2017
Publisher: Packt
ISBN-13: 9781786465825
Edition: 1st

Author: Christopher Bourez

Christopher Bourez graduated from Ecole Polytechnique and Ecole Normale Supérieure de Cachan in Paris in 2005 with a Master of Science in Math, Machine Learning and Computer Vision (MVA). For 7 years, he led a computer vision company that launched Pixee, a visual recognition application for iPhone, in 2007, in partnership with a major movie theater brand, the city of Paris, and a major ticket broker: with a snap of a picture, the user could get information about events and products, with access to purchase. While working on computer vision missions with Caffe, TensorFlow, and Torch, he helped other developers succeed by writing a blog on computer science. One of his blog posts, a tutorial on the Caffe deep learning framework, became the most successful tutorial on the web after the official Caffe website. On the initiative of Packt Publishing, the recipes behind the success of his Caffe tutorial have been ported to write this book on Theano. In the meantime, a wide range of deep learning problems were studied to gain more practice with Theano and its applications.
Chapter 6. Locating with Spatial Transformer Networks

In this chapter, we leave the NLP field and come back to images, with an example of applying recurrent neural networks to image tasks. In Chapter 2, Classifying Handwritten Digits with a Feedforward Network, we addressed image classification, which consists of predicting the class of an image. Here, we'll address object localization, another common task in computer vision, which consists of predicting the bounding box of an object in the image.

While Chapter 2, Classifying Handwritten Digits with a Feedforward Network, solved the classification task with neural nets built from linear layers, convolutions, and non-linearities, the spatial transformer is a new module built on very specific equations dedicated to the localization task.

In order to locate multiple objects in the image, spatial transformers are composed with recurrent networks. This chapter takes the opportunity to show how to use prebuilt recurrent networks in Lasagne,...

MNIST CNN model with Lasagne


The Lasagne library has packaged layers and tools to handle neural nets easily. Let's first install the latest version of Lasagne:

pip install --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip

Let us reprogram the MNIST model from Chapter 2, Classifying Handwritten Digits with a Feedforward Network with Lasagne:

def model(l_input, input_dim=28, num_units=256, num_classes=10, p=.5):
    # First convolution: 32 filters of 5x5, ReLU, Glorot initialization
    network = lasagne.layers.Conv2DLayer(
            l_input, num_filters=32, filter_size=(5, 5),
            nonlinearity=lasagne.nonlinearities.rectify,
            W=lasagne.init.GlorotUniform())

    # 2x2 max-pooling halves the spatial dimensions
    network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))

    # Second convolution + pooling stage
    network = lasagne.layers.Conv2DLayer(
            network, num_filters=32, filter_size=(5, 5),
            nonlinearity=lasagne.nonlinearities.rectify)

    network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))

    if num_units > 0:
        network = lasagne.layers.DenseLayer(
...
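To check that these layers fit together, we can trace the spatial dimensions through the network. A quick sanity check in plain Python, assuming the Lasagne defaults used above ('valid' convolutions and non-overlapping pooling):

```python
def conv_out(size, kernel):
    # A 'valid' convolution shrinks each side by kernel - 1
    return size - kernel + 1

def pool_out(size, pool):
    # Non-overlapping pooling divides each side by the pool size
    return size // pool

size = 28                  # MNIST input side
size = conv_out(size, 5)   # first 5x5 convolution  -> 24
size = pool_out(size, 2)   # 2x2 max-pooling        -> 12
size = conv_out(size, 5)   # second 5x5 convolution -> 8
size = pool_out(size, 2)   # 2x2 max-pooling        -> 4

print(size)  # 4: the dense layer sees 32 feature maps of 4x4
```

This is why the dense layer at the end receives a 32 x 4 x 4 tensor per image.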

A localization network


In Spatial Transformer Networks (STN), instead of applying the network directly to the input image signal, the idea is to add a module that preprocesses the image, cropping, rotating, and scaling it to fit the object, to assist classification:

Spatial Transformer Networks

For that purpose, STNs use a localization network to predict the affine transformation parameters and process the input:

Spatial transformer networks

In Theano, differentiation through the affine transformation is automatic: we simply have to connect the localization net to the input of the classification net through the affine transformation.

First, we create a localization network, close in design to the MNIST CNN model, to predict the six parameters of the affine transformation:

l_in = lasagne.layers.InputLayer((None, dim, dim))
l_dim = lasagne.layers.DimshuffleLayer(l_in, (0, 'x', 1, 2))
l_pool0_loc = lasagne.layers.MaxPool2DLayer(l_dim, pool_size=(2, 2))
l_dense_loc = mnist_cnn.model(l_pool0_loc, input_dim...
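The six predicted parameters form a 2x3 matrix that maps each output (target) coordinate back to an input (source) coordinate, where the image is then sampled. A minimal NumPy sketch of this sampling step, using the normalized [-1, 1] coordinates of the STN paper and nearest-neighbour interpolation for brevity (Lasagne's built-in transformer layer works on Theano tensors and uses bilinear interpolation):

```python
import numpy as np

def affine_grid_sample(image, theta):
    """Sample `image` through the 2x3 affine map `theta`, expressed in
    normalized coordinates in [-1, 1]. Nearest-neighbour interpolation
    keeps the sketch short."""
    h, w = image.shape
    out = np.zeros_like(image)
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                         indexing='ij')
    # Map every target coordinate back to a source coordinate
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = theta @ coords                              # shape (2, h*w)
    # Convert normalized source coordinates back to pixel indices
    sx = np.round((src[0] + 1) * (w - 1) / 2).astype(int)
    sy = np.round((src[1] + 1) * (h - 1) / 2).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out.flat[np.flatnonzero(valid)] = image[sy[valid], sx[valid]]
    return out

# The identity transform leaves the image unchanged
identity = np.array([[1., 0., 0.],
                     [0., 1., 0.]])
img = np.arange(16.).reshape(4, 4)
assert np.allclose(affine_grid_sample(img, identity), img)
```

Because the bilinear version of this sampling is differentiable in both the image and theta, gradients flow from the classifier back into the localization network.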

Unsupervised learning with co-localization


The first layers of the digit classifier trained in Chapter 2, Classifying Handwritten Digits with a Feedforward Network, can serve as an encoding function to represent the image in an embedding space, as for words:

It is possible to train the localization network of the spatial transformer network in an unsupervised fashion, by minimizing a hinge loss objective function on random pairs of images assumed to contain the same digit:

Minimizing this sum leads to modifying the weights in the localization network, so that two localized digits become closer than two random crops.
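A plain NumPy sketch of one term of this objective (the function name and margin value are illustrative, not the book's code): given embeddings of the two localized crops and of two random crops, the hinge penalizes the case where the localized pair is not closer than the random pair by at least a margin:

```python
import numpy as np

def colocalization_hinge(e_loc1, e_loc2, e_rand1, e_rand2, margin=1.0):
    """Hinge loss encouraging two localized digits to be closer in
    embedding space than two random crops of the same images."""
    d_loc = np.sum((e_loc1 - e_loc2) ** 2)    # localized-pair distance
    d_rand = np.sum((e_rand1 - e_rand2) ** 2) # random-crop distance
    return max(0.0, d_loc - d_rand + margin)

# When the localized crops already match much better than the random
# crops, the loss is zero and no gradient flows.
e1, e2 = np.array([1., 0.]), np.array([1., 0.1])
r1, r2 = np.array([1., 0.]), np.array([-1., 2.])
print(colocalization_hinge(e1, e2, r1, r2))  # 0.0
```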

Here are the results:

(Spatial transformer networks paper, Jaderberg et al., 2015)

Region-based localization networks


Historically, the basic approach to object localization was to run a classification network in a sliding window: the window is slid pixel by pixel in each direction, and the classifier is applied at every position and every scale in the image. The classifier learns to say whether the object is present and centered. This requires a large amount of computation, since the model has to be evaluated at every position and scale.
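The cost of the sliding-window approach is easy to see in a sketch (the classifier here is a stub; a real one would be a CNN such as the MNIST model above):

```python
import numpy as np

def sliding_window_detect(image, window, stride, classifier):
    """Apply `classifier` to every window position; returns a score map.
    With stride 1, the model is evaluated (H-h+1) x (W-w+1) times
    for a single scale."""
    H, W = image.shape
    h, w = window
    scores = np.zeros(((H - h) // stride + 1, (W - w) // stride + 1))
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            patch = image[i * stride:i * stride + h,
                          j * stride:j * stride + w]
            scores[i, j] = classifier(patch)
    return scores

# Stub classifier: mean intensity of the patch
image = np.random.rand(100, 100)
scores = sliding_window_detect(image, window=(28, 28), stride=1,
                               classifier=np.mean)
print(scores.shape)  # (73, 73): 5329 classifier evaluations, one scale
```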

To accelerate this process, the Region Proposal Network (RPN) of the Faster R-CNN paper (Ren et al., including Ross Girshick) transforms the fully connected layers of a neural net classifier, such as the MNIST CNN, into convolutional layers as well: for a dense layer acting on a 28x28 image, there is no difference between a convolution and a linear layer when the convolution kernel has the same dimensions as the input. So, any fully connected layer can be rewritten as a convolutional layer, with the same weights and the appropriate...
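This equivalence can be checked numerically: a 'valid' convolution whose kernel has the same size as its input produces a single number identical to the dot product computed by a dense layer with the same weights (using the correlation convention of deep learning frameworks, i.e. no kernel flipping). A small NumPy check:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((28, 28))   # input "image"
w = rng.standard_normal((28, 28))   # dense-layer weights, reshaped 2-D

# Dense layer: dot product of flattened input and weights
dense_out = x.ravel() @ w.ravel()

# 'Valid' correlation with a kernel the same size as the input:
# the window fits in exactly one position, giving one output value.
conv_out = np.sum(x * w)
assert np.isclose(dense_out, conv_out)

# On a larger image, the same kernel slides, turning the dense layer
# into a detector evaluated densely at every position.
big = rng.standard_normal((32, 32))
score_map = np.array([[np.sum(big[i:i+28, j:j+28] * w)
                       for j in range(5)] for i in range(5)])
print(score_map.shape)  # (5, 5)
```

The resulting score map is exactly what a fully convolutional classifier computes in one pass, which is what makes the RPN fast.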

Further reading


You can further refer to these sources for more information:

  • Spatial Transformer Networks, Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, Jun 2015

  • Recurrent Spatial Transformer Networks, Søren Kaae Sønderby, Casper Kaae Sønderby, Lars Maaløe, Ole Winther, Sept 2015

  • Original code: https://github.com/skaae/recurrent-spatial-transformer-code

  • Google Street View Character Recognition, Jiyue Wang, Peng Hui How

  • Reading Text in the Wild with Convolutional Neural Networks, Max Jaderberg, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman, 2014

  • Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks, Ian J. Goodfellow, Yaroslav Bulatov, Julian Ibarz, Sacha Arnoud, Vinay Shet, 2013

  • Recognizing Characters From Google Street View Images, Guan Wang, Jingrui Zhang

  • Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition, Max Jaderberg, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman, 2014

  • R-CNN minus R, Karel Lenc...

Summary


The spatial transformer layer is an original module that localizes an area of the image, then crops and resizes it, helping the classifier focus on the relevant part of the image and increasing its accuracy. The layer is composed of a differentiable affine transformation, whose parameters are computed by another model, the localization network, and can be learned via backpropagation as usual.

As an example application, reading multiple digits in an image can be addressed with the help of recurrent neural units. To simplify our work, the Lasagne library was introduced.

Spatial transformers are one solution among many for localization; region-based approaches, such as YOLO, SSD, and Faster R-CNN, provide state-of-the-art results for bounding box prediction.

In the next chapter, we'll continue with image recognition to discover how to classify full size images that contain a lot more information than digits, such as natural images of indoor scenes and outdoor landscapes...

