Deep neural networks

A deep neural network (DNN) is a neural network with multiple hidden layers. We cannot achieve good results by just increasing the number of nodes in a network with a small number of layers (a shallow neural network). A DNN can fit data more accurately with fewer parameters than a shallow network, because more layers (each with fewer neurons) give a more efficient and accurate representation. Using multiple hidden layers allows a more sophisticated build-up from simple elements to more complex ones. In the previous example, we considered a neural network that could recognize basic shapes, such as a circle or a square. In a deep neural network, many circles and squares could be combined to form other, more advanced shapes. A shallow neural network cannot build more advanced shapes from basic pieces. The disadvantage of a DNN is that these models are harder to train and more prone to overfitting.
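
To make the contrast concrete, the following sketch (assuming the keras R package with a TensorFlow backend is installed; the layer sizes and the 784-dimensional input are arbitrary choices for illustration, not values from this chapter) defines a shallow network with one wide hidden layer and a deep network that builds its representation through several narrower hidden layers:

library(keras)

# Shallow network: a single wide hidden layer
shallow_model <- keras_model_sequential() %>%
  layer_dense(units = 512, activation = "relu", input_shape = 784) %>%
  layer_dense(units = 10, activation = "softmax")

# Deep network: several narrower hidden layers that build up
# increasingly abstract features from simple ones
deep_model <- keras_model_sequential() %>%
  layer_dense(units = 128, activation = "relu", input_shape = 784) %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 32, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax")

# Compare parameter counts of the two architectures
summary(shallow_model)
summary(deep_model)

With these particular (illustrative) sizes, the deep model has roughly a quarter of the parameters of the shallow one, yet it can represent more complex compositions of features.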

If we consider trying to recognize handwritten text from image data, the raw data consists of pixel values from an image. The first layer captures simple shapes, such as lines and curves. The next layer uses these simple shapes to recognize higher abstractions, such as corners and circles. The second layer does not have to learn directly from the pixels, which are noisy and complex. In contrast, a shallow architecture may require far more parameters, as each hidden neuron would have to map directly from the pixels in the image to the target value. It would also not be able to combine features, so if the text appeared in a different part of the image (for example, not centered), it would fail to recognize it.

One of the challenges in training deep neural networks is how to efficiently learn the weights. The models are complex, with a huge number of parameters to train. One of the major advancements in deep learning occurred in 2006, when it was shown that deep belief networks (DBNs) could be trained one layer at a time (see Hinton, G. E., Osindero, S., and Teh, Y. W. (2006)). A DBN is a type of deep neural network with multiple hidden layers and connections between (but not within) layers; that is, a neuron in layer 1 may be connected to a neuron in layer 2, but may not be connected to another neuron in layer 1. The restriction of no connections within a layer allows much faster training algorithms to be used, such as the contrastive divergence algorithm. Essentially, the DBN can then be trained layer by layer; the first hidden layer is trained and used to transform the raw data into hidden neuron activations, which are then treated as a new set of inputs for the next hidden layer, and the process is repeated until all the layers have been trained.
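
As a rough, self-contained sketch of this layer-by-layer idea, the base R code below trains a small stack of restricted Boltzmann machines with one step of contrastive divergence (CD-1) and feeds each layer's hidden activations into the next layer as its input. This is a minimal illustration written for this explanation; the function names (train_rbm, pretrain_dbn), the random data, and the hyperparameters are all hypothetical and not taken from this book or from the Hinton et al. paper:

sigmoid <- function(x) 1 / (1 + exp(-x))

# Train a single RBM on binary input x (rows = samples) with CD-1
train_rbm <- function(x, n_hidden, epochs = 5, lr = 0.1) {
  n_visible <- ncol(x)
  W <- matrix(rnorm(n_visible * n_hidden, sd = 0.01), n_visible, n_hidden)
  b_vis <- rep(0, n_visible)
  b_hid <- rep(0, n_hidden)
  for (epoch in seq_len(epochs)) {
    for (i in seq_len(nrow(x))) {
      v0 <- x[i, ]
      # Positive phase: hidden probabilities and a sample given the data
      p_h0 <- sigmoid(v0 %*% W + b_hid)
      h0 <- as.numeric(runif(n_hidden) < p_h0)
      # Negative phase: one reconstruction step (the "1" in CD-1)
      p_v1 <- as.numeric(sigmoid(h0 %*% t(W) + b_vis))
      p_h1 <- sigmoid(p_v1 %*% W + b_hid)
      # Move weights toward the data and away from the reconstruction
      W <- W + lr * (outer(v0, as.numeric(p_h0)) - outer(p_v1, as.numeric(p_h1)))
      b_vis <- b_vis + lr * (v0 - p_v1)
      b_hid <- b_hid + lr * as.numeric(p_h0 - p_h1)
    }
  }
  list(W = W, b_hid = b_hid)
}

# Greedy layer-wise pre-training: the hidden activations of one RBM
# become the input of the next RBM in the stack
pretrain_dbn <- function(x, layer_sizes) {
  layers <- vector("list", length(layer_sizes))
  input <- x
  for (k in seq_along(layer_sizes)) {
    layers[[k]] <- train_rbm(input, layer_sizes[k])
    input <- sigmoid(input %*% layers[[k]]$W +
                       matrix(layers[[k]]$b_hid, nrow(input),
                              layer_sizes[k], byrow = TRUE))
  }
  layers
}

# Toy example: pre-train a two-layer stack on random binary data
set.seed(42)
x <- matrix(rbinom(100 * 20, 1, 0.3), 100, 20)
dbn_layers <- pretrain_dbn(x, layer_sizes = c(16, 8))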

The benefits of the realization that DBNs could be trained one layer at a time extend beyond just DBNs. DBNs are sometimes used as a pre-training stage for a deep neural network. This allows comparatively fast, greedy, layer-by-layer training to be used to provide good initial estimates, which are then refined in the deep neural network using other, less efficient, training algorithms, such as back-propagation.
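
In R, one place this pre-train-then-fine-tune workflow is packaged up is the deepnet package, where (if memory of its interface serves; the argument names may differ between package versions) dbn.dnn.train() pre-trains the hidden layers as a DBN with contrastive divergence and then fine-tunes the whole network with back-propagation. A hedged sketch on toy data:

# Hedged example using the deepnet package; exact argument names
# may vary by package version
library(deepnet)

set.seed(42)
x <- matrix(runif(1000 * 20), 1000, 20)      # 1,000 samples, 20 features
y <- as.numeric(rowSums(x[, 1:5]) > 2.5)     # toy binary target

dnn <- dbn.dnn.train(
  x, y,
  hidden    = c(50, 30),  # two hidden layers
  numepochs = 10,
  batchsize = 100,
  cd        = 1           # contrastive divergence steps for pre-training
)

pred <- nn.predict(dnn, x)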

So far, we have primarily focused on feed-forward neural networks, where the results from one layer and neuron feed forward to the next. Before closing this section, two specific kinds of deep neural network that have grown in popularity are worth mentioning. The first is a recurrent neural network (RNN), where neurons send feedback signals to each other. These feedback loops allow RNNs to work well with sequences. An example of an application of RNNs is to automatically generate click-bait, such as Top 10 reasons to visit Los Angeles: #6 will shock you! or One trick great hair salons don't want you to know. RNNs work well for such jobs because they can be seeded with a few initial words (even just trending search terms or names) and then repeatedly predict/generate what the next word should be. This process can be repeated until a short phrase, the click-bait, is generated. We will see examples of RNNs in Chapter 7, Natural Language Processing using Deep Learning.
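
As a hedged sketch of what such a next-word model might look like using the keras R package (the vocabulary size, sequence length, and layer sizes below are arbitrary placeholders, not values from Chapter 7), a simple recurrent layer can sit between an embedding layer and a softmax output over the vocabulary:

library(keras)

vocab_size <- 5000   # placeholder vocabulary size
seq_length <- 10     # number of preceding words used as context

# Predict the next word from the previous seq_length words
rnn_model <- keras_model_sequential() %>%
  layer_embedding(input_dim = vocab_size, output_dim = 64,
                  input_length = seq_length) %>%
  layer_simple_rnn(units = 128) %>%
  layer_dense(units = vocab_size, activation = "softmax")

rnn_model %>% compile(
  loss = "sparse_categorical_crossentropy",
  optimizer = "adam"
)

Generating a phrase then amounts to feeding the model a seed sequence, sampling a word from the softmax output, appending it to the sequence, and repeating.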

The second type is a convolutional neural network (CNN). CNNs are most commonly used in image recognition. CNNs work by having each neuron respond to overlapping subregions of an image. The benefits of CNNs are that they require comparatively little preprocessing and, because weights are shared across subregions of an image, they do not require too many parameters. This is particularly valuable for images, as they are often not consistent. For example, imagine ten different people taking a picture of the same desk. Some may be closer or farther away, or at different positions, so that essentially the same scene is captured with different heights, widths, and amounts of background around the focal object. We will cover CNNs in depth in Chapter 5, Image Classification Using Convolutional Neural Networks.
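
A minimal sketch of such a network in the keras R package (the input shape and layer sizes are illustrative only) shows the characteristic pattern: small convolutional filters shared across all subregions of the image, pooling to shrink the representation, and dense layers for the final classification:

library(keras)

cnn_model <- keras_model_sequential() %>%
  # 32 small 3x3 filters, each shared across every subregion of a
  # 28x28 grayscale image (weight-sharing keeps the parameter count low)
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(28, 28, 1)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax")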

This description provides only the briefest of overviews of what deep neural networks are and some of the use cases to which they can be applied. The seminal reference for deep learning is Goodfellow et al. (2016).
