
Chapter 7. Miscellaneous Deep Learning Operations using Hadoop

 

"In pioneer days they used oxen for heavy pulling, and when one ox couldn't budge a log, they didn't try to grow a larger ox. We shouldn't be trying for bigger computers, but for more systems of computers."

 
 --Grace Hopper

So far in this book, we have discussed various deep neural network models, their concepts and applications, and how to implement these models in distributed environments. We have also explained why it is difficult for a centralized computer to store and process vast amounts of data and to extract information from that data using these models. Hadoop has been used to overcome the limitations imposed by large-scale data.

As we have now reached the final chapter of this book, we will mainly discuss the design of three of the most commonly used machine learning applications. We will explain the general concepts of large-scale video processing, large-scale image processing, and natural language processing using the Hadoop framework.

The organization...

Distributed video decoding in Hadoop


Most popular video compression formats, such as MPEG-2 and MPEG-4, follow a hierarchical structure in their bit-streams. In this subsection, we will assume that the compression format used has such a hierarchical bit-stream structure. For simplicity, we have divided the decoding task into two different MapReduce jobs:

  1. Extraction of video sequence-level information: The header information of a video dataset can be found in its first block. In this phase, the aim of the MapReduce job is to collect the sequence-level information from the first block of the video dataset and output the result as a text file in HDFS. The sequence header information is needed to set the format for the decoder object.

    For the video files, a new FileInputFormat should be implemented with its own record reader. Each record reader will then provide a <key, value> pair in this format to each...
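As a rough illustration of the input format mentioned above, a minimal sketch using the org.apache.hadoop.mapreduce API follows. The class names, and the choice of a Text key (file name plus block offset) with a BytesWritable value (the raw bytes of one bit-stream block), are assumptions made for this sketch rather than a fixed implementation:

// A minimal sketch of a custom FileInputFormat for video data, assuming each
// input split (one HDFS block of the compressed bit-stream) is handed to a
// mapper as a single record. Class names and the <Text, BytesWritable>
// key/value choice are illustrative assumptions.
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class VideoInputFormat extends FileInputFormat<Text, BytesWritable> {

    @Override
    public RecordReader<Text, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new VideoRecordReader();
    }

    // Reads one split (a block of the compressed bit-stream) as a single record.
    public static class VideoRecordReader extends RecordReader<Text, BytesWritable> {
        private FileSplit fileSplit;
        private TaskAttemptContext context;
        private final Text key = new Text();
        private final BytesWritable value = new BytesWritable();
        private boolean processed = false;

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) {
            this.fileSplit = (FileSplit) split;
            this.context = context;
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (processed) {
                return false;
            }
            // Key: file name plus byte offset, so the decoder can order the blocks.
            Path path = fileSplit.getPath();
            key.set(path.getName() + "@" + fileSplit.getStart());

            // Value: the raw bytes of this block of the compressed video bit-stream.
            byte[] contents = new byte[(int) fileSplit.getLength()];
            FileSystem fs = path.getFileSystem(context.getConfiguration());
            FSDataInputStream in = fs.open(path);
            try {
                in.seek(fileSplit.getStart());
                IOUtils.readFully(in, contents, 0, contents.length);
            } finally {
                IOUtils.closeStream(in);
            }
            value.set(contents, 0, contents.length);
            processed = true;
            return true;
        }

        @Override
        public Text getCurrentKey() { return key; }

        @Override
        public BytesWritable getCurrentValue() { return value; }

        @Override
        public float getProgress() { return processed ? 1.0f : 0.0f; }

        @Override
        public void close() { }
    }
}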

Large-scale image processing using Hadoop


We have already mentioned in earlier chapters how the size and volume of images are increasing day by day, and how difficult it is for centralized computers to store and process such vast amounts of image data. Let's consider an example to get a practical idea of such situations. Take a large-scale image of 81025 pixels by 86273 pixels. Each pixel is composed of three values: red, green, and blue. Assume that storing each of these values requires a 32-bit precision floating-point number. The total memory consumption of that image can then be calculated as follows:

86273 * 81025 * 3 * 32 bits = 78.12 GB

Leaving aside any post-processing of this image, it is clearly impossible for a traditional computer to even hold this amount of data in its main memory. Although some advanced computers come with higher configurations, given the return on investment, most companies do not opt for such machines...
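As a quick sanity check of this arithmetic, the footprint can be reproduced with a few lines of Java. This is only an illustrative calculation, using the dimensions from the example above:

// Reproduces the memory estimate from the example above (illustrative only).
public class ImageMemoryEstimate {
    public static void main(String[] args) {
        long width = 86273;           // pixels
        long height = 81025;          // pixels
        long channels = 3;            // red, green, and blue
        long bitsPerValue = 32;       // 32-bit floating-point number per channel value

        long totalBits = width * height * channels * bitsPerValue;
        double gigabytes = totalBits / 8.0 / (1024 * 1024 * 1024);

        // Prints roughly 78.12 GB, far beyond the main memory of a single machine.
        System.out.printf("Approximate footprint: %.2f GB%n", gigabytes);
    }
}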

Natural language processing using Hadoop


The exponential growth of information on the Web has led to a rapid spread of large-scale, unstructured natural language text resources. Hence, in the last few years, interest in extracting, processing, and sharing this information has increased substantially. Processing these sources of knowledge within a stipulated time frame has turned out to be a major challenge for various research and commercial organizations. In this section, we will describe the process of crawling web documents, discovering information, and running natural language processing in a distributed manner using Hadoop.

To design an architecture for natural language processing (NLP), the first task to be performed is the extraction of annotated keywords and key phrases from the large-scale unstructured data. To perform NLP on a distributed architecture, the Apache Hadoop framework can be chosen for its efficient and scalable solution, and also to improve the failure...
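As a rough illustration of this extraction phase, a mapper could look like the following sketch. The whitespace tokenization and length filter are simple placeholders for a real NLP annotator, and the class and variable names are assumptions made for this example:

// A rough sketch of the keyword-extraction phase. The whitespace tokenization
// below is a placeholder for a real NLP annotator; class and method names are
// illustrative assumptions.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class KeywordExtractionMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text keyword = new Text();

    @Override
    protected void map(LongWritable offset, Text document, Context context)
            throws IOException, InterruptedException {
        // Placeholder "annotation": lower-case the text and split on whitespace.
        // A real pipeline would call an NLP library here to extract keywords
        // and key phrases from the crawled document.
        for (String token : document.toString().toLowerCase().split("\\s+")) {
            if (token.length() > 3) {          // crude filter for candidate keywords
                keyword.set(token);
                context.write(keyword, ONE);   // downstream jobs can aggregate these
            }
        }
    }
}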

Summary


This chapter discussed some of the most widely used applications of machine learning and how they can be designed on the Hadoop framework. First, we started with a large video dataset and showed how video can be decoded in HDFS and later converted into a sequence file of images for further processing. Large-scale image processing was discussed next: the mapper used for this purpose invokes a shell script that performs all the necessary tasks, so no reducer is needed for this operation. Finally, we discussed how a natural language processing model can be deployed on Hadoop.
