You're reading from  Deep Learning for Computer Vision

Product type: Book
Published in: Jan 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781788295628
Edition: 1st Edition
Author (1)
Rajalingappaa Shanmugamani

Rajalingappaa Shanmugamani is currently working as an Engineering Manager for a deep learning team at Kairos. Previously, he worked as a Senior Machine Learning Developer at SAP, Singapore, and at various startups developing machine learning products. He has a master's degree from the Indian Institute of Technology Madras. He has published articles in peer-reviewed journals and conferences and submitted applications for several patents in the area of machine learning. In his spare time, he coaches school students and engineers in programming and machine learning.

Chapter 9. Video Classification

In this chapter, we will see how to train deep learning models for video data. We will start by classifying videos frame by frame, and then use temporal information to improve accuracy. Later, we will extend image-based applications to videos, including pose estimation, captioning, and video generation.

We will cover the following topics in this chapter:

  • The datasets and the algorithms of video classification
  • Splitting a video into frames and classifying videos
  • Training a model for visual features on an individual frame level
  • Understanding 3D convolution and its use in videos
  • Incorporating motion vectors on video
  • Object tracking utilizing the temporal information
  • Applications such as human pose estimation and video captioning

Understanding and classifying videos 


A video is a series of images, so it adds a temporal dimension to the image. Combining the spatial features of the frames with the temporal features of the video can give better results than using individual images alone. The extra dimension also greatly increases the amount of data, and hence the complexity of training and inference, so the computational demands of processing video are very high. Video also changes the architecture of deep learning models, as they have to account for temporal features.
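To make the cost of the extra dimension concrete, the following minimal sketch compares the memory needed for one frame with that of a short clip. The frame size, clip length, and 4-byte float storage are assumptions chosen only for illustration, and `tensor_bytes` is a hypothetical helper, not part of any library.

```python
def tensor_bytes(shape, bytes_per_value=4):
    """Return the memory needed for a dense tensor of the given shape."""
    total = bytes_per_value
    for dim in shape:
        total *= dim
    return total


# Assumed sizes, for illustration only.
image_shape = (224, 224, 3)      # height, width, channels
clip_shape = (16, 224, 224, 3)   # frames, height, width, channels

image_bytes = tensor_bytes(image_shape)
clip_bytes = tensor_bytes(clip_shape)

print(f"one frame:     {image_bytes / 1e6:.2f} MB")
print(f"16-frame clip: {clip_bytes / 1e6:.2f} MB")
print(f"ratio:         {clip_bytes // image_bytes}x")
```

The input alone grows linearly with the number of frames, and the intermediate activations of a model that processes all frames together grow with it.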

Video classification is the task of labeling a video with a category. A category can be assigned at the frame level or to the whole video. The video may show actions or tasks being performed. Hence, video classification may label the objects present in the video or the actions happening in it. In the next section, we will see the available datasets for video classification tasks.
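The difference between frame-level and video-level labels can be sketched as follows. This example assumes that some image classifier has already produced a list of class probabilities for each frame. The class names and scores are made up for illustration, and averaging the scores is only one simple way to obtain a video-level label, not necessarily the method used later in the chapter.

```python
def frame_labels(frame_scores, class_names):
    """Pick the most probable class for every frame independently."""
    labels = []
    for scores in frame_scores:
        best = max(range(len(scores)), key=scores.__getitem__)
        labels.append(class_names[best])
    return labels


def video_label(frame_scores, class_names):
    """Average the per-frame scores and pick one class for the whole video."""
    num_frames = len(frame_scores)
    num_classes = len(class_names)
    mean_scores = [
        sum(scores[c] for scores in frame_scores) / num_frames
        for c in range(num_classes)
    ]
    best = max(range(num_classes), key=mean_scores.__getitem__)
    return class_names[best]


# Hypothetical classes and per-frame probabilities for a four-frame video.
class_names = ["walking", "running", "jumping"]
frame_scores = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.6, 0.1],
]

print(frame_labels(frame_scores, class_names))
print(video_label(frame_scores, class_names))
```

Averaging treats the frames as an unordered set, so it ignores the temporal information between frames.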

Exploring video classification...

Extending image-based approaches to videos


Images can be used for pose estimation, style transfer, image generation, segmentation, captioning, and so on. Similarly, these applications find a place in videos too. Using the temporal information may improve the predictions from images and vice versa. In this section, we will see how to extend these applications to videos.

Regressing the human pose

Human pose estimation is an important application of video data and can improve other tasks such as action recognition. First, let's see a description of the datasets available for pose estimation:

Pfister et al. (https://www.cv-foundation.org/openaccess/content_iccv_2015...

Summary


In this chapter, we covered various topics related to video classification. We saw how to split videos into frames and apply deep learning models built for images to various tasks. We covered a few algorithms that are specific to video, such as object tracking. We saw how to apply video-based solutions to various scenarios such as action recognition, gesture recognition, security applications, and intrusion detection.

In the next chapter, we will learn how to deploy the trained models from the previous chapter into production on various cloud and mobile platforms. We will see how different hardware affects performance in terms of latency and throughput.
