You're reading from Active Machine Learning with Python

Product type: Book
Published in: Mar 2024
Publisher: Packt
ISBN-13: 9781835464946
Edition: 1st Edition
Author (1)
Margaux Masson-Forsythe

Margaux Masson-Forsythe is a skilled machine learning engineer and advocate for advancements in surgical data science and climate AI. As the Director of Machine Learning at Surgical Data Science Collective, she builds computer vision models to detect surgical tools in videos and track procedural motions. Masson-Forsythe manages a multidisciplinary team and oversees model implementation, data pipelines, infrastructure, and product delivery. With a background in computer science and expertise in machine learning, computer vision, and geospatial analytics, she has worked on projects related to reforestation, deforestation monitoring, and crop yield prediction.

Leveraging Active Learning for Big Data

In this chapter, we will explore how to use machine learning (ML) to deal with big data, such as videos. Video analysis using ML has become an increasingly important technique across many industries and applications. From autonomous vehicles that rely on computer vision models to analyze road conditions in real-time video feeds to security systems that automatically detect suspicious activity, ML is expanding what is possible with video data. These models can automate time-consuming manual analysis and provide scalable video understanding. However, developing ML models for video analysis comes with its own challenges. Videos are inherently large, which makes efficient processing difficult, and building performant, scalable video analysis pipelines means overcoming key hurdles such as the enormous amount of data labeling required.

We will guide you through a cutting-edge ML method that will aid...

Technical requirements

In this chapter, you will need to install the following packages:

pip install ultralytics lightly docker encord

You will also need the following imports:

# Standard library
import contextlib
import json
import os
from pathlib import Path
from typing import Iterator

# Third-party packages
import docker
from docker.models.containers import Container
from encord.orm.cloud_integration import CloudIntegration
from encord.orm.dataset import AddPrivateDataResponse, CreateDatasetResponse, StorageLocation
from encord.user_client import EncordUserClient
from IPython.display import display, Markdown
from lightly.api import ApiWorkflowClient
from lightly.openapi_generated.swagger_client import DatasetType, DatasourcePurpose
from ultralytics import YOLO

Next, you need to create a Lightly account and set up your API token, as follows:

lightly_token = "your_lightly_token"
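Rather than hardcoding the token in a notebook, one option is to read it from an environment variable. The following is a minimal sketch of that idea; the helper name load_lightly_token and the variable name LIGHTLY_TOKEN are assumptions made for this example, not something the chapter prescribes:

```python
import os


def load_lightly_token(env_var: str = "LIGHTLY_TOKEN") -> str:
    """Return the API token stored in an environment variable.

    The default variable name is an assumption made for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Lightly API token."
        )
    return token
```

With the variable exported in your shell, lightly_token = load_lightly_token() can replace the hardcoded string, which keeps the secret out of shared notebooks and version control.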

Then, you must set up the Lightly client...

Implementing ML models for video analysis

Active ML helps manage big data projects by optimizing the data annotation process, improving model performance with less manual effort. For instance, in large-scale image recognition tasks, such as identifying specific objects across millions of social media photos, active learning can significantly reduce the workload by pinpointing the images most likely to improve the model. Similarly, in natural language processing (NLP) applications that deal with vast amounts of text from sources such as news articles, forums, and customer feedback, active ML helps select the documents whose annotation adds the most value for understanding complex language nuances or sentiments. This approach not only streamlines the annotation of massive datasets but also ensures that models trained on such data are more accurate, efficient, and capable of handling the real...
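To make the idea of pinpointing the most useful samples concrete, here is a minimal sketch of uncertainty sampling, one common active ML strategy. It ranks samples by the entropy of the model's predicted class probabilities and keeps the most uncertain ones. The function names, frame IDs, and probabilities are hypothetical and only serve as an illustration:

```python
import math
from typing import Dict, List, Sequence


def prediction_entropy(probabilities: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


def select_most_uncertain(
    predictions: Dict[str, Sequence[float]], budget: int
) -> List[str]:
    """Return the `budget` sample IDs whose predictions have the highest entropy."""
    ranked = sorted(
        predictions,
        key=lambda sample_id: prediction_entropy(predictions[sample_id]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical class probabilities predicted for four unlabeled frames.
example_predictions = {
    "frame_001": [0.98, 0.01, 0.01],  # confident prediction
    "frame_002": [0.40, 0.35, 0.25],  # uncertain prediction
    "frame_003": [0.70, 0.20, 0.10],
    "frame_004": [0.34, 0.33, 0.33],  # almost uniform, most uncertain
}

to_label = select_most_uncertain(example_predictions, budget=2)
print(to_label)  # ['frame_004', 'frame_002']
```

Only the two frames the model is least sure about are sent for annotation, which is how active ML reduces labeling effort on large datasets.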

Selecting the most informative frames with Lightly

In this section, we will use an active ML tool called Lightly. Lightly is a data curation tool that’s equipped with a web platform that enables users to choose the optimal subset of samples for maximizing model accuracy. Lightly’s algorithms can process substantial volumes of data, such as 10 million images or 10 thousand videos, in less than 24 hours.
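As an illustration of what choosing a subset of samples can look like, the following sketch implements greedy farthest-point selection, a common diversity-based strategy. It is not Lightly's actual algorithm; the function name, the feature vectors, and the seed choice are assumptions made for this example:

```python
import math
from typing import Dict, List, Sequence


def farthest_point_selection(
    vectors: Dict[str, Sequence[float]], budget: int, seed_id: str
) -> List[str]:
    """Greedily pick samples that are as far as possible from those already picked."""
    if budget <= 0:
        return []
    selected = [seed_id]
    # Distance from every sample to its nearest selected sample.
    nearest = {
        sample_id: math.dist(vector, vectors[seed_id])
        for sample_id, vector in vectors.items()
    }
    while len(selected) < min(budget, len(vectors)):
        candidates = [sample_id for sample_id in vectors if sample_id not in selected]
        next_id = max(candidates, key=nearest.get)
        selected.append(next_id)
        # Update nearest-selected distances with the newly added sample.
        for sample_id, vector in vectors.items():
            nearest[sample_id] = min(
                nearest[sample_id], math.dist(vector, vectors[next_id])
            )
    return selected


# Hypothetical 2D feature vectors for five frames.
features = {
    "f1": [0.0, 0.0],
    "f2": [0.1, 0.0],
    "f3": [5.0, 5.0],
    "f4": [5.0, 0.0],
    "f5": [0.2, 0.1],
}

picked = farthest_point_selection(features, budget=3, seed_id="f1")
print(picked)  # ['f1', 'f3', 'f4']
```

The near-duplicate frames f2 and f5 are skipped in favor of frames that cover different regions of the feature space, which is the intuition behind diversity-based selection.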

The web app allows users to explore their datasets using filters such as sharpness, luminance, contrast, and file size, and to examine how these characteristics correlate with one another.
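To give a sense of how such filters work, the sketch below computes two simple per-image statistics and filters on them. It assumes luminance is the mean pixel intensity and contrast is the standard deviation of intensities, which are common simplifications and not necessarily the definitions Lightly uses. The images and thresholds are made up for the example:

```python
from statistics import mean, pstdev
from typing import Dict, List

# A grayscale image is represented here as rows of pixel intensities in [0, 255].
Image = List[List[int]]


def image_stats(image: Image) -> Dict[str, float]:
    """Compute simple luminance and contrast statistics for one image."""
    pixels = [value for row in image for value in row]
    return {"luminance": mean(pixels), "contrast": pstdev(pixels)}


def filter_images(
    images: Dict[str, Image], min_luminance: float, min_contrast: float
) -> List[str]:
    """Keep the names of images that meet both thresholds."""
    kept = []
    for name, image in images.items():
        stats = image_stats(image)
        if stats["luminance"] >= min_luminance and stats["contrast"] >= min_contrast:
            kept.append(name)
    return kept


# Hypothetical 2x2 grayscale frames.
frames = {
    "dark_flat": [[10, 12], [11, 9]],
    "bright_flat": [[200, 201], [199, 200]],
    "bright_varied": [[40, 250], [230, 60]],
}

selected = filter_images(frames, min_luminance=50, min_contrast=20)
print(selected)  # ['bright_varied']
```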

Users can also search for similar images or objects within the app and inspect the embeddings using dimensionality reduction techniques (principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP)). Embeddings are vector representations of images that are learned by deep neural networks. They capture visual...
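Similarity search over embeddings can be sketched with cosine similarity, a standard way to compare vectors. The code below is only an illustration under that assumption; the helper names and the toy three-dimensional embeddings are hypothetical, and real embeddings typically have many more dimensions:

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    query_id: str, embeddings: Dict[str, Sequence[float]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return the `top_k` samples most similar to the query, best match first."""
    query = embeddings[query_id]
    scores = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != query_id
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]


# Hypothetical embeddings for four frames.
toy_embeddings = {
    "frame_a": [1.0, 0.0, 0.0],
    "frame_b": [0.9, 0.1, 0.0],
    "frame_c": [0.5, 0.5, 0.0],
    "frame_d": [0.0, 0.0, 1.0],
}

neighbors = most_similar("frame_a", toy_embeddings, top_k=2)
print([name for name, _ in neighbors])  # ['frame_b', 'frame_c']
```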

Summary

In this chapter, we learned how to use Lightly with diverse sampling strategies to efficiently select the most informative frames in videos and improve object detection models. We also saw how to send these selected frames to the labeling platform Encord, thereby completing an end-to-end use case. Finally, we explored how to further enhance sampling by incorporating a self-supervised learning (SSL) step into the active ML pipeline.

Moving forward, our focus will shift to exploring how to effectively evaluate, monitor, and test the active ML pipeline. This step is essential in ensuring that the pipeline remains robust and reliable throughout its deployment. By implementing comprehensive evaluation strategies, we can assess the performance of the pipeline against predefined metrics and benchmarks. Additionally, continuous monitoring will allow us to identify any potential issues or deviations from expected behavior, enabling us to take proactive measures to maintain optimal performance.

Furthermore...
