
You're reading from Active Machine Learning with Python

Product type: Book
Published in: Mar 2024
Publisher: Packt
ISBN-13: 9781835464946
Edition: 1st
Author (1)
Margaux Masson-Forsythe

Margaux Masson-Forsythe is a skilled machine learning engineer and advocate for advancements in surgical data science and climate AI. As the Director of Machine Learning at Surgical Data Science Collective, she builds computer vision models to detect surgical tools in videos and track procedural motions. Masson-Forsythe manages a multidisciplinary team and oversees model implementation, data pipelines, infrastructure, and product delivery. With a background in computer science and expertise in machine learning, computer vision, and geospatial analytics, she has worked on projects related to reforestation, deforestation monitoring, and crop yield prediction.

Managing the Human in the Loop

Active ML promises more efficient ML by intelligently selecting the most informative samples for labeling by human oracles. However, the success of these human-in-the-loop systems depends on effective interface design and workflow management. In this chapter, we will cover best practices for optimizing the human role in active ML. First, we will explore interactive system design, discussing how to create labeling interfaces that enable efficient and accurate annotations. Next, we will provide an extensive overview of the leading human-in-the-loop frameworks for managing the labeling pipeline. We will then turn to handling model-label disagreements through adjudication and quality control. After that, we will discuss strategies for recruiting qualified labelers and managing them effectively. Finally, we will examine techniques for evaluating and ensuring high-quality annotations and properly balanced datasets. By the end of this chapter, you will have...

Technical requirements

In this chapter, we will be using the Hugging Face libraries (datasets, transformers, and huggingface_hub), so you’ll need to install them, as follows:

pip install datasets transformers huggingface_hub && apt-get install git-lfs

Plus, you will need the following imports:

from transformers import pipeline
import torch
from datasets import load_dataset
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score

Designing interactive learning systems and workflows

The effectiveness of a human-in-the-loop system depends heavily on how well the labeling interface and workflow are designed. Even with advanced active ML algorithms selecting the most useful data points, poor interface design can cripple the labeling process. Without intuitive controls, informative queries, and efficient workflows adapted to humans, annotation quality and speed will suffer.

In this section, we will cover best practices for optimizing the human experience when interacting with active ML systems. Following these guidelines will enable you to create intuitive labeling pipelines, minimize ambiguity, and streamline the labeling process as much as possible. We will also discuss strategies for integrating active ML queries, collecting labeler feedback, and combining expert and crowd labelers. By focusing on human-centered design, you can develop interactive systems that maximize the utility of human input for your models...

Exploring human-in-the-loop labeling tools

Human-in-the-loop labeling frameworks are critical for enabling effective collaboration between humans and ML systems. In this section, we will explore some of the leading human-in-the-loop labeling tools for active ML.

We will look at how these frameworks allow humans to provide annotations, verify predictions, adjust model confidence thresholds, and guide model training through interfaces and workflows optimized for human-AI collaboration. Key capabilities provided by human-in-the-loop frameworks include annotation-assisted active ML, human verification of predictions, confidence calibration, and model interpretability.
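To make the confidence-threshold capability concrete, here is a minimal sketch of the routing logic such tools implement, built on the same Hugging Face pipeline we install in this chapter; the route_predictions function and the threshold value are illustrative assumptions, not any particular tool’s API:

from transformers import pipeline

sentiment_pipeline = pipeline("sentiment-analysis")
CONFIDENCE_THRESHOLD = 0.9  # assumed value; tune per project

def route_predictions(texts, threshold=CONFIDENCE_THRESHOLD):
    # Split inputs into auto-accepted predictions and a human review queue
    auto_accepted, needs_review = [], []
    for text, pred in zip(texts, sentiment_pipeline(texts)):
        record = {"text": text, "label": pred["label"], "score": pred["score"]}
        if pred["score"] >= threshold:
            auto_accepted.append(record)
        else:
            needs_review.append(record)  # route to a human oracle for verification
    return auto_accepted, needs_review

Raising the threshold routes more predictions to human reviewers, trading extra labeling effort for more reliable accepted labels.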

The labeling tools we will examine include Snorkel AI, Prodigy, Encord, Roboflow, and others. We will walk through examples of how these tools can be leveraged to build applied active learning systems with effective human guidance. The strengths and weaknesses of different approaches will be discussed. By the end of...

Handling model-label disagreements

Disagreements between model predictions and human labels are inevitable. In this section, we will study how to identify and resolve conflicts.

Programmatically identifying mismatches

To identify discrepancies between the model’s predictions and the human-annotated labels, we can write some simple Python code that highlights the mismatches for review.

Let’s consider the example of an NLP sentiment classifier. This type of classifier analyzes the sentiment or emotions expressed in text: by examining the words, phrases, and context of a given piece of text, it determines whether the sentiment is positive, negative, or neutral. First, we will use the sentiment-analysis pipeline from Hugging Face:

sentiment_pipeline = pipeline("sentiment-analysis")
data = ["I love you", "I hate you"]
sentiment_pipeline(data)

This returns the following output...
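Each result in this output is a dictionary with a label key and a score key. Building on that, the following sketch shows one way to programmatically flag mismatches between model predictions and human annotations, using the accuracy_score import from the Technical requirements section; the human_labels list here is a hypothetical set of gold annotations for the same texts:

data = ["I love you", "I hate you"]
human_labels = ["POSITIVE", "NEGATIVE"]  # hypothetical human annotations

# Each pipeline result is a dict with "label" and "score" keys
predicted_labels = [pred["label"] for pred in sentiment_pipeline(data)]

# Collect every sample where the model and the human annotator disagree
mismatches = [
    {"text": t, "human": h, "model": m}
    for t, h, m in zip(data, human_labels, predicted_labels)
    if h != m
]

print(f"Agreement: {accuracy_score(human_labels, predicted_labels):.2%}")
for mismatch in mismatches:
    print(f"Review needed: {mismatch}")

The mismatches list can then be surfaced in the labeling tool so a reviewer can adjudicate each conflict.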

Effectively managing human-in-the-loop systems

Getting high-quality annotations requires finding, vetting, supporting, and retaining effective labelers. It is crucial to build an appropriate labeling team that meets the requirements of the ML project.

The first option is to establish an internal labeling team. This involves hiring full-time employees to label data, which enables close management and training. Cultivating domain expertise is easier when done internally. However, there are drawbacks to this, such as higher costs and turnover. This option is only suitable for large, ongoing labeling requirements.

Another option is to crowdsource labeling tasks using platforms such as ScaleAI, which allow labeling tasks to be distributed to a large, on-demand workforce. This option provides flexibility and lower costs, but it can lack domain expertise. Quality control becomes challenging when working with anonymous crowd workers.

You could use third-party labeling services, such...

Ensuring annotation quality and dataset balance

Maintaining high annotation quality and target class balance requires diligent management. In this section, we’ll look at some techniques that can help ensure labeling quality.

Assessing annotator skills

It is highly recommended that annotators undergo thorough training sessions and complete qualification tests before they can work independently. This ensures that they have a solid foundation of knowledge and understanding of their respective tasks. Once they start labeling, performance metrics such as acceptance and rejection rates can be visualized in the labeling platform as reviewers accept or reject annotations. If a labeler has many rejected annotations, it is necessary to check that they understand the task and assess what help can be provided to them.

It is advisable to periodically assess the labeler’s skills by providing control samples for evaluation purposes. This ongoing evaluation helps maintain the quality and consistency of their work over time.
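As a minimal sketch of such a control-sample check, the snippet below scores each labeler against known-answer items and flags anyone who falls below a chosen accuracy bar, reusing the accuracy_score import from the Technical requirements section; the gold labels, annotator answers, and MIN_ACCURACY value are all hypothetical:

# Hypothetical gold labels for a batch of control samples
gold = ["POSITIVE", "NEGATIVE", "NEGATIVE", "POSITIVE"]

# Hypothetical answers submitted by two labelers on the same samples
annotator_answers = {
    "annotator_a": ["POSITIVE", "NEGATIVE", "NEGATIVE", "POSITIVE"],
    "annotator_b": ["POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE"],
}

MIN_ACCURACY = 0.8  # assumed qualification bar; set per project

for name, answers in annotator_answers.items():
    score = accuracy_score(gold, answers)
    status = "qualified" if score >= MIN_ACCURACY else "needs retraining"
    print(f"{name}: {score:.0%} on control samples -> {status}")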

For...

Summary

This chapter explored strategies for effectively incorporating human input into active ML systems. We discussed how to design workflows that enable efficient collaboration between humans and AI models. Leading open source frameworks for human-in-the-loop learning were reviewed, including their capabilities for annotation, verification, and active learning.

Handling model-label disagreements is a key challenge in human-AI systems. Techniques such as manually reviewing conflicts and active learning cycles help identify and resolve mismatches. Carefully managing the human annotation workforce is also critical, as it covers recruitment, training, quality control, and tooling.

A major focus was ensuring high-quality, balanced datasets using methods such as qualification exams, inter-annotator agreement metrics (such as accuracy or the Kappa score), consensus evaluations, and targeted sampling. By implementing robust processes around collaboration, conflict resolution, annotator...
