You're reading from  Transformers for Natural Language Processing and Computer Vision - Third Edition

Product type: Book
Published in: Feb 2024
Reading level: N/A
Publisher: Packt
ISBN-13: 9781805128724
Edition: 3rd Edition
Author: Denis Rothman

Denis Rothman graduated from Sorbonne University and Paris-Diderot University, designing one of the very first word2matrix patented embedding and patented AI conversational agents. He began his career authoring one of the first AI cognitive Natural Language Processing (NLP) chatbots applied as an automated language teacher for Moet et Chandon and other companies. He authored an AI resource optimizer for IBM and apparel producers. He then authored an Advanced Planning and Scheduling (APS) solution used worldwide.

On the Road to Functional AGI with HuggingGPT and its Peers

Functional Artificial General Intelligence (F-AGI) is the automation of a sufficiently broad scope of tasks in a closed environment to outperform a human in a workplace. We use the term F-AGI to describe an AI system’s capabilities and the risks of job losses it raises. Beyond those risks, another factor has become critical: volume. Human activity requires millions of micro-decisions to produce services, products, and communication resources. F-AGI has begun to emerge out of necessity.

The term F-AGI avoids confusion with the AGI myth of a sentient, conscious, humanoid AI agent. F-AGI restricts its scope to a closed domain in a workplace. An F-AGI system can work 24/7 and solve millions of tasks per day. It will prove invaluable in critical situations such as wildfire prevention, flight planning to reduce carbon emissions, and a limitless variety of productive activities.

Microsoft, Google, OpenAI, Hugging Face...

Defining F-AGI

F-AGI has no consciousness at all. Nor can F-AGI do everything; it does not encompass the sum of all human intellectual activity.

The goal of this section and chapter is certainly not to find ways to replace humans and cause job displacement. This chapter also stays far away from harmful content.

F-AGI systems can truly perform at human level when needed in many domains, such as:

  • Replicating human-vision fire detection in areas lacking human presence to spot wildfires as early as possible. California has begun to install fire detection cameras all over the state, providing a human-level service that humans could not staff 24/7. Pano, for example, offers AI-driven vision services for wildfire management (see the References section).
  • Replicating human-vision skin cancer detection when human specialists are not always available. IBM initiated a skin cancer project in Australia several years ago. Australia has...

Installing and importing

In this chapter, we will leverage the no-coding functionality of advanced AI platforms. Open Computer_Vision_Analysis.ipynb, the chapter notebook, which contains the results of interactions with Google Cloud Vision, HuggingGPT, Midjourney, and Runway Gen-2.

The notebook installs moviepy to display Generative AI videos:

!pip install moviepy -qq

IPython.display’s Image is imported to render the Generative AI images:

from IPython.display import Image     #This is used for rendering images in the notebook

json is imported to parse the output of vision models and pandas to display the JSON content:

import json
import pandas as pd
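To illustrate why these two libraries are imported, here is a minimal sketch of how a vision model’s JSON response could be parsed and displayed as a table. The field names below are illustrative placeholders, not the exact schema of any particular vision API:

```python
import json
import pandas as pd

# Hypothetical JSON output from a vision model
# (the "label"/"score" fields are illustrative, not an exact API schema)
response = '''
[
  {"label": "car", "score": 0.98},
  {"label": "road", "score": 0.87},
  {"label": "fog", "score": 0.42}
]
'''

detections = json.loads(response)   # parse the JSON string into Python objects
df = pd.DataFrame(detections)       # display the detections as a table
print(df)
```

In the notebook, the same pattern applies to the real responses returned by the vision models: parse with json, then load into a pandas DataFrame for readability.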

We will now download the validation set for our assisted-driving research.

Validation set

The images in the validation set were created by Stable Diffusion and Midjourney and were submitted to several vision models in Chapter 18, Hugging Face AutoTrain: Training Vision Models without Coding. In a real-life project, such a dataset must contain many images of many types. This chapter limits the scope to a few images to explore methods to solve a complex problem with a unit test.

Level 1 image: easy

The first image is the level 1, easy image:

!curl -L https://raw.githubusercontent.com/Denis2054/Transformers_3rd_Edition/master/Chapter16/generate_an_image_of_a_car_in_space.jpg --output "generate_an_image_of_a_car_in_space.jpg"

Let’s display it to have it in mind for the rest of the chapter:

from PIL import Image
# Define the path of your image
image_path = "/content/generate_an_image_of_a_car_in_space.jpg"
# Open the image
image = Image.open(image_path)
image

The output shows a white car on a black background...

HuggingGPT

The validation process for HuggingGPT will run on Hugging Face’s platform:

https://huggingface.co/spaces/microsoft/HuggingGPT

If the server is down, or there’s an issue with HuggingGPT, as sometimes happens, you can try to clone the repository, run it with Docker, or contact Hugging Face support. Hugging Face forums are there to help as well.

Hugging Face, like all the cutting-edge AI platforms, is evolving at full speed. We must remain constructive.

Shen et al. (2023) designed a method for an LLM to connect AI models. The title and Figure 1 of their paper sum the process up:

  • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
  • “Figure 1: Language serves as an interface for LLMs (e.g., ChatGPT) to connect numerous AI models (e.g., those in Hugging Face) for solving complicated AI tasks.”
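The controller idea described above can be sketched in a few lines: plan the tasks, select a model for each task, and execute them in sequence. Everything here is a simplified stand-in; the plan_tasks function, MODEL_REGISTRY, and the fake models are illustrative, not the actual HuggingGPT API:

```python
# Minimal sketch of the HuggingGPT pattern: an LLM-style controller
# plans tasks, selects a model for each, and runs them in sequence.
# All names here are illustrative stand-ins, not the real HuggingGPT API.

def detect_objects(image):
    return ["car"]            # stand-in for an object-detection model

def caption_image(image):
    return "a car in space"   # stand-in for an image-captioning model

MODEL_REGISTRY = {
    "object-detection": detect_objects,
    "image-captioning": caption_image,
}

def plan_tasks(request):
    # A real controller would ask the LLM to decompose the request;
    # here the plan is hard-coded for the demo.
    return ["object-detection", "image-captioning"]

def run(request, image):
    results = {}
    for task in plan_tasks(request):
        model = MODEL_REGISTRY[task]   # model selection
        results[task] = model(image)   # task execution
    return results

print(run("Is there a car in this image?", "car_in_space.jpg"))
```

In HuggingGPT itself, the planning and model-selection steps are performed by ChatGPT over the Hugging Face model hub; the loop above only shows the shape of the process.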

In this section, we will ask HuggingGPT to detect a car in an image in our assisted...

CustomGPT

We tried to obtain a result with our validation dataset through HuggingGPT. We could have simply replaced the very difficult image that HuggingGPT failed to analyze correctly. In that case, we would have had a shiny, optimistic notebook showing that AI is fantastic and easy to implement with automated, near-F-AGI tools.

Unfortunately, even an innovative HuggingGPT system cannot solve all AI problems.

The HuggingGPT system does not contain all the available models for all NLP and computer vision tasks, although it’s a good initiative.

AI experts will still have to confirm whether a specific AI problem can be solved or not. In this section, we show that, in some cases, creative solutions can work.

We will build on the concept of a cross-platform multi-model system such as HuggingGPT. In this case, we create a custom pipeline to solve a specific problem. We will implement CustomGPT, a cross-platform chained-model solution shown in Figure 19.7:


Figure...
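The chaining idea behind CustomGPT can be sketched as a pipeline where each stage’s output becomes the next stage’s input. The stage functions below are illustrative stand-ins for calls to real services such as Google Cloud Vision or ChatGPT, not their actual APIs:

```python
# Minimal sketch of cross-platform model chaining: the output of one
# model feeds the input of the next. The stages are illustrative
# stand-ins for real services (e.g., Google Cloud Vision, ChatGPT).

def vision_stage(image_path):
    # stand-in for a vision API returning raw labels for an image
    return {"labels": ["vehicle", "night", "fog"]}

def llm_stage(vision_output):
    # stand-in for an LLM interpreting the vision labels
    labels = vision_output["labels"]
    return "car detected" if "vehicle" in labels else "no car detected"

def pipeline(data, stages):
    for stage in stages:   # chain: each stage consumes the previous output
        data = stage(data)
    return data

print(pipeline("foggy_road.jpg", [vision_stage, llm_stage]))
# (with these stand-ins, prints: car detected)
```

The design choice is the key point: because each stage only depends on the previous stage’s output, the models can live on different platforms and be swapped independently.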

Model Chaining with Runway Gen-2

In this section, we will chain the output of a Midjourney image to the input of the Gen-2 image to video service.

First, let’s download an example Gen-2 video produced by a prompt asking the model to create a video of an astronaut driving in a rover on Mars:

!curl -L https://raw.githubusercontent.com/Denis2054/Transformers_3rd_Edition/master/Chapter19/Gen-2_Mars.mp4 --output "Gen-2_Mars.mp4"

We can now view it in the chapter notebook:

from moviepy.editor import VideoFileClip
# Load Gen-2_Mars.mp4 and select the first seven seconds
clip = VideoFileClip("Gen-2_Mars.mp4").subclip(0, 7)
clip = clip.loop(5)   # repeat the clip five times
clip.ipython_display(width=900)

The video you can view in the notebook is quite promising.


Figure 19.14: A screenshot of an AI-generated video about an astronaut on Mars

Now, let’s chain Midjourney to Gen-2.

Midjourney: Imagine a ship in the galaxy

Let’s download an image...

Summary

In this chapter, HuggingGPT concepts led to solving the classification problem of cars in fog and at night that arose in Chapter 18, Hugging Face AutoTrain: Training Vision Models without Coding. HuggingGPT’s innovative approach uses ChatGPT as a controller, managing the comprehensive library of Hugging Face models.

We first defined F-AGI as the ability to attain human-level functionality for a real-life task in a closed ecosystem. For example, a computer vision AI agent can replace a human fire-alert watcher 24/7 over vast territories. The chapter addressed practical computer vision abilities that enhance human activity without threatening jobs or getting involved in politics.

Then, the chapter notebook downloaded a validation set containing an easy, a difficult, and a very difficult image of a car. The goal was to find a way to identify a vehicle in the very difficult image, simulating a video frame from an assisted-driving AI agent.

We ran HuggingGPT...

Questions

  1. AGI already exists and is spreading everywhere. (True/False)
  2. Functional AGI is conscious. (True/False)
  3. Functional AGI can perform human-level tasks in a closed environment. (True/False)
  4. Vision models can now identify all objects in all situations. (True/False)
  5. HuggingGPT leverages the abilities of an LLM such as ChatGPT. (True/False)
  6. ChatGPT can be a controller in the HuggingGPT ecosystem. (True/False)
  7. HuggingGPT is not a cross-platform system. (True/False)
  8. Chained models can improve overall vision model performances. (True/False)
  9. Google Cloud Vision and ChatGPT cannot be chained. (True/False)
  10. Midjourney images can become the input of Gen-2 to produce videos. (True/False)

Further Reading

  • Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang, 2023, HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face: https://arxiv.org/abs/2303.17580

Join our community on Discord

Join our community’s Discord space for discussions with the authors and other readers:

https://www.packt.link/Transformers
