You're reading from  Transformers for Natural Language Processing and Computer Vision - Third Edition

Product type: Book
Published in: Feb 2024
Reading level: N/A
Publisher: Packt
ISBN-13: 9781805128724
Edition: 3rd Edition
Author: Denis Rothman

Denis Rothman graduated from Sorbonne University and Paris-Diderot University, designing one of the very first word2matrix patented embedding and patented AI conversational agents. He began his career authoring one of the first AI cognitive Natural Language Processing (NLP) chatbots applied as an automated language teacher for Moet et Chandon and other companies. He authored an AI resource optimizer for IBM and apparel producers. He then authored an Advanced Planning and Scheduling (APS) solution used worldwide.

On the Road to Functional AGI with HuggingGPT and its Peers

Functional Artificial General Intelligence (F-AGI) is the automation of a sufficiently broad scope of tasks in a closed environment to outperform a human in a workplace. We use the term F-AGI to describe an AI system’s capabilities and the risks of job losses it raises. Beyond those risks, another factor has become critical: volume. Human activity requires millions of micro-decisions to produce services, products, and communication resources. F-AGI has begun to emerge out of necessity.

The term F-AGI avoids confusion with the AGI myth of a sentient, conscious, humanoid AI agent. F-AGI restricts its scope to a closed domain in a workplace. An F-AGI system can work 24/7 and solve millions of tasks per day. It will prove invaluable in critical situations such as wildfire prevention, flight planning to reduce carbon emissions, and a limitless variety of productive activities.

Microsoft, Google, OpenAI, Hugging Face...

Defining F-AGI

F-AGI has no consciousness at all. Nor can F-AGI do everything; it does not encompass the sum of all human intellectual activity.

The goal of this section and chapter is certainly not to find ways to replace humans and cause job displacement. This chapter also stays far away from harmful content.

F-AGI systems can truly perform at human level when needed in many domains, such as:

  • Replicating human-vision fire detection in areas lacking human presence to spot wildfires as early as possible. California has begun to install fire detection cameras all over the state, providing a human-level service that humans could not staff 24/7. Pano, for example, offers AI-driven vision services for wildfire management (see the References section).
  • Replicating human-vision skin cancer detection when human specialists are not always available. IBM initiated a skin cancer project in Australia several years ago. Australia has...

Installing and importing

In this chapter, we will leverage the no-coding functionality of advanced AI platforms. Open Computer_Vision_Analysis.ipynb, the chapter notebook, which contains the results of interactions with Google Cloud Vision, HuggingGPT, Midjourney, and Runway Gen-2.

The notebook installs moviepy to display Generative AI videos:

!pip install moviepy -qq

IPython.display’s Image is imported to render the Generative AI images:

from IPython.display import Image     #This is used for rendering images in the notebook

json is imported to parse the output of vision models and pandas to display the JSON content:

import json
import pandas as pd
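To illustrate why these two libraries are imported, here is a minimal sketch of how a vision model’s JSON response could be parsed and displayed as a table. The field names below are illustrative placeholders, not the exact schema of any particular vision API:

```python
import json
import pandas as pd

# Hypothetical JSON output from a vision model
# (the "label"/"score" fields are illustrative, not an exact API schema)
response = '''
[
  {"label": "car", "score": 0.98},
  {"label": "road", "score": 0.87},
  {"label": "fog", "score": 0.42}
]
'''

detections = json.loads(response)   # parse the JSON string into Python objects
df = pd.DataFrame(detections)       # display the detections as a table
print(df)
```

In the notebook, the same pattern applies to the real responses returned by the vision models: parse with json, then load into a pandas DataFrame for readability.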

We will now download the validation set for our assisted-driving research.

Validation set

The images in the validation set were created by Stable Diffusion and Midjourney and were submitted to several vision models in Chapter 18, Hugging Face AutoTrain: Training Vision Models without Coding. In a real-life project, such a dataset must contain many images of many types. This chapter limits the scope to a few images to explore methods to solve a complex problem with a unit test.

Level 1 image: easy

The first image is the level 1, easy image:

!curl -L https://raw.githubusercontent.com/Denis2054/Transformers_3rd_Edition/master/Chapter16/generate_an_image_of_a_car_in_space.jpg --output "generate_an_image_of_a_car_in_space.jpg"

Let’s display it to have it in mind for the rest of the chapter:

from PIL import Image
# Define the path of your image
image_path = "/content/generate_an_image_of_a_car_in_space.jpg"
# Open the image
image = Image.open(image_path)
image

The output shows a white car on a black background...

HuggingGPT

The validation process for HuggingGPT will run on Hugging Face’s platform:

https://huggingface.co/spaces/microsoft/HuggingGPT

If the server is down, or there’s an issue with HuggingGPT, as sometimes happens, you can try to clone the repository, run it with Docker, or contact Hugging Face support. Hugging Face forums are there to help as well.

Hugging Face, like all the cutting-edge AI platforms, is evolving at full speed. We must remain constructive.

Shen et al. (2023) designed a method for an LLM to connect AI models. The title and Figure 1 of their paper sum the process up:

  • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
  • “Figure 1: Language serves as an interface for LLMs (e.g., ChatGPT) to connect numerous AI models (e.g., those in Hugging Face) for solving complicated AI tasks.”
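The controller idea described above can be sketched in a few lines: plan the tasks, select a model for each task, and execute them in sequence. Everything here is a simplified stand-in; the plan_tasks function, MODEL_REGISTRY, and the fake models are illustrative, not the actual HuggingGPT API:

```python
# Minimal sketch of the HuggingGPT pattern: an LLM-style controller
# plans tasks, selects a model for each, and runs them in sequence.
# All names here are illustrative stand-ins, not the real HuggingGPT API.

def detect_objects(image):
    return ["car"]            # stand-in for an object-detection model

def caption_image(image):
    return "a car in space"   # stand-in for an image-captioning model

MODEL_REGISTRY = {
    "object-detection": detect_objects,
    "image-captioning": caption_image,
}

def plan_tasks(request):
    # A real controller would ask the LLM to decompose the request;
    # here the plan is hard-coded for the demo.
    return ["object-detection", "image-captioning"]

def run(request, image):
    results = {}
    for task in plan_tasks(request):
        model = MODEL_REGISTRY[task]   # model selection
        results[task] = model(image)   # task execution
    return results

print(run("Is there a car in this image?", "car_in_space.jpg"))
```

In HuggingGPT itself, the planning and model-selection steps are performed by ChatGPT over the Hugging Face model hub; the loop above only shows the shape of the process.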

In this section, we will ask HuggingGPT to detect a car in an image in our assisted...

CustomGPT

We tried to obtain a result with our validation dataset through HuggingGPT. We could have simply replaced the very difficult image that HuggingGPT failed to analyze correctly. In that case, we would have had a shiny, optimistic notebook showing that AI is fantastic and easy to implement with automated, near-F-AGI tools.

Unfortunately, even an innovative HuggingGPT system cannot solve all AI problems.

The HuggingGPT system does not contain all the available models for all NLP and computer vision tasks, although it’s a good initiative.

AI experts will still have to confirm whether a specific AI problem can be solved or not. In this section, we show that, in some cases, creative solutions can work.

We will build on the concept of a cross-platform multi-model system such as HuggingGPT. In this case, we create a custom pipeline to solve a specific problem. We will implement CustomGPT, a cross-platform chained-model solution shown in Figure 19.7:


Figure...
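The chaining idea behind CustomGPT can be sketched as a pipeline where each stage’s output becomes the next stage’s input. The stage functions below are illustrative stand-ins for calls to real services such as Google Cloud Vision or ChatGPT, not their actual APIs:

```python
# Minimal sketch of cross-platform model chaining: the output of one
# model feeds the input of the next. The stages are illustrative
# stand-ins for real services (e.g., Google Cloud Vision, ChatGPT).

def vision_stage(image_path):
    # stand-in for a vision API returning raw labels for an image
    return {"labels": ["vehicle", "night", "fog"]}

def llm_stage(vision_output):
    # stand-in for an LLM interpreting the vision labels
    labels = vision_output["labels"]
    return "car detected" if "vehicle" in labels else "no car detected"

def pipeline(data, stages):
    for stage in stages:   # chain: each stage consumes the previous output
        data = stage(data)
    return data

print(pipeline("foggy_road.jpg", [vision_stage, llm_stage]))
# (with these stand-ins, prints: car detected)
```

The design choice is the key point: because each stage only depends on the previous stage’s output, the models can live on different platforms and be swapped independently.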

Model Chaining with Runway Gen-2

In this section, we will chain the output of a Midjourney image to the input of the Gen-2 image to video service.

First, let’s download an example Gen-2 video produced by a prompt asking the model to create a video of an astronaut driving in a rover on Mars:

!curl -L https://raw.githubusercontent.com/Denis2054/Transformers_3rd_Edition/master/Chapter19/Gen-2_Mars.mp4 --output "Gen-2_Mars.mp4"

We can now view it in the chapter notebook:

from moviepy.editor import VideoFileClip
# Load Gen-2_Mars.mp4 and select the first seven seconds
clip = VideoFileClip("Gen-2_Mars.mp4").subclip(0, 7)
clip = clip.loop(5)   # repeat the clip five times
clip.ipython_display(width=900)

The video you can view in the notebook is quite promising.


Figure 19.14: A screenshot of an AI-generated video about an astronaut on Mars

Now, let’s chain Midjourney to Gen-2.

Midjourney: Imagine a ship in the galaxy

Let’s download an image...

Summary

In this chapter, HuggingGPT concepts led to solving the classification problem of cars in fog and at night that arose in Chapter 18, Hugging Face AutoTrain: Training Vision Models without Coding. HuggingGPT’s innovative approach uses ChatGPT as a controller, managing the comprehensive library of Hugging Face models.

We first defined F-AGI as the ability to attain human-level functionality for a real-life task in a closed ecosystem. For example, a computer vision AI agent can replace a human fire-alert watcher 24/7 over vast territories. The chapter addressed practical computer vision abilities that enhance human activity without threatening jobs or getting involved in politics.

Then, the chapter notebook downloaded a validation set containing an easy, a difficult, and a very difficult image of a car. The goal was to find a way to identify a vehicle in the very difficult image, simulating a video frame from an assisted-driving AI agent.

We ran HuggingGPT...

Questions

  1. AGI already exists and is spreading everywhere. (True/False)
  2. Functional AGI is conscious. (True/False)
  3. Functional AGI can perform human-level tasks in a closed environment. (True/False)
  4. Vision models can now identify all objects in all situations. (True/False)
  5. HuggingGPT leverages the abilities of an LLM such as ChatGPT. (True/False)
  6. ChatGPT can be a controller in the HuggingGPT ecosystem. (True/False)
  7. HuggingGPT is not a cross-platform system. (True/False)
  8. Chained models can improve overall vision model performances. (True/False)
  9. Google Cloud Vision and ChatGPT cannot be chained. (True/False)
  10. Midjourney images can become the input of Gen-2 to produce videos. (True/False)

Further Reading

  • Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang, 2023, HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face: https://arxiv.org/abs/2303.17580

Join our community on Discord

Join our community’s Discord space for discussions with the authors and other readers:

https://www.packt.link/Transformers
