You're reading from  Generative AI with LangChain

Product type: Book
Published in: Dec 2023
Publisher: Packt
ISBN-13: 9781835083468
Edition: 1st Edition
Author (1)
Ben Auffarth

Ben Auffarth is a full-stack data scientist with more than 15 years of work experience. With a background and Ph.D. in computational and cognitive neuroscience, he has designed and conducted wet lab experiments on cell cultures, analyzed experiments with terabytes of data, run brain models on IBM supercomputers with up to 64k cores, built production systems processing hundreds and thousands of transactions per day, and trained language models on a large corpus of text documents. He co-founded and is the former president of Data Science Speakers, London.

What can AI do in other domains?

Generative AI models have demonstrated impressive capabilities across modalities including sound, music, video, and 3D shapes. In the audio domain, models can synthesize natural speech, generate original music compositions, and even mimic a speaker’s voice along with its prosody (the patterns of rhythm and sound). Speech-to-text systems, also known as automatic speech recognition (ASR), convert spoken language into text. For video, AI systems can create photorealistic footage from text prompts and perform sophisticated editing such as object removal. In 3D, models have learned to reconstruct scenes from images and generate intricate objects from textual descriptions.

The following table summarizes some recent models in these domains:

| Model | Organization | Year | Domain | Architecture | Performance |
|---|---|---|---|---|---|
| 3D-GQN | DeepMind | 2018 | 3D | Deep, iterative, latent variable density models | 3D scene generation from 2D images |
| Jukebox | OpenAI | 2020 | Music | VQ-VAE + transformer | High-fidelity music generation in different styles |
| Whisper | OpenAI | 2022 | Sound/speech | Transformer | Near human-level speech recognition |
| Imagen Video | Google | 2022 | Video | Frozen text transformers + video diffusion models | High-definition video generation from text |
| Phenaki | Google & UCL | 2022 | Video | Bidirectional masked transformer | Realistic video generation from text |
| TecoGAN | U. Munich | 2022 | Video | Temporal coherence module | High-quality, smooth video generation |
| DreamFusion | Google | 2022 | 3D | NeRF + Diffusion | High-fidelity 3D object generation from text |
| AudioLM | Google | 2023 | Sound/speech | Tokenizer + transformer LM + detokenizer | High linguistic quality speech generation maintaining speaker’s identity |
| AudioGen | Meta AI | 2023 | Sound/speech | Transformer + text guidance | High-quality conditional and unconditional audio generation |
| Universal Speech Model (USM) | Google | 2023 | Sound/speech | Encoder-decoder transformer | State-of-the-art multilingual speech recognition |

Table 1.1: Models for audio, video, and 3D domains

Underlying many of these innovations are advances in deep generative architectures like GANs, diffusion models, and transformers. Leading AI labs at Google, OpenAI, Meta, and DeepMind are pushing the boundaries of what’s possible.
