You're reading from The Definitive Guide to Google Vertex AI

Product type: Book

Published in: Dec 2023

Publisher: Packt

ISBN-13: 9781801815260

Edition: 1st Edition

Authors (2):

Jasmeet Bhatia

Kartik Chaudhary

Natural Language Models – Detecting Fake News Articles!

A significant amount of content on the internet is in textual format, and almost every organization stores large amounts of internal data and resources as text documents. Natural language processing (NLP) is a subfield of machine learning that’s concerned with organizing, understanding, and making decisions based on textual input data. Over the past decade, NLP has become one of the most important tools for transforming business processes and making informed decisions. For example, a sentiment analysis model can help a business understand how its customers feel about its products and services. A topic modeling algorithm combined with sentiment analysis can identify customers’ key pain points and thus inform business decisions that make customer satisfaction a priority.

In this chapter, we will develop an ML system that can recognize fake news articles. Such systems can help in keeping...

Technical requirements

The code samples used in this chapter can be found in this book’s GitHub repository: https://github.com/PacktPublishing/The-Definitive-Guide-to-Google-Vertex-AI/tree/main/Chapter17.

Detecting fake news using NLP

Widespread internet use has made it easy to spread fake news. A large number of users consume and post content through their social media accounts every day, and it has become difficult to distinguish real news from fake news. Fake news can do significant damage to a person, society, organization, or political party. At this scale, it is impossible for human reviewers to read through every article manually. Thus, there is a need for smart algorithms that can automatically detect fake news articles and stop the spread of dangerous news as soon as it is generated.

ML-based classification algorithms can be used to detect fake news. First, we need a good training dataset to train the classification model on so that it can learn the common patterns of fake news and thus automatically distinguish it from real news. In this section, we will train an ML model to classify...
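To make the idea of learning patterns from labeled examples concrete, here is a minimal sketch of a text classifier. It is not the chapter's model: it swaps in a tiny multinomial Naive Bayes classifier written in plain Python, and the four training sentences and their labels are invented purely for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; real pipelines clean text more carefully.
    return text.lower().split()


def train(samples):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in samples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return {"word_counts": word_counts, "label_counts": label_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest log-probability (Laplace smoothing)."""
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["label_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # class prior
        for token in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy dataset, invented for this sketch.
TRAIN = [
    ("shocking miracle cure doctors hate", "fake"),
    ("you will not believe this shocking secret", "fake"),
    ("city council approves annual budget", "real"),
    ("central bank holds interest rates steady", "real"),
]

model = train(TRAIN)
print(predict(model, "shocking secret cure"))
print(predict(model, "council approves budget"))
```

With so little data the classifier only memorizes vocabulary, which is why a large, representative training dataset matters for a real system.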

Launching model training on Vertex AI

In this section, we will launch our training experiment as a Vertex AI training job. There are multiple advantages of launching training jobs on Vertex AI instead of running them in a Jupyter Notebook:

We have the flexibility to launch any number of parallel experiments.
We can choose the best hardware for model training, which is very important when accelerators are needed to train deep learning models.
We don’t need to actively monitor training progress.
There’s no fear of the Jupyter Notebook crashing.
Vertex AI training jobs can be configured to log metadata and experiments in the Google Cloud Console UI.

In this section, we will create and launch a Vertex AI training job for our experiment. There are two main things we need to do to launch a Vertex AI training job. First, we need to put the dataset in a location that will be accessible to the Vertex AI job (such as GCS or BigQuery). Second, we need to put...
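As a rough illustration of what launching such a job can look like, the sketch below uses the `google-cloud-aiplatform` SDK's `CustomJob` class with a container-based worker pool. This is an assumption about one possible setup, not the chapter's actual code: the image URI, bucket path, and the helper names `build_worker_pool_specs` and `launch_training_job` are hypothetical placeholders.

```python
def build_worker_pool_specs(image_uri, args, machine_type="n1-standard-4",
                            accelerator_type=None, accelerator_count=0):
    """Build the worker pool spec list that aiplatform.CustomJob expects."""
    machine_spec = {"machine_type": machine_type}
    # Attach an accelerator only when one is requested.
    if accelerator_type and accelerator_count > 0:
        machine_spec["accelerator_type"] = accelerator_type
        machine_spec["accelerator_count"] = accelerator_count
    return [{
        "machine_spec": machine_spec,
        "replica_count": 1,
        "container_spec": {"image_uri": image_uri, "args": list(args)},
    }]


def launch_training_job(project, location, staging_bucket, display_name, specs):
    """Submit the job; needs google-cloud-aiplatform and GCP credentials."""
    from google.cloud import aiplatform  # imported lazily so the spec builder runs anywhere

    aiplatform.init(project=project, location=location, staging_bucket=staging_bucket)
    job = aiplatform.CustomJob(display_name=display_name, worker_pool_specs=specs)
    job.run(sync=False)  # return without blocking until training finishes
    return job


# Hypothetical values: replace with your own image and data location.
specs = build_worker_pool_specs(
    image_uri="us-docker.pkg.dev/my-project/my-repo/fake-news-trainer:latest",
    args=["--data-uri=gs://my-bucket/fake_news/train.csv"],
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)
print(specs[0]["machine_spec"])
```

Keeping the spec builder separate from the submission call makes it easy to create several variants (different machine types or arguments) and launch them as parallel experiments.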

BERT-based fake news classification

In our first experiment, we trained a classical random forest classifier on TF-IDF features to detect fake versus real news articles and got an accuracy score of about 93%. In this section, we will train a deep learning model for the same task and see if we get any accuracy gains over the classical tree-based approach. Deep learning has changed the way we solve NLP problems. Classical approaches required hand-crafted features, most of which were related to the frequency of words appearing in a document. Given the complexity of language, just knowing the count of words in a paragraph is not enough. The order in which words occur also has a significant impact on the overall meaning of the paragraph or sentence. Deep learning approaches such as long short-term memory (LSTM) also consider the sequential dependency of words in sentences or paragraphs to get a more meaningful feature representation. LSTM has achieved great success in many...
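The limitation of frequency-only features can be seen in a few lines of plain Python. In this minimal sketch (the two sentences are invented for illustration), a bag-of-words count cannot tell apart two sentences with opposite meanings, while even a simple order-aware feature such as bigram counts can.

```python
from collections import Counter


def bag_of_words(text):
    # Word frequencies only; all position information is discarded.
    return Counter(text.lower().split())


def bigrams(text):
    # Counts of adjacent word pairs, which retain some local word order.
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


a = "the senator denied the report"
b = "the report denied the senator"

same_counts = bag_of_words(a) == bag_of_words(b)
same_bigrams = bigrams(a) == bigrams(b)

print("identical word counts:", same_counts)
print("identical bigram counts:", same_bigrams)
```

Sequence models go further than bigrams by reading the tokens in order, so the representation of each word depends on the words around it.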

Summary

This chapter was about a real-world NLP use case for detecting fake news. In the current era of the internet, spreading fake news has become quite easy and it can be dangerous for the reputation of a person, society, organization, or political party. As we have seen in our experiments, ML classification can be used as a powerful tool for detecting fake news articles. Deep learning-based approaches can further improve the results of text classification use cases without requiring much fine-tuning data.

After reading this chapter, you should be confident about training and applying classification models on text classification use cases, similar to fake news detection. You should also have a good understanding of the cleaning and pre-processing steps that are needed to apply classical models, such as random forest, on text data. At this point, you should be able to launch large-scale ML experiments as Vertex AI training jobs. Finally, you should have a good understanding of...


Authors (2)

Jasmeet Bhatia

Jasmeet is a Machine Learning Architect with over 8 years of experience in Data Science and Machine Learning Engineering at Google and Microsoft, and overall has 17 years of experience in Product Engineering and Technology consulting at Deloitte, Disney, and Motorola. He has been involved in building technology solutions that focus on solving complex business problems by utilizing information and data assets. He has built high-performing engineering teams and has designed and built global-scale AI/Machine Learning, Data Science, and Advanced Analytics solutions for image recognition, natural language processing, sentiment analysis, and personalization.

Kartik Chaudhary

Kartik is an Artificial Intelligence and Machine Learning professional with 6+ years of industry experience in developing and architecting large scale AI/ML solutions using the technological advancements in the field of Machine Learning, Deep Learning, Computer Vision and Natural Language Processing. Kartik has filed 9 patents at the intersection of Machine Learning, Healthcare, and Operations. Kartik loves sharing knowledge, blogging, travel, and photography.
