You're reading from The Definitive Guide to Google Vertex AI

Product type: Book
Published in: Dec 2023
Publisher: Packt
ISBN-13: 9781801815260
Edition: 1st Edition
Authors (2):
Jasmeet Bhatia

Jasmeet is a Machine Learning Architect with over 8 years of experience in Data Science and Machine Learning Engineering at Google and Microsoft, and 17 years of overall experience in Product Engineering and Technology consulting at Deloitte, Disney, and Motorola. He has been involved in building technology solutions that focus on solving complex business problems by utilizing information and data assets. He has built high-performing engineering teams and has designed and built global-scale AI/Machine Learning, Data Science, and Advanced Analytics solutions for image recognition, natural language processing, sentiment analysis, and personalization.

Kartik Chaudhary

Kartik is an Artificial Intelligence and Machine Learning professional with 6+ years of industry experience in developing and architecting large-scale AI/ML solutions using the technological advancements in the field of Machine Learning, Deep Learning, Computer Vision, and Natural Language Processing. Kartik has filed 9 patents at the intersection of Machine Learning, Healthcare, and Operations. Kartik loves sharing knowledge, blogging, travel, and photography.

Natural Language Models – Detecting Fake News Articles!

A significant amount of content on the internet is in textual format, and almost every organization stores large amounts of internal data and resources as text documents. Natural language processing (NLP) is a subfield of machine learning that’s concerned with organizing, understanding, and making decisions based on textual input data. Over the past decade, NLP has become one of the most important tools for transforming business processes and making informed decisions. For example, a sentiment analysis model can help a business understand how its customers feel about its products and services. A topic modeling algorithm combined with sentiment analysis can surface the customers’ key pain points and thus inform business decisions that make customer satisfaction a priority.

In this chapter, we will develop an ML system that can recognize fake news articles. Such systems can help in keeping...

Technical requirements

The code samples used in this chapter can be found in this book’s GitHub repository: https://github.com/PacktPublishing/The-Definitive-Guide-to-Google-Vertex-AI/tree/main/Chapter17.

Detecting fake news using NLP

With the growing use of the internet, it has become very easy to spread fake news. A large number of users consume and post content via their social media accounts every day, and it has become difficult to distinguish real news from fake news. Fake news can do significant damage to a person, society, organization, or political party. At this scale, it is impossible for human reviewers to check every article manually. Thus, there is a need to develop smart algorithms that can automatically detect fake news articles and stop the spread of dangerous news as soon as it is generated.

ML-based classification algorithms can be used to detect fake news. First, we need a good training dataset so that the classification model can learn the common patterns of fake news and thus automatically distinguish it from real news. In this section, we will train an ML model to classify...
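Before a classifier can learn such patterns, the text has to be turned into numbers. The chapter's first experiment uses TF-IDF features, so the following is a minimal sketch of how TF-IDF weights can be computed by hand. It uses the unsmoothed textbook formula rather than any particular library's variant, the whitespace tokenizer is deliberately naive, and the three headlines are invented purely for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with tf = count / document length and
    idf = log(N / document frequency), the unsmoothed textbook form."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vocabulary = sorted(doc_freq)
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocabulary}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1  # guard against empty documents
        vectors.append([counts[term] / length * idf[term] for term in vocabulary])
    return vocabulary, vectors


# Invented toy headlines, not taken from the chapter's dataset.
headlines = [
    "aliens endorse the mayor",
    "council approves the budget",
    "the mayor approves budget",
]
vocabulary, vectors = tfidf_vectors(headlines)
the_idx = vocabulary.index("the")
aliens_idx = vocabulary.index("aliens")
```

In this toy corpus, "the" appears in every headline, so its weight is zero everywhere, while "aliens" appears in only one headline and receives a positive weight there. Vectors like these are what a classical classifier would consume as input features.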

Launching model training on Vertex AI

In this section, we will launch our training experiment as a Vertex AI training job. There are several advantages to launching training jobs on Vertex AI instead of running them in a Jupyter Notebook:

  • We have the flexibility to launch any number of parallel experiments.
  • We can choose the best hardware for model training, which is very important when accelerators are needed to train deep learning models.
  • We don’t need to actively monitor training progress.
  • There’s no fear of the Jupyter Notebook crashing.
  • Vertex AI training jobs can be configured to log metadata and experiments in the Google Cloud Console UI.

There are two main things we need to do to create and launch a Vertex AI training job for our experiment. First, we need to put the dataset in a location that will be accessible to the Vertex AI job (such as GCS or BigQuery). Second, we need to put...
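As a rough sketch of what launching such a job can look like, the snippet below builds a worker pool specification (the structure that tells Vertex AI which machine, accelerator, and container to use) and passes it to the `CustomJob` class of the `google-cloud-aiplatform` SDK. The second requirement is cut off in this excerpt, so the container image, the `--data-uri` argument, and all project, bucket, and job names here are hypothetical placeholders and not the book's actual setup. Check the SDK calls against the current Vertex AI documentation before relying on them.

```python
def build_worker_pool_specs(image_uri, args, machine_type="n1-standard-4",
                            accelerator_type=None, accelerator_count=0):
    """Build a single-replica worker pool spec for a Vertex AI custom job."""
    machine_spec = {"machine_type": machine_type}
    # Only request an accelerator when one is explicitly asked for.
    if accelerator_type and accelerator_count > 0:
        machine_spec["accelerator_type"] = accelerator_type
        machine_spec["accelerator_count"] = accelerator_count
    return [{
        "machine_spec": machine_spec,
        "replica_count": 1,
        "container_spec": {"image_uri": image_uri, "args": list(args)},
    }]


def launch_job(project, location, staging_bucket, display_name, specs):
    """Submit the job. Requires google-cloud-aiplatform and GCP credentials."""
    from google.cloud import aiplatform  # imported lazily so the sketch loads without the SDK

    aiplatform.init(project=project, location=location,
                    staging_bucket=staging_bucket)
    job = aiplatform.CustomJob(display_name=display_name,
                               worker_pool_specs=specs)
    job.run()  # blocks until the job finishes
    return job


# Hypothetical values: the image, bucket, and argument names are assumptions.
IMAGE = "us-docker.pkg.dev/my-project/my-repo/fake-news-trainer:latest"
DATA_ARG = "--data-uri=gs://my-bucket/fake_news/train.csv"

cpu_specs = build_worker_pool_specs(IMAGE, [DATA_ARG])
gpu_specs = build_worker_pool_specs(IMAGE, [DATA_ARG],
                                    accelerator_type="NVIDIA_TESLA_T4",
                                    accelerator_count=1)

# launch_job("my-project", "us-central1", "gs://my-bucket",
#            "fake-news-training", cpu_specs)
```

Keeping the spec builder separate from the submission call makes it easy to queue several experiments with different hardware, which is the parallelism advantage listed above.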

BERT-based fake news classification

In our first experiment, we trained a classical random forest classifier on TF-IDF features to detect fake versus real news articles and got an accuracy score of about 93%. In this section, we will train a deep learning model for the same task and see if we get any accuracy gains over the classical tree-based approach. Deep learning has changed the way we solve NLP problems. Classical approaches required hand-crafted features, most of which were related to the frequency of words appearing in a document. Given the complexity of language, just knowing the count of words in a paragraph is not enough. The order in which words occur also has a significant impact on the overall meaning of the paragraph or sentence. Deep learning approaches such as long short-term memory (LSTM) also consider the sequential dependency of words in sentences or paragraphs to get a more meaningful feature representation. LSTM has achieved great success in many...
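To make the point about word order concrete, here is a small illustrative sketch (not taken from the book's code). Two invented sentences with opposite meanings produce identical word-count features, while counting adjacent word pairs (bigrams) already recovers some of the ordering information that sequence models exploit more fully.

```python
from collections import Counter


def bag_of_words(text):
    """Word-frequency features: all ordering information is discarded."""
    return Counter(text.lower().split())


def bigrams(text):
    """Counts of adjacent word pairs, which retain some local word order."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


# Invented example sentences with the same words but different meanings.
sentence_a = "the senator denied the report"
sentence_b = "the report denied the senator"

same_counts = bag_of_words(sentence_a) == bag_of_words(sentence_b)
same_bigrams = bigrams(sentence_a) == bigrams(sentence_b)

print("identical word counts:", same_counts)
print("identical bigram counts:", same_bigrams)
```

A purely frequency-based model cannot tell these two sentences apart, which is the limitation that order-aware deep learning models are designed to address.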

Summary

This chapter covered a real-world NLP use case: detecting fake news. In the current era of the internet, spreading fake news has become quite easy, and it can be dangerous for the reputation of a person, society, organization, or political party. As we have seen in our experiments, ML classification can be used as a powerful tool for detecting fake news articles. Deep learning-based approaches can further improve the results of text classification use cases without requiring much fine-tuning data.

After reading this chapter, you should be confident about training and applying classification models on text classification use cases, similar to fake news detection. You should also have a good understanding of the cleaning and pre-processing steps that are needed to apply classical models, such as random forest, on text data. At this point, you should be able to launch large-scale ML experiments as Vertex AI training jobs. Finally, you should have a good understanding of...
