
You're reading from Python Artificial Intelligence Projects for Beginners

Product type: Book

Published in: Jul 2018

Reading level: Intermediate

Publisher: Packt

ISBN-13: 9781789539462

Edition: 1st Edition

Languages: Python

Tools: TensorFlow, Scikit-learn

Concepts: Artificial Intelligence

Author: Dr. Joshua Eckroth

Chapter 3. Applications for Comment Classification

In this chapter, we'll focus on text: classifying internet comments as spam or not spam, and identifying internet reviews as positive or negative. We'll start with an overview of the bag-of-words model for text classification and use it, together with the random forest technique, to predict whether YouTube comments are spam. Then we'll look at Word2Vec models and use them, together with the k-nearest neighbor classifier, to predict whether reviews are positive or negative.

But, before we start, we'll answer the following question: what makes text classification an interesting problem?

Text classification

To answer this question, let's first consider the famous iris flower dataset. The following image shows an iris of the versicolor species. An image alone isn't enough to identify the species; measurements such as the flower's petal length, petal width, sepal length, and sepal width help us identify it more reliably:

The dataset contains examples of versicolor as well as setosa and virginica, and every example includes these four measurements. There are 150 examples in total, 50 of each species. Given the same four measurements for a new flower, we can use a decision tree or any other model to predict its species, because flowers of the same species tend to have similar measurements. Similarity can be defined in many ways; here, if we plot each flower as a point, we treat two flowers as similar when their points are close together on the graph. The following...
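A minimal sketch of both ideas, assuming scikit-learn is installed: a decision tree predicts the species from the four measurements, and Euclidean distance stands in for "closeness on a graph" by finding the training flower nearest to a new one. The new flower's measurements are made up, and the 70/30 train/test split is an arbitrary choice for illustration, not the chapter's own setup.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Columns: sepal length, sepal width, petal length, petal width (cm)
iris = load_iris()
X, y = iris.data, iris.target

# Hold out 30% of the flowers to check predictions on unseen examples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)
accuracy = tree.score(X_test, y_test)

# Hypothetical measurements for a new flower, in the same column order
new_flower = np.array([[6.0, 2.9, 4.5, 1.5]])
predicted = iris.target_names[tree.predict(new_flower)[0]]

# "Closeness on a graph": Euclidean distance to every training flower
distances = np.linalg.norm(X_train - new_flower, axis=1)
nearest = iris.target_names[y_train[np.argmin(distances)]]

print(f"test accuracy: {accuracy:.2f}")
print(f"decision tree predicts: {predicted}; nearest training flower: {nearest}")
```

The distance-based lookup is essentially a 1-nearest-neighbor classifier, which is the same intuition the k-nearest neighbor classifier builds on later in the chapter.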

Detecting YouTube comment spam

In this section, we're going to look at a technique for detecting YouTube comment spam using bag of words and random forests. The dataset is pretty straightforward: about 2,000 comments from popular YouTube videos (https://archive.ics.uci.edu/ml/datasets/YouTube+Spam+Collection). Each row contains a comment followed by a value of 1 (spam) or 0 (not spam).

The dataset is split into five files, one per video. We'll start by importing a single file, which contains the comments from the PSY-Gangnam Style video:

Then we will print a few comments as follows:
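A minimal sketch of loading and inspecting one file with pandas. It assumes the UCI file is named `Youtube01-Psy.csv` and has the header `COMMENT_ID,AUTHOR,DATE,CONTENT,CLASS`. To keep the example self-contained, it reads three made-up rows in that layout from an in-memory string; these are not real comments from the dataset.

```python
import io

import pandas as pd

# Made-up rows in the assumed Youtube01-Psy.csv layout (not real dataset content)
sample_csv = io.StringIO(
    "COMMENT_ID,AUTHOR,DATE,CONTENT,CLASS\n"
    "c1,user_a,2014-11-07,Love this song!,0\n"
    "c2,user_b,2014-11-08,Nice dance moves,0\n"
    "c3,user_c,2014-11-08,subscribe to my channel for free gifts,1\n"
)

# With the real file downloaded, this would be pd.read_csv("Youtube01-Psy.csv")
d = pd.read_csv(sample_csv)
print(d.head())

# Keep only the comment text and its spam label
comments = d[["CONTENT", "CLASS"]]
print(comments)
```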

Here we can see that there are more than two columns, but we only need the content and class columns. The content column contains the comments, and the class column contains 1 or 0 for spam or not spam. For example, notice that the first two comments are marked as not spam, but then the comment subscribe to...
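A minimal sketch of the bag-of-words plus random forest approach, assuming scikit-learn's `CountVectorizer` and `RandomForestClassifier`. The six comments and labels are invented stand-ins for the CONTENT and CLASS columns, and `n_estimators=100` is an arbitrary setting, not necessarily the one the chapter uses.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Made-up comments standing in for the CONTENT column; labels mimic CLASS (1 = spam)
texts = [
    "subscribe to my channel",
    "check out my channel and subscribe",
    "free gift cards visit my site",
    "this song is amazing",
    "love this video so much",
    "the dance in this video is great",
]
labels = [1, 1, 1, 0, 0, 0]

# Bag of words: one column per vocabulary word, one row of word counts per comment
vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(texts)
print(bow.shape)  # (6, 24)

# Random forest: an ensemble of decision trees trained on the word counts
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(bow, labels)

# New comments must be vectorized with the same vocabulary; unseen words are ignored
new_comments = ["please subscribe to my channel", "amazing song"]
preds = forest.predict(vectorizer.transform(new_comments))
print(preds)
```

Fitting the vectorizer only on training comments and reusing it with `transform` keeps the columns aligned between training and prediction.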

Word2Vec models

In this section, we'll learn about Word2Vec, a modern and popular technique for working with text. Word2Vec often performs better than simple bag of words models. A bag of words model only counts how many times each word appears in each document. Given two such bag of words vectors, we can compare the documents to see how similar they are, which amounts to comparing the words they use. In other words, if two documents share many of the same words, appearing a similar number of times, they are considered similar.
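To make the comparison concrete, here is a small sketch that uses cosine similarity between word-count vectors. Cosine similarity is one common choice of comparison and is an assumption here, since the text doesn't name a specific measure. The three documents are made up.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "please subscribe to my channel",
    "please subscribe to my new channel",
    "this song is great",
]
counts = CountVectorizer().fit_transform(docs)

# Pairwise cosine similarity between the word-count vectors
sim = cosine_similarity(counts)
print(sim.round(2))
```

The first two documents share five of their words and score 5/√30 ≈ 0.91. The third shares no words with either and scores 0.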

However, bag of words models carry no information about how similar the words themselves are. If two documents use synonyms rather than exactly the same words, such as please and plz, the bag of words model does not regard them as similar. Word2Vec can learn that some words are similar to each other, and we can exploit that to get better performance when doing machine learning with text.
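A small sketch of the please/plz problem. Under bag of words, "please subscribe" and "plz subscribe" overlap only on "subscribe". With word vectors, the synonyms can sit close together. The 3-dimensional vectors below are hypothetical values chosen for illustration (real Word2Vec vectors are learned and much longer). Averaging word vectors into a document vector is one simple combination, assumed here.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Two comments that differ only in a synonym
docs = ["please subscribe", "plz subscribe"]
counts = CountVectorizer().fit_transform(docs)
bow_sim = cosine_similarity(counts)[0, 1]  # only "subscribe" overlaps

# Hypothetical word vectors: "please" and "plz" point in nearly the same direction
word_vectors = {
    "please": np.array([0.9, 0.1, 0.0]),
    "plz": np.array([0.85, 0.15, 0.05]),
    "subscribe": np.array([0.1, 0.9, 0.2]),
}

def doc_vector(text):
    """Average the word vectors of a document's whitespace-separated tokens."""
    return np.mean([word_vectors[w] for w in text.split()], axis=0)

a, b = doc_vector(docs[0]), doc_vector(docs[1])
vec_sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"bag of words: {bow_sim:.2f}, averaged word vectors: {vec_sim:.2f}")
```

Bag of words gives 0.5 here. The averaged word vectors score close to 1 because the synonyms' vectors are nearly parallel.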

In Word2Vec, each word itself is a vector, with perhaps...

Detecting positive or negative sentiments in user reviews

In this section, we're going to look at detecting positive and negative sentiments in user reviews; in other words, detecting whether a user is writing a positive or a negative comment about a product or service. Specifically, we'll use Word2Vec and Doc2Vec, as provided by the gensim Python library. There are two categories, positive and negative, and we have over 3,000 reviews to look at, drawn from Yelp, IMDb, and Amazon. Let's begin the code by importing gensim, which provides Word2Vec and Doc2Vec, and by setting up logging so we can see its status messages:
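The chapter pairs these vector representations with a k-nearest neighbor classifier. This sketch covers only that classification step. It assumes each review has already been turned into a vector (for instance by Doc2Vec or by averaging Word2Vec vectors) and uses scikit-learn's `KNeighborsClassifier`, which labels a new review by majority vote among its k closest training vectors. The 2-D vectors are made up, and k=3 is an arbitrary choice.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 2-D review vectors (real Doc2Vec vectors have many more dimensions)
doc_vectors = np.array([
    [0.9, 0.8], [0.8, 0.9], [0.7, 0.95],         # positive reviews
    [-0.8, -0.7], [-0.9, -0.6], [-0.75, -0.9],   # negative reviews
])
sentiments = np.array([1, 1, 1, 0, 0, 0])  # 1 = positive, 0 = negative

# Majority vote among the 3 closest training vectors (Euclidean distance by default)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(doc_vectors, sentiments)

new_reviews = np.array([[0.85, 0.7], [-0.7, -0.8]])
print(knn.predict(new_reviews))  # [1 0]
```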

First, we'll see how to load a pre-built Word2Vec model provided by Google, which was trained on roughly 100 billion words of Google News text and produces 300-dimensional vectors for the words in its vocabulary. Once the model is loaded, we'll look at the vector for cat. This shows that the model is a 300-dimensional...

Summary

In this chapter, we introduced text processing and the bag of words technique. We then used this technique to build a spam detector for YouTube comments. Next, we learned about the more sophisticated Word2Vec model and put it to work in a coding project that detects positive and negative product, restaurant, and movie reviews. That's the end of this chapter about text.

In the next chapter, we're going to look at deep learning, a popular family of techniques built on neural networks.


Joshua Eckroth is an Assistant Professor of Computer Science at Stetson University, where he teaches AI, big data mining and analytics, and software engineering. He earned his PhD from The Ohio State University in AI and Cognitive Science. Dr. Eckroth also serves as Chief Architect at i2k Connect, which focuses on transforming documents into structured data using AI and enriched with subject matter expertise. Dr. Eckroth has previously published two video series with Packt, Python Artificial Intelligence Projects for Beginners and Advanced Artificial Intelligence Projects with Python. His academic publications can be found on Google Scholar.

