You're reading from Python Reinforcement Learning Projects

Product type: Book

Published: Sep 2018

Publisher: Packt

ISBN-13: 9781788991612

Pages: 296

Edition: 1st Edition

Languages: Python

Concepts: Reinforcement Learning

Authors (3): Sean Saito, Yang Wenzhuo, Rajalingappaa Shanmugamani

Chapter 7. Creating a Chatbot

Dialogue agents and chatbots have been on the rise in recent years. Many businesses now use chatbots to answer customer inquiries, largely with success. Chatbots have grown quickly: 5.6-fold in the last year (https://chatbotsmagazine.com/chatbot-report-2018-global-trends-and-analysis-4d8bbe4d924b). Chatbots let organizations communicate and interact with customers without human intervention, at minimal cost. Over 51% of customers say they want businesses to be available 24/7, and they expect replies within one hour. For businesses to meet these expectations affordably, especially with a large customer base, they must turn to chatbots.

The background problem

Many chatbots are built with conventional machine learning and natural language processing algorithms, which focus only on producing a good immediate response. A newer approach is to build chatbots with deep reinforcement learning, which also takes the future consequences of each response into account in order to keep the conversation coherent.

In this chapter, you will learn how to apply deep reinforcement learning to natural language processing. Our reward function will be future-looking, and building it will teach you to think probabilistically.
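To make the idea of a future-looking reward concrete, the following minimal sketch computes the discounted return G_t = r_t + gamma * G_{t+1} for every turn of a conversation. The per-turn rewards and the discount factor `gamma=0.9` are hypothetical values chosen for illustration, and `returns_to_go` is a helper name introduced here, not code from this chapter.

```python
from typing import List


def returns_to_go(rewards: List[float], gamma: float = 0.9) -> List[float]:
    """Return the discounted return G_t = r_t + gamma * G_{t+1} for every turn t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each turn's return includes all (discounted) future rewards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical per-turn rewards for two candidate conversations.
dead_end = [1.0, 0.0, 0.0]   # strong first reply, but the dialogue stalls afterwards
engaging = [0.5, 0.5, 0.5]   # weaker first reply that keeps the dialogue going

print(round(returns_to_go(dead_end)[0], 3))  # 1.0
print(round(returns_to_go(engaging)[0], 3))  # 1.355
```

With these assumed numbers, the reply with the larger immediate reward has the lower return because it ends the exchange, while a future-looking objective prefers the reply that keeps the conversation going.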

Dataset

The dataset that we will use consists mainly of conversations from selected movies, which will help the chatbot simulate and understand conversational patterns. There are also movie lines, which are essentially the same as the movie conversations, albeit shorter exchanges between people. Other datasets that will be used include some containing movie titles, movie characters,...
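Conversation data is commonly turned into training pairs by treating each line as an utterance and the line that follows it as the reply. The sketch below assumes, hypothetically, that each conversation is stored as an ordered list of line IDs and that a separate dictionary maps each ID to its text; the dataset's real file layout is not shown here, so `build_pairs` and the sample data are illustrative only.

```python
from typing import Dict, List, Tuple


def build_pairs(conversations: List[List[str]],
                lines: Dict[str, str]) -> List[Tuple[str, str]]:
    """Turn each conversation (an ordered list of line IDs) into (utterance, reply) pairs."""
    pairs = []
    for conversation in conversations:
        texts = [lines[line_id] for line_id in conversation]
        # Pair every line with the line that immediately follows it.
        for utterance, reply in zip(texts, texts[1:]):
            pairs.append((utterance, reply))
    return pairs


# Hypothetical sample data: the IDs and text are made up for illustration.
lines = {
    "id1": "Where are you going?",
    "id2": "To the station.",
    "id3": "Can I come with you?",
}
conversations = [["id1", "id2", "id3"]]

for utterance, reply in build_pairs(conversations, lines):
    print(f"{utterance!r} -> {reply!r}")
# 'Where are you going?' -> 'To the station.'
# 'To the station.' -> 'Can I come with you?'
```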

Step-by-step guide

Our solution will model the future direction of a dialogue agent, so as to generate coherent and interesting dialogue. The model simulates a dialogue between two virtual agents and trains it with policy gradient methods. These methods reward interaction sequences that display three important properties of conversation: informativeness (turns that do not repeat earlier ones), high coherence, and ease of answering (which relates to the forward-looking function). In our solution, an action is the dialogue utterance that the chatbot generates, and a state is the two previous turns of the interaction. To achieve all of this, we will use the scripts described in the following sections.
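The following toy sketch illustrates the policy gradient idea in a single state. It is a simplified, one-step REINFORCE update with a running-average baseline over a small, fixed list of candidate replies, not the sequence-to-sequence policy used in this chapter. The state, the candidate replies, the three component scores (standing in for ease of answering, informativeness, and coherence), and their weights are all hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical state: the two previous turns of the dialogue.
STATE = ("Where are you going?", "I'm going to the station.")

# Hypothetical candidate actions (replies) for this state.
CANDIDATES = [
    "I don't know.",                # dull reply that is hard to answer
    "I'm going to the station.",    # repeats the previous turn
    "Which train are you taking?",  # new, on-topic, easy to answer
]

# Assumed component scores per candidate: (ease of answering, informativeness, coherence).
SCORES = np.array([
    [-1.0, 0.0, 0.1],
    [0.0, -1.0, 0.5],
    [0.0, 0.0, 0.8],
])
WEIGHTS = np.array([0.25, 0.25, 0.5])  # assumed weighting of the three properties
REWARDS = SCORES @ WEIGHTS             # combined reward per candidate, roughly [-0.2, 0.0, 0.4]


def softmax(logits: np.ndarray) -> np.ndarray:
    z = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return z / z.sum()


def reinforce(num_episodes: int = 2000, lr: float = 0.1, seed: int = 0) -> np.ndarray:
    """One-step REINFORCE with a running-average baseline; returns the final policy."""
    rng = np.random.default_rng(seed)
    logits = np.zeros(len(CANDIDATES))  # tabular policy parameters for STATE
    baseline = 0.0
    for _ in range(num_episodes):
        probs = softmax(logits)
        action = rng.choice(len(CANDIDATES), p=probs)  # sample a reply from the policy
        advantage = REWARDS[action] - baseline
        # Gradient of log softmax(logits)[action] w.r.t. logits: one_hot(action) - probs.
        grad_log_pi = -probs
        grad_log_pi[action] += 1.0
        logits += lr * advantage * grad_log_pi  # gradient ascent on expected reward
        baseline = 0.9 * baseline + 0.1 * REWARDS[action]
    return softmax(logits)


policy = reinforce()
for reply, prob in zip(CANDIDATES, policy):
    print(f"{prob:.3f}  {reply}")
```

Because the assumed reward for the new, on-topic reply is the highest, the sampled updates shift probability toward it. A real dialogue model would replace the table of logits with a neural network conditioned on the two previous turns and would compute the reward from the generated text.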

Data parser

The data parser script cleans and preprocesses our datasets. It depends on several modules, such as pickle, codecs, re, os, time, and numpy. This...
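As a rough illustration of this kind of preprocessing, the sketch below normalizes text with `re`, builds a word-to-ID vocabulary, and serializes it with `pickle`. The cleaning rules, the special tokens (`<pad>`, `<unk>`, `<bos>`, `<eos>`), and the function names are assumptions made for this example rather than the chapter's actual script.

```python
import pickle
import re
from collections import Counter
from typing import Dict, Iterable

SPECIAL_TOKENS = ["<pad>", "<unk>", "<bos>", "<eos>"]  # hypothetical special tokens


def clean_text(text: str) -> str:
    """Lowercase, split off basic punctuation, and drop other characters."""
    text = text.lower()
    text = re.sub(r"([?.!,])", r" \1 ", text)     # make punctuation its own token
    text = re.sub(r"[^a-z0-9?.!,']+", " ", text)  # replace anything outside the whitelist
    return re.sub(r"\s+", " ", text).strip()


def build_vocab(sentences: Iterable[str], min_count: int = 1) -> Dict[str, int]:
    """Map each token seen at least min_count times to an integer ID, most frequent first."""
    counts = Counter(token for sentence in sentences for token in sentence.split())
    frequent = [w for w, c in sorted(counts.items(), key=lambda wc: (-wc[1], wc[0]))
                if c >= min_count]
    return {token: i for i, token in enumerate(SPECIAL_TOKENS + frequent)}


raw = ["Hello, World!!", "Hello again."]
cleaned = [clean_text(s) for s in raw]
vocab = build_vocab(cleaned)
print(cleaned)  # ['hello , world ! !', 'hello again .']

# Serialize the vocabulary; pickle.dump(vocab, f) would write it to an open file instead.
blob = pickle.dumps(vocab)
restored = pickle.loads(blob)
print(restored["hello"])  # 5
```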

Summary

Chatbots are spreading rapidly and are predicted to become even more prevalent in the coming years. If they are to gain widespread acceptance, the coherence of their dialogues must keep improving. Reinforcement learning is one way to achieve this.

In this chapter, we used reinforcement learning to build a chatbot. Training was based on a policy gradient method that considers the future direction of a dialogue agent, in order to generate coherent and interesting interactions. Our datasets came from movie conversations, which we cleaned and preprocessed to obtain a vocabulary. We then formulated our policy gradient method, with reward functions represented by a sequence-to-sequence model. Finally, we trained and tested the model and obtained reasonable results, demonstrating the viability of reinforcement learning for dialogue agents.