Natural Language Processing and Computational Linguistics


Product type Book
Published in Jun 2018
Publisher Packt
ISBN-13 9781788838535
Pages 306 pages
Edition 1st Edition
Author: Bhargav Srinivasa-Desikan

Table of Contents (22) Chapters

Title Page
Copyright and Credits
Packt Upsell
Contributors
Preface
What is Text Analysis?
Python Tips for Text Analysis
spaCy's Language Models
Gensim – Vectorizing Text and Transformations and n-grams
POS-Tagging and Its Applications
NER-Tagging and Its Applications
Dependency Parsing
Topic Models
Advanced Topic Modeling
Clustering and Classifying Text
Similarity Queries and Summarization
Word2Vec, Doc2Vec, and Gensim
Deep Learning for Text
Keras and spaCy for Deep Learning
Sentiment Analysis and ChatBots
Other Books You May Enjoy
Index

Chapter 2. Python Tips for Text Analysis

We mentioned in Chapter 1, What is Text Analysis?, that we will be using Python throughout the book because it is an easy-to-use and powerful language. In this chapter, we will substantiate these claims, while also providing a revision course in basic Python for text analysis.

Why is this important? While we expect readers of the book to have a background in Python and high-school level math, it may have been a while since you last wrote Python code. Even if you write it regularly, the Python code used for text analysis and string manipulation is quite different from, say, building a website using the web framework Django. We will cover the following topics in this chapter:

  • Why Python?
  • Text manipulation in Python

Why Python?


In Python, we represent text as strings [1], which are objects of the str [2] class. A string is an immutable sequence of Unicode code points, or characters. It is important to make a careful distinction here, though: in Python 3, all strings are Unicode by default, but in Python 2, the str class holds raw bytes (ASCII by default), and a separate unicode class handles Unicode text.
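As a minimal sketch of what "immutable sequence of Unicode code points" means in practice, the following assumes Python 3; the variable names are illustrative only and do not come from the book.

```python
# A str is an immutable sequence of Unicode code points (Python 3).
text = "Z\u00fcrich"  # "Zürich", written with an escape for the u-umlaut

print(type(text))  # <class 'str'>
print(len(text))   # 6: length is counted in code points, not bytes

# Immutability: item assignment on a str raises TypeError.
try:
    text[0] = "z"
    mutated = True
except TypeError:
    mutated = False

# "Changing" a string really means building a new one.
lowered = "z" + text[1:]
print(lowered)
```

The original string is left untouched; only the new name `lowered` refers to the modified text.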

Unicode is a standard that assigns a unique number, called a code point, to every character; an encoding such as UTF-8 then determines how those code points are stored as bytes. For example, the Unicode code point for the letter Z is U+005A. There are many encodings, and historically in Python, developers were expected to deal with different encodings on their own, with all the low-level action happening in bytes. In fact, the shift in the way Python handles Unicode has led to a lot of discussion [3], criticism [4], and praise [5] within the community. It also remains an important point of contention when we are porting code from Python 2 to Python 3.
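The relationship between a character, its code point, and its bytes can be sketched with built-in functions. This assumes Python 3; the choice of UTF-8 and Latin-1 as example encodings is ours, not the book's.

```python
# Code point of a character, formatted in the usual U+XXXX notation.
letter = "Z"
code_point = ord(letter)
formatted = f"U+{code_point:04X}"
print(formatted)  # U+005A

# The same character can map to different bytes under different encodings.
accented = "\u00e9"  # e with an acute accent
utf8_bytes = accented.encode("utf-8")      # two bytes
latin1_bytes = accented.encode("latin-1")  # one byte

# Decoding with the matching encoding recovers the original string.
round_trip = utf8_bytes.decode("utf-8")
```

Decoding bytes with the wrong encoding either raises an error or silently yields the wrong characters, which is one reason the explicit bytes/str split in Python 3 matters for text analysis.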

We said earlier on that the low-level action was going on in...

Text manipulation in Python


We mentioned earlier in the chapter that the way we represent text in Python is through strings. So how do we specify that an object is a string?

word = "Bonjour World!"

Now the word variable contains the text Bonjour World!. Note how we used double quotes around the text. Single quotes also work, but if the string itself contains a single quote (an apostrophe, for example), we need to either enclose it in double quotes or escape the inner quote with a backslash. Printing our word is straightforward: all we need to do is use the print function. Remember to use parentheses if we are coding in Python 3!
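A short sketch of the quoting options described above; the example sentences are made up for illustration.

```python
# Double quotes let us use an apostrophe inside the string directly.
with_apostrophe = "It's a small world"

# With single quotes, the inner quote must be escaped with a backslash.
escaped = 'It\'s a small world'

# Conversely, single quotes are convenient when the text contains double quotes.
with_double = 'She said "Bonjour"'

print(with_apostrophe == escaped)  # True: both spell the same string
print(with_double)
```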

print(word)
Bonjour World!

We don't have to use a variable to print a string, though; we can also pass the string directly:

print("Bonjour World!")
Bonjour World!

Be careful not to enclose your variable name in quotation marks, though! Consider this example:

print("word")
word

This prints the literal text word, not the contents of the word variable.

We mentioned before in the chapter that a string is a sequence of characters; how do we then access the first character of a...
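Because a string is a sequence, the usual answer to this question is indexing. The following is a minimal sketch using the word variable from earlier; the other names are illustrative.

```python
word = "Bonjour World!"

# Indexing starts at 0, so the first character is at position 0.
first = word[0]
print(first)  # B

# Negative indices count from the end of the string.
last = word[-1]
print(last)  # !

# Slicing returns a substring; the end position is excluded.
first_word = word[:7]
print(first_word)  # Bonjour
```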

Summary


The functions and strategies discussed in this chapter should make our text analysis easier and less error-prone. This matters most in large-scale text analysis, where a small error can lead to complete nonsense (remember garbage in, garbage out from Chapter 1, What is Text Analysis?).

We finish this mini-chapter with a few useful links on basic text manipulation:

  1. Printing and Manipulating Text [9]: Basic manipulation and printing of text; recommended if you are interested in displaying text in different ways.
  2. Manipulating Strings [10]: Basic string functions as well as exercises; useful for further practice of string manipulation.
  3. Manipulating Strings in Python [11]: Similar to the two preceding links, but also includes a section on escape sequences.
  4. Text Processing in Python (book) [12]: Unlike the other links, this is a whole book. It covers the very fundamentals of text and string manipulation in Python and includes useful material on some topics not covered here, such as regular expressions...