Applied Unsupervised Learning with Python: Discover hidden patterns and relationships in unstructured data with Python

Product type Paperback

Published in May 2019

ISBN-13 9781789952292

Length 482 pages

Edition 1st Edition

Languages

Python

Tools

Scikit-learn

Concepts

Machine Learning

Authors (3):

Benjamin Johnston

Aaron Jones

Christopher Kruger

Table of Contents (12 chapters)

Applied Unsupervised Learning with Python

Preface

1. Introduction to Clustering

2. Hierarchical Clustering

3. Neighborhood Approaches and DBSCAN

4. Dimension Reduction and PCA

5. Autoencoders

6. t-Distributed Stochastic Neighbor Embedding (t-SNE)

7. Topic Modeling

8. Market Basket Analysis

9. Hotspot Analysis

Appendix

Chapter 7: Topic Modeling

Activity 15: Loading and Cleaning Twitter Data

Solution:

Import the necessary libraries:

```
import langdetect
import matplotlib.pyplot
import nltk
import numpy
import pandas
import pyLDAvis
import pyLDAvis.sklearn  # renamed to pyLDAvis.lda_model in pyLDAvis 3.4 and later
import regex
import sklearn
```

Load the LA Times health Twitter data (latimeshealth.txt) from https://github.com/TrainingByPackt/Applied-Unsupervised-Learning-with-Python/tree/master/Lesson07/Activity15-Activity17:
Note
Pay close attention to the delimiter (it is neither a comma nor a tab) and double-check the header status.
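```
# Replace <Path> with the directory that holds the downloaded file.
path = '<Path>/latimeshealth.txt'
# The file is pipe-delimited and has no header row, so the column names are set by hand.
df = pandas.read_csv(path, sep="|", header=None)
df.columns = ["id", "datetime", "tweettext"]
```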
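The effect of the delimiter and header settings can be checked without the real file. The following is a minimal sketch, assuming two made-up rows in the same three-field, pipe-delimited layout; the row contents and the names `sample` and `sample_df` are hypothetical and are not taken from latimeshealth.txt. It also passes the column names through `names=` so that loading and labeling happen in one call.

```python
import io

import pandas

# Hypothetical rows that mimic the three-field, pipe-delimited layout.
sample = io.StringIO(
    "1001|Mon Jan 05 10:00:00 +0000 2015|First made-up tweet\n"
    "1002|Mon Jan 05 11:30:00 +0000 2015|Second made-up tweet\n"
)

# header=None tells pandas that the first line is data, not column names;
# names= assigns the column labels in the same call.
sample_df = pandas.read_csv(
    sample,
    sep="|",
    header=None,
    names=["id", "datetime", "tweettext"],
)

print(sample_df.shape)
print(list(sample_df.columns))
```

If `header=None` were left out, pandas would treat the first tweet as the header row and the frame would hold one row fewer.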

Run a quick exploratory analysis to ascertain the data size and structure:

```
def dataframe_quick_look(df, nrows):
    """Print the shape, column names, and first nrows rows of df."""
    print("SHAPE:\n{shape}\n".format(shape=df.shape))
    print("COLUMN NAMES:\n{names}\n".format(names=df.columns))
    print("HEAD:\n{head}\n".format(head=df.head(nrows)))

dataframe_quick_look(df, nrows=2)
```

The output...
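A quick look at structure can also cover column types and missing values. The following is a minimal sketch under stated assumptions, not part of the book's solution: `dataframe_structure_check` is a hypothetical helper, and `toy_df` is a made-up stand-in for the tweet data with one missing tweet text.

```python
import pandas

# Hypothetical stand-in for the tweet DataFrame; one tweet text is missing.
toy_df = pandas.DataFrame(
    {
        "id": [1001, 1002, 1003],
        "datetime": [
            "Mon Jan 05 10:00:00 +0000 2015",
            "Mon Jan 05 11:30:00 +0000 2015",
            "Mon Jan 05 12:45:00 +0000 2015",
        ],
        "tweettext": ["First made-up tweet", None, "Third made-up tweet"],
    }
)


def dataframe_structure_check(df):
    """Return one row per column with its dtype and missing-value count."""
    return pandas.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_missing": df.isna().sum(),
        }
    )


summary = dataframe_structure_check(toy_df)
print(summary)
```

Missing tweet texts matter for the later cleaning steps, so counting them early shows how many rows may need to be dropped.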

Authors (3)

Benjamin Johnston

Benjamin Johnston is a senior data scientist for one of the world's leading data-driven MedTech companies and is involved in the development of innovative digital solutions throughout the entire product development pathway, from problem definition to solution research and development, through to final deployment. He is currently completing his Ph.D. in ML, specializing in image processing and deep convolutional neural networks. He has more than 10 years of experience in medical device design and development, working in a variety of technical roles, and holds a first-class honors bachelor's degree in both engineering and medical science from the University of Sydney, Australia.

Aaron Jones

Aaron Jones is a full-time senior data scientist and consultant. He has built models and data products while working in retail, media, and environmental science. Aaron is based in Seattle, Washington, and has a particular interest in clustering algorithms, natural language processing, and Bayesian statistics.

Christopher Kruger

Christopher Kruger is a practicing data scientist and AI researcher. He has managed applied machine learning projects across multiple industries while mentoring junior team members on best practices. His primary focus is on pushing both business practicality and academic rigor in every project. Chris is currently developing research in the computer vision space.
