More Information
  • Build end-to-end Natural Language Processing solutions, ranging from getting data for your model to presenting its results.
  • Understand core NLP concepts such as tokenization, stemming, and stop word removal.
  • Use open source libraries such as NLTK, scikit-learn, and spaCy to perform routine NLP tasks.
  • Classify emails as spam or not-spam using basic NLP techniques and simple machine learning models.
  • Assign documents to their relevant topics using techniques such as TF-IDF, SVMs, and LDA.
  • Apply common text data processing steps to improve the performance of your machine learning models.
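
To give a flavor of the concepts listed above, here is a minimal sketch of tokenization, stop word removal, and TF-IDF weighting. It is not taken from the course: it uses only the Python standard library so it runs on its own, and the stop word list, the sample messages, and the function names are all illustrative assumptions. In practice NLTK and scikit-learn provide fuller versions of each step (scikit-learn's TfidfVectorizer, for example, uses a smoothed IDF by default, so its numbers will differ from the ones computed here). Stemming is left out to keep the sketch short.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption for this sketch;
# NLTK ships a much fuller one).
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "for",
              "you", "your", "now", "at"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Uses the plain textbook definition:
    tf = term count / tokens in the document, idf = log(N / document frequency).
    """
    tokenized = [remove_stop_words(tokenize(d)) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up sample messages, purely for illustration.
docs = [
    "Win a free prize now",
    "Meeting agenda for the project",
    "Claim your free prize",
]
weights = tf_idf(docs)
```

Terms that occur in only one message (such as "win") receive a higher weight than terms shared across messages (such as "free"), which is what lets a downstream classifier pick out the distinctive words in each text.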

There is an overflow of text data online nowadays. As a Python developer, you need to create a new solution using Natural Language Processing for your next project. Your colleagues depend on you to monetize gigabytes of unstructured text data. What do you do?

Hands-on NLP with NLTK and scikit-learn is the answer. This course puts you right on the spot, starting off with building a spam classifier in the first video. By the end of the course, you will walk away with three NLP applications: a spam filter, a topic classifier, and a sentiment analyzer. There is no need for fancy mathematical theory, just plain English explanations of core NLP concepts and how to apply them using Python libraries.
Taking this course will help you create new applications with Python and NLP. You will be able to build working solutions backed by machine learning and NLP models with ease.

All the code and supporting files are available on GitHub at:

Style and Approach

The course is full of hands-on instructions, interesting and illustrative visualizations, and clear explanations from a data scientist. It is packed full of useful tips and relevant advice. Throughout the course, we maintain a focus on practicality and getting things done, not fancy mathematical theory.

  • Build actual solutions backed by machine learning and Natural Language Processing models, instead of meandering in theory and mathematical symbols.
  • Single-handedly build three models: one for spam filtering, one for sentiment analysis, and one for text classification.
  • Get the right foundation for applied Natural Language Processing. We show you how to get open source data, wrangle text into Python data structures with NLTK, and predict different classes of natural language with scikit-learn.
Course Length: 2 hours 46 minutes
ISBN: 9781789345612
Date Of Publication: 23 Jul 2018


James Cross

Colibri Digital is a technology consultancy company founded in 2015 by James Cross and Ingrid Funie. The company helps its clients navigate the rapidly changing and complex world of emerging technologies, with deep expertise in areas like Big Data, Data Science, Machine Learning, and Cloud Computing.

Over the past few years, the company has worked with some of the world's largest and most prestigious companies, including a tier 1 investment bank, a leading management consultancy group, and one of the world's most popular soft drinks companies, helping each of them to make better sense of their data and process it in more intelligent ways.

The company lives by its motto: Data -> Intelligence -> Action.