Reader small image

You're reading from  Mastering Azure Machine Learning

Product typeBook
Published inApr 2020
Reading LevelBeginner
PublisherPackt
ISBN-139781789807554
Edition1st Edition
Languages
Tools
Right arrow
Authors (2):
Christoph Körner
Christoph Körner
author image
Christoph Körner

Christoph Körner previously worked as a cloud solution architect for Microsoft, specializing in Azure-based big data and machine learning solutions, where he was responsible for designing end-to-end machine learning and data science platforms. He currently works for a large cloud provider on highly scalable distributed in-memory database services. Christoph has authored four books: Deep Learning in the Browser for Bleeding Edge Press, as well as Mastering Azure Machine Learning (first edition), Learning Responsive Data Visualization, and Data Visualization with D3 and AngularJS for Packt Publishing.
Read more about Christoph Körner

Kaijisse Waaijer
Kaijisse Waaijer
author image
Kaijisse Waaijer

Kaijisse Waaijer is an experienced technologist specializing in data platforms, machine learning, and the Internet of Things. Kaijisse currently works for Microsoft EMEA as a data platform consultant specializing in data science, machine learning, and big data. She works constantly with customers across multiple industries as their trusted tech advisor, helping them optimize their organizational data to create better outcomes and business insights that drive value using Microsoft technologies. Her true passion lies within the trading systems automation and applying deep learning and neural networks to achieve advanced levels of prediction and automation.
Read more about Kaijisse Waaijer

View More author details
Right arrow

Building a simple bag-of-words model

In this section, we will look at a surprisingly simple concept to tackle the shortcomings of label encoding for textual data with the bag-of-words concept, which will build a foundation for a simple NLP pipeline. Don't worry if these techniques look too simple when you read through it; we will gradually build on top of them with tweaks, optimizations, and improvements to build a modern NLP pipeline.

A naive bag-of-words model using counting

The main concept that we will build in this section is the bag-of-words model. It is a very simple concept; that is, modeling any document as a collection of words that appear in a given document with the frequency of each word. Hence, we throw away sentence structure, word order, punctuation, and so on and reduce the documents to a raw count of words. We can then vectorize this word count into a numeric vector representation, which can then be used for ML, analysis, document comparisons, and much...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Mastering Azure Machine Learning
Published in: Apr 2020Publisher: PacktISBN-13: 9781789807554

Authors (2)

author image
Christoph Körner

Christoph Körner previously worked as a cloud solution architect for Microsoft, specializing in Azure-based big data and machine learning solutions, where he was responsible for designing end-to-end machine learning and data science platforms. He currently works for a large cloud provider on highly scalable distributed in-memory database services. Christoph has authored four books: Deep Learning in the Browser for Bleeding Edge Press, as well as Mastering Azure Machine Learning (first edition), Learning Responsive Data Visualization, and Data Visualization with D3 and AngularJS for Packt Publishing.
Read more about Christoph Körner

author image
Kaijisse Waaijer

Kaijisse Waaijer is an experienced technologist specializing in data platforms, machine learning, and the Internet of Things. Kaijisse currently works for Microsoft EMEA as a data platform consultant specializing in data science, machine learning, and big data. She works constantly with customers across multiple industries as their trusted tech advisor, helping them optimize their organizational data to create better outcomes and business insights that drive value using Microsoft technologies. Her true passion lies within the trading systems automation and applying deep learning and neural networks to achieve advanced levels of prediction and automation.
Read more about Kaijisse Waaijer