You're reading from Data Science for Web3

Product type: Book
Published in: Dec 2023
Publisher: Packt
ISBN-13: 9781837637546
Edition: 1st Edition
Author (1)
Gabriela Castillo Areco

Gabriela Castillo Areco holds an M.Sc. in big data science from the TECNUM School of Engineering, University of Navarra. With extensive experience in both the business and data facets of blockchain technology, Gabriela has undertaken roles as a data scientist, machine learning analyst, and blockchain consultant in both large corporations and small ventures. She served as a professor of new crypto businesses at Torcuato di Tella University and is currently a member of the BizOps data team at IOV Labs.

Building our pipeline

In an NLP pipeline, preparation generally encompasses a pre-processing step in which we clean and normalize the data. A feature representation step then translates the language into input that our chosen models can consume. Once this is done, we are ready to build, train, and evaluate the model. We follow this plan throughout the subsequent sections.
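The first two stages can be sketched with the Python standard library alone. This is a minimal illustration, not the book's implementation: the function names (`preprocess`, `build_vocabulary`, `featurize`), the two sample sentences, and the choice of bag-of-words counts as the feature representation are all assumptions made for the example. The model stage is left out.

```python
from collections import Counter


def preprocess(text):
    """Clean and normalize: lowercase the text and split it on whitespace."""
    return text.lower().split()


def build_vocabulary(token_lists):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: i for i, tok in enumerate(vocab)}


def featurize(tokens, vocabulary):
    """Bag-of-words: count how often each vocabulary token occurs."""
    counts = Counter(tokens)
    return [counts.get(tok, 0) for tok in vocabulary]


# Made-up sample documents, used only for illustration.
docs = ["Bitcoin price rises", "Price of Ether rises again"]

token_lists = [preprocess(d) for d in docs]
vocabulary = build_vocabulary(token_lists)
features = [featurize(t, vocabulary) for t in token_lists]

print(list(vocabulary))  # ['again', 'bitcoin', 'ether', 'of', 'price', 'rises']
print(features)          # [[0, 1, 0, 0, 1, 1], [1, 0, 1, 1, 1, 1]]
```

Each row of `features` is a fixed-length numeric vector, which is the kind of input a model in the final stage would consume.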

Preparation

Language manifests in numerous variations: formatting nuances, such as capitalization or punctuation; words that serve as linguistic aids without true semantic meaning, such as prepositions; and special characters, including emojis. To work with this data, we must transform raw text into a dataset, applying criteria similar to those we use for numeric datasets. This cleaning process enables us to eliminate outliers, reduce noise, manage vocabulary size, and optimize the data for ingestion by NLP models.
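A cleaning step along these lines might look like the sketch below. It rests on several assumptions: the helper name `clean_text` and the tiny `STOP_WORDS` set are hypothetical, and dropping every non-ASCII character is a deliberately crude way to remove emojis, since it also strips accented letters. A real project would more likely rely on a dedicated NLP library.

```python
import string

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "for", "and", "is"}


def clean_text(text):
    """Normalize raw text into a list of content tokens."""
    # Remove capitalization differences.
    text = text.lower()
    # Drop non-ASCII characters such as emojis (crude: also drops accents).
    text = text.encode("ascii", errors="ignore").decode("ascii")
    # Strip ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Tokenize on whitespace and discard stop words.
    return [tok for tok in text.split() if tok not in STOP_WORDS]


print(clean_text("The price of Bitcoin is UP 🚀!!!"))  # ['price', 'bitcoin', 'up']
```

Each step addresses one of the variations listed above: case, special characters, punctuation, and words with little semantic weight.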

A basic flow diagram of...
