Getting Started with Talend Open Studio for Data Integration

Product type: Book
Published: Nov 2012
Publisher: Packt
ISBN-13: 9781849514729
Pages: 320
Edition: 1st Edition
Author: Jonathan Bowen

Table of Contents

  • Credits
  • Foreword
  • Foreword
  • About the Author
  • Acknowledgement
  • About the Reviewers
  • www.PacktPub.com
  • Preface
  • 1. Knowing Talend Open Studio
  • 2. Working with Talend Open Studio
  • 3. Transforming Files
  • 4. Working with Databases
  • 5. Filtering, Sorting, and Other Processing Techniques
  • 6. Managing Files
  • 7. Job Orchestration
  • 8. Managing Jobs
  • 9. Global Variables and Contexts
  • 10. Worked Examples
  • Installing Sample Jobs and Data Resources
  • Index

Normalizing and denormalizing data


Database normalization is the process whereby a database schema is designed to reduce data duplication and redundancy. If a database is not designed with normalization principles in mind, it can:

  • Get overly large, due to duplicated data

  • Make data maintenance difficult or give rise to data integrity issues if the same data values reside in multiple tables

While we are not directly concerned with database schema design in this chapter, our next two examples look at processing operations born of the same principles as database normalization, so readers who aren't familiar with the concepts may wish to read some introductory material first. A good primer is the Wikipedia article on database normalization (http://en.wikipedia.org/wiki/Database_normalization).
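To make the two problems listed above concrete, here is a minimal Python sketch. It is an illustration only, not a Talend job: the `orders` and `customers` structures and all of their values are hypothetical. It shows how a duplicated value can drift out of sync when only one copy is updated, and how storing the value once avoids that.

```python
# Hypothetical denormalized rows: the customer's city is repeated on every order.
orders = [
    {"order_id": 1, "customer": "Acme", "city": "Leeds"},
    {"order_id": 2, "customer": "Acme", "city": "Leeds"},
    {"order_id": 3, "customer": "Bolt", "city": "York"},
]

# Updating only one copy leaves two conflicting values for the same customer.
orders[0]["city"] = "Hull"
cities_for_acme = {row["city"] for row in orders if row["customer"] == "Acme"}

# Hypothetical normalized layout: each customer's city is stored exactly once,
# and orders refer to the customer by name only.
customers = {"Acme": {"city": "Leeds"}, "Bolt": {"city": "York"}}
normalized_orders = [
    {"order_id": 1, "customer": "Acme"},
    {"order_id": 2, "customer": "Acme"},
    {"order_id": 3, "customer": "Bolt"},
]

# A single update now changes the city for every order of that customer.
customers["Acme"]["city"] = "Hull"
cities_after_update = {
    customers[row["customer"]]["city"]
    for row in normalized_orders
    if row["customer"] == "Acme"
}
```

In the first layout `cities_for_acme` ends up holding two different cities for one customer, which is the kind of integrity issue described above. In the second layout there is only one place to update, so `cities_after_update` holds a single value.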

Data normalization

Our first example shows how we can normalize data. Suppose we have a data file that has two fields: product_id and categories. A product can belong to more than one category and the category values are...
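The text above is cut off before it says how the category values are stored, so the following is only a sketch under a stated assumption: that each `categories` value is a single comma-separated string. The `normalize` function, the `category` output field, and the sample rows are hypothetical, and this is plain Python rather than the Talend job the chapter goes on to build. It shows the general idea of normalizing such a file into one row per product and category pair.

```python
def normalize(rows, delimiter=","):
    """Return one row per (product_id, category) pair.

    Assumes each row's "categories" field is a delimiter-separated string;
    the delimiter is an assumption, not something stated in the source.
    """
    normalized = []
    for row in rows:
        for category in row["categories"].split(delimiter):
            category = category.strip()
            if category:  # skip empty values such as a trailing delimiter
                normalized.append(
                    {"product_id": row["product_id"], "category": category}
                )
    return normalized


# Hypothetical input rows with the two fields named in the text.
rows = [
    {"product_id": "1001", "categories": "garden, tools"},
    {"product_id": "1002", "categories": "kitchen"},
]

result = normalize(rows)
for item in result:
    print(item["product_id"], item["category"])
```

Each input row with several categories yields several output rows that share the same `product_id`, so every category value occupies its own row and can be filtered or joined on directly.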
