You're reading from Data-Centric Machine Learning with Python

Product type: Book
Published in: Feb 2024
Publisher: Packt
ISBN-13: 9781804618127
Edition: 1st Edition
Authors (3):
Jonas Christensen

Jonas Christensen has spent his career leading data science functions across multiple industries. He is an international keynote speaker, postgraduate educator, and advisor in the fields of data science, analytics leadership, and machine learning, and he hosts the Leaders of Analytics podcast.

Nakul Bajaj

Nakul Bajaj is a data scientist, MLOps engineer, educator, and mentor who helps students and junior engineers navigate their data journey. He has a strong passion for MLOps, with a focus on reducing complexity and delivering value from machine learning use cases in business and healthcare.

Manmohan Gosada

Manmohan Gosada is a seasoned professional with a proven track record in the dynamic field of data science. With a comprehensive background spanning various data science functions and industries, Manmohan has emerged as a leader in driving innovation and delivering impactful solutions. He has successfully led large-scale data science projects, leveraging cutting-edge technologies to implement transformative products. With a postgraduate degree, he is not only well-versed in the theoretical foundations of data science but is also passionate about sharing insights and knowledge. A captivating speaker, he engages audiences with a blend of expertise and enthusiasm, demystifying complex concepts in the world of data science.

From Model-Centric to Data-Centric – ML’s Evolution

By now, you might be thinking: if data-centricity is essential to the further evolution of AI and ML, how come model-centricity is the dominant approach?

This is a very relevant question to ask, and one we will answer in this chapter. To understand what it takes to shift to a data-centric approach, we must understand the forces that have led to model-centricity being the predominant approach, and how to overcome them.

We will start this chapter by exploring why the evolution of AI and ML has predominantly followed a model-centric approach, before diving into the huge opportunity that can be unlocked through data-centricity.

Throughout this chapter, we will challenge the notion that ML requires big datasets and that more data is always better. A long tail of small-data ML use cases opens up when we shift our mindset from bigger data to better data.

By the end of this chapter, you will have a clear...

Exploring why ML development ended up being mostly model-centric

A short history lesson is in order to truly appreciate why a data-centric approach is the key to unlocking the full potential of ML.

The fields of data science and ML have achieved significant advancements since the earliest attempts to make electronic computers act intelligently. The intelligent tasks performed by most smartphones today were nearly unimaginable at the turn of the 21st century. Moreover, we are producing more data every single day than was created from the beginning of human civilization to the 21st century – and we’re doing so at an estimated growth rate of 23% per annum [1].

Despite these incredible developments in technology and data volumes, some elements of data science are very old. Statistics and data analysis have been in use for centuries, and the mathematical components of today’s ML models were mostly developed long before the advent of digital computers.

For our purposes...

Unlocking the opportunity for small data ML

The group of tech companies famously labeled The Big Nine by author Amy Webb [18] are examples of consumer internet companies that have leveraged big data and AI to build world dominance. Amazon, Apple, Alibaba, Baidu, Meta, Google, IBM, Microsoft, and Tencent dominate in the digital era because they utilize enormous amounts of user data to power their AI systems.

As network-based AI-first businesses, they have amassed customers on an unprecedented scale because users are happy to co-create and share their data, so long as it is a net benefit to them. For the Big Nine, getting enough modeling data is rarely a problem, and investing in the most advanced ML capabilities is a virtuous circle that enables more market dominance.

For most other organizations – and ML use cases – this sort of scale is unachievable. As we explored in Chapter 1, Exploring Data-Centric Machine Learning, the long tail of ML opportunities doesn’...

Why we need data-centric AI more than ever

The leading organizations in AI, such as the Big Nine, have achieved incredible results with ML since the turn of the century, but how is AI being used in the long tail?

A 2020 survey published by MIT Sloan Management Review and Boston Consulting Group concluded that most companies struggle to turn their vision for AI into reality. In a survey of over 3,000 business leaders from 29 industries in 112 countries, 70% of respondents understood how AI can generate business value and 57% had piloted or productionized AI solutions. However, only 1 in 10 had been able to generate significant financial benefits with AI [20].

The survey authors found that companies that were realizing significant financial benefits with AI had built their success on two pillars:

  • They had a solid foundation of the right data, technology, and talent.
  • They had defined several effective ways for humans and AI to work and learn together. In other words, they...

Summary

In this chapter, we reviewed the history of ML to give us a clear understanding of why model-centric ML is the dominant approach today. We also learned how a model-centric approach prevents us from unlocking the potential value tied up in the long tail of ML opportunities.

By now, you should have a strong appreciation for why data-centricity is needed for the discipline of ML to achieve its full potential, and you should also recognize that making the shift will require substantial effort. To become an effective data-centric ML practitioner, old habits must be broken and new ones formed.

Now, it’s time to start exploring the tools and techniques to make that shift. In the next chapter, we will discuss the principles of data-centric ML and the techniques and approaches associated with each principle.
