
You're reading from  Expert C++ - Second Edition

Product type: Book
Published in: Aug 2023
Publisher: Packt
ISBN-13: 9781804617830
Edition: 2nd Edition
Authors (5):
Marcelo Guerra Hahn

With over 18 years of experience in software development and data analysis, Marcelo Guerra Hahn is a seasoned expert in C++, C#, and Azure. As an Engineering Manager on the Microsoft C++ Team and former leader of SoundCommerce's engineering team, Marcelo brings a passion for data and informed decision-making to his work. He shares his knowledge as a lecturer at institutions such as Lake Washington Institute of Technology and the University of Washington. Through this book, Marcelo aims to empower readers with advanced C++ techniques, honed by real-world experience, to become proficient programmers and skilled data analysts.

Araks Tigranyan

Araks Tigranyan is a passionate software engineer at Critical Techworks, with an unwavering love for the world of programming, particularly in C++. Her dedication to crafting efficient and innovative solutions reflects her genuine passion for coding. Committed to excellence and driven by curiosity, Araks continuously explores new technologies, going above and beyond to deliver exceptional work. Beyond programming, Araks finds solace in sports, with football holding a special place in her heart. As an author, Araks aspires to share her profound expertise in C++ and inspire readers to embark on their programming journeys.

John Asatryan

John Asatryan, the Head of Code Republic Lab at Picsart Academy, seamlessly blends his academic background in International Economic Relations from the Armenian State University of Economics with his ventures in technology and education. Driven by a genuine passion for coding, John's commitment to empowering aspiring developers is evident in his expertise in the field. His unwavering dedication to bridging the gap between education and technology inspires others to pursue their coding dreams.

Vardan Grigoryan

Vardan Grigoryan is a senior backend engineer and C++ developer with more than 9 years of experience. Vardan started his career as a C++ developer and then moved to the world of server-side backend development. While being involved in designing scalable backend architectures, he always tries to incorporate the use of C++ in critical sections that require the fastest execution time. Vardan loves tackling computer systems and program structures on a deeper level. He believes that true excellence in programming can be achieved by means of a detailed analysis of existing solutions and by designing complex systems.

Shunguang Wu

Shunguang Wu is a member of the senior professional staff at Johns Hopkins University Applied Physics Laboratory. He received his PhDs in theoretical physics and electrical engineering from Northwestern University (China) and Wright State University (USA), respectively. Early in his career, he published about 50 peer-reviewed journal papers in the areas of nonlinear dynamics, statistical signal processing, and computer vision. His professional C++ experience started with teaching undergraduate courses in the late 1990s. Since then, he has designed and developed a wide range of R&D and end-user application software using C++ in world-class academic and industrial laboratories. These projects span both the Windows and Linux platforms.


Using C++ in Data Science

C++ is widely used in many fields, including data science. Data scientists typically choose Python because of its simplicity and breadth of libraries, but C++ offers advantages that make it an effective tool for data analysis. This chapter explains why C++ is suited to data science work and how it can be applied there. C++ is fast and efficient: code is compiled into machine code before execution, which typically lets C++ programs run significantly faster than equivalent code in an interpreted language such as Python. This matters most when dealing with large datasets or computationally intensive tasks, where C++ algorithms can use low-level memory management and optimized code to process data faster.

Additionally, C++ provides extensive support for parallel computing. Libraries and APIs such as OpenMP and MPI allow developers to parallelize their code and use multicore processors and distributed systems. Parallel computing is...
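As a minimal sketch of what parallelizing a loop with OpenMP can look like, the following function computes the mean of the squared values in an array. The function name `mean_square` is made up for this illustration and does not come from the chapter's source files. The example assumes the compiler is invoked with OpenMP enabled (for g++, the `-fopenmp` flag); without it, the pragma is ignored and the loop simply runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: mean of the squared values.
// With OpenMP enabled, loop iterations are split across threads and the
// per-thread partial sums are combined by the reduction clause.
double mean_square(const std::vector<double>& values) {
    assert(!values.empty() && "mean_square needs at least one value");

    double total = 0.0;
    const long long n = static_cast<long long>(values.size());

    #pragma omp parallel for reduction(+ : total)
    for (long long i = 0; i < n; ++i) {
        const double v = values[static_cast<std::size_t>(i)];
        total += v * v;
    }
    return total / static_cast<double>(n);
}
```

The reduction clause is the important part: each thread accumulates into its own private copy of `total`, which avoids a data race on the shared variable.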

Technical requirements

The g++ compiler with the -std=c++2a option is used to compile the examples in this chapter. You can find the source files used in this chapter at https://github.com/PacktPublishing/Expert-C-2nd-edition/tree/main/Chapter17.

Introduction to data science

Data science is a set of disciplines combining statistical analysis, machine learning, and domain knowledge to extract insights and support informed decisions from large, complex datasets. It involves collecting, processing, and analyzing data to reveal patterns, trends, and relationships, which feed predictive models that can be used to drive business decisions.

The essence of data science is the process of analyzing and pre-processing data. This involves understanding the structure and quality of the data, identifying missing values, outliers, and anomalies, and transforming the data into a format suitable for analysis. Steps such as data cleaning, feature engineering, and dimensionality reduction make the subsequent analytical procedures more effective and efficient.

After pre-processing the data, data scientists use statistical and machine learning techniques to extract insights and build models. They use statistical techniques such as hypothesis testing...

Data capturing and manipulation

Data capturing and manipulation are critical areas of data science. They involve the acquisition, extraction, transformation, and processing of data so that it is useful for analysis and decision-making. These techniques are important for gaining meaningful insights from large, complex datasets. In this section, we will explore the importance of data capture and manipulation and discuss their basic concepts and techniques.

Data capture refers to collecting and retrieving data from various sources. This can include structured data from databases, spreadsheets, or APIs, and unstructured data from text, images, and social media sources. The data capture phase involves identifying the right sources, extracting data, and converting it into a format suitable for analysis. Techniques such as web scraping, data extraction tools, and data integration frameworks are often used to capture and aggregate data from various sources...
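To make the "extract and convert" step concrete, here is a minimal sketch that reads comma-separated text into a vector of records using only the standard library. The `Reading` struct, the `read_readings` function, and the two-column `sensor,value` layout are assumptions made for this illustration. Real CSV files (quoted fields, embedded commas) usually call for a dedicated parsing library.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record type for a two-column "sensor,value" file.
struct Reading {
    std::string sensor;
    double value;
};

// Reads rows from any input stream (file, string, socket wrapper).
// The first line is treated as a header and skipped.
std::vector<Reading> read_readings(std::istream& in) {
    assert(in.good() && "input stream must be readable");

    std::vector<Reading> rows;
    std::string line;
    std::getline(in, line);  // skip the header line

    while (std::getline(in, line)) {
        if (line.empty()) continue;  // ignore blank lines

        std::istringstream fields(line);
        std::string sensor, value_text;
        if (!std::getline(fields, sensor, ',')) continue;
        if (!std::getline(fields, value_text)) continue;

        try {
            rows.push_back({sensor, std::stod(value_text)});
        } catch (const std::logic_error&) {
            // std::stod throws std::invalid_argument or std::out_of_range
            // (both derive from std::logic_error); such rows are dropped.
        }
    }
    return rows;
}
```

Because the function takes a `std::istream&`, the same code works with a `std::ifstream` opened on a file or with a `std::istringstream` holding test data.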

Data cleansing and processing

Data cleaning and processing is a key step in data science, in which raw data is processed to improve its quality, integrity, and usability. These processes play a key role in ensuring that the data used for assessment and decision-making is accurate, precise, and dependable. This section will explore the importance of data cleansing and processing and discuss the basic concepts and techniques behind them.

Data cleaning, also known as data cleansing or data scrubbing, refers to the process of identifying, correcting, or removing errors, inconsistencies, and anomalies from a dataset. Raw data often contains missing values, anomalies, duplicate records, inconsistent formatting, or other abnormalities that can introduce bias or produce inaccurate results if not dealt with. Data cleansing aims to address these issues and improve data quality.
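Two of the issues mentioned above, missing values and duplicate records, can be sketched with the standard library alone. The function names below are hypothetical, and mean imputation is only one of several possible strategies (dropping rows or using the median are common alternatives); it is chosen here for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Replaces missing cells (std::nullopt) with the mean of the observed cells.
std::vector<double> impute_with_mean(
    const std::vector<std::optional<double>>& column) {
    double sum = 0.0;
    std::size_t observed = 0;
    for (const auto& cell : column) {
        if (cell) {
            sum += *cell;
            ++observed;
        }
    }
    assert(observed > 0 && "column needs at least one observed value");
    const double mean = sum / static_cast<double>(observed);

    std::vector<double> filled;
    filled.reserve(column.size());
    for (const auto& cell : column) {
        filled.push_back(cell.value_or(mean));
    }
    return filled;
}

// Removes duplicate identifiers. Note that the result is sorted,
// so the original ordering of the records is not preserved.
std::vector<int> drop_duplicates(std::vector<int> ids) {
    std::sort(ids.begin(), ids.end());
    ids.erase(std::unique(ids.begin(), ids.end()), ids.end());
    return ids;
}
```

Modeling a missing value as `std::optional<double>` keeps "no data" distinct from any real number, which avoids sentinel values such as -1 or 0 leaking into later statistics.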

However, using the information to make the data relevant to analysis...

Applying machine learning algorithms

Machine learning algorithms are central to data science and artificial intelligence. They use mathematical models and statistical techniques to train computers to learn from data and make predictions or take informed actions. Machine learning algorithms enable you to extract insights and patterns from large, complex datasets, inform decisions, and automatically improve predictive capabilities as more data is processed. Let us examine and discuss some commonly used algorithms.

Machine learning algorithms can be classified into three categories: supervised learning, unsupervised learning, and reinforcement learning.

Supervised learning algorithms learn from labeled training data, where each data point is associated with a corresponding goal or outcome. These algorithms aim to generalize from the training data and make predictions about unseen data. Commonly used supervised learning algorithms include linear regression, decision trees, support vector...
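As a small illustration of supervised learning, the sketch below fits a straight line to labeled points using the closed-form ordinary least squares solution for a single feature. The `Line` struct and the `fit_line` and `predict` functions are names invented for this example. Production code would more likely rely on a linear algebra or machine learning library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model type: y = slope * x + intercept.
struct Line {
    double slope;
    double intercept;
};

// Ordinary least squares for one feature:
//   slope     = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
//   intercept = (Sy - slope*Sx) / n
Line fit_line(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);

    const double n = static_cast<double>(x.size());
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }

    const double denom = n * sxx - sx * sx;
    assert(denom != 0.0 && "x values must not all be identical");

    const double slope = (n * sxy - sx * sy) / denom;
    const double intercept = (sy - slope * sx) / n;
    return {slope, intercept};
}

// Applies the fitted model to an unseen input.
double predict(const Line& model, double x) {
    return model.slope * x + model.intercept;
}
```

Here the `x` values play the role of the training inputs and the `y` values are the labels; `predict` then generalizes to inputs that were not in the training data.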

Data visualization

Data visualization is the visual representation of data and information. It uses visual elements such as charts, diagrams, and maps to communicate complex information effectively to a broad audience. Data visualization is important in exploratory data analysis, insight presentation, and decision-making processes. This section will examine the importance of data visualization and discuss some key features and techniques.

One of the key benefits of data visualization is the ability to simplify complex data and make it more meaningful and accessible. Patterns, trends, and relationships can be quickly identified by visually presenting information, allowing stakeholders to gain insight and make appropriate decisions. Visual representations also make it easier to identify notable features, differences, and anomalies in the data.

Data visualization can take many forms, including bar charts, line graphs, scatter plots, histograms,...
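Standard C++ has no plotting facilities, so charts are normally produced through third-party libraries or by exporting data to another tool. As a dependency-free sketch of the idea behind a histogram, the following code bins values into equal-width intervals and renders the counts as text. The `histogram` and `render` function names and the `#` bar style are choices made for this illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Counts values into `bins` equal-width intervals over [lo, hi).
// Values outside the range are ignored.
std::vector<std::size_t> histogram(const std::vector<double>& values,
                                   double lo, double hi, std::size_t bins) {
    assert(bins > 0 && hi > lo);

    std::vector<std::size_t> counts(bins, 0);
    const double width = (hi - lo) / static_cast<double>(bins);
    for (double v : values) {
        if (v < lo || v >= hi) continue;
        auto idx = static_cast<std::size_t>((v - lo) / width);
        if (idx >= bins) idx = bins - 1;  // guard against rounding at the edge
        ++counts[idx];
    }
    return counts;
}

// Renders one text row per bin, with one '#' per counted value.
std::string render(const std::vector<std::size_t>& counts) {
    std::string out;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        out += std::to_string(i) + " | " + std::string(counts[i], '#') + "\n";
    }
    return out;
}
```

Even in this crude form, the bar lengths show at a glance where the values concentrate, which is the point the section makes about visual representations.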

Summary

Data science is an interdisciplinary field that utilizes statistical methods, machine learning algorithms, and data visualization to extract insights from large volumes of data. It involves programming skills, mathematical expertise, and domain knowledge to explore, transform, and model data for informed decision-making and predictions.

The first step in the data science pipeline is data capturing and manipulation. This process involves collecting and organizing data from various sources into a structured format. Data scientists work with large datasets, employing efficient methods to manipulate and transform the data. This includes merging datasets, filtering out irrelevant information, and handling missing or inconsistent data, ensuring a solid foundation for analysis.

Data cleansing and processing are crucial to enhancing data quality. Data scientists address anomalies and errors by identifying and handling missing values, outliers, and inconsistencies. They use imputation...

Questions

  1. What are some features of C++ that make it suitable for data analysis and manipulation?
  2. Is there a way to read data from an external source, such as a database or file in C++, and manipulate it?
  3. What are the common methods and libraries in C++ for data cleaning, processing, and normalization tasks?
  4. In C++, how can you implement popular machine learning algorithms such as linear regression?
  5. How can you display and analyze your data effectively with interactive and attractive data visualizations in C++?

Further reading

For further information, refer to the following:

  • Data Science for Business by Foster Provost and Tom Fawcett
  • C++ Data Structures and Algorithm Design Principles by John Carey, Shreyans Doshi, and Payas Rajan
  • Mastering OpenCV 4 with C++ by Daniel Lélis Baggio, David Millán Escrivá, Khvedchenia Ievgen, and Naureen Mahmood