You're reading from Expert C++ - Second Edition

Product type Book

Published in Aug 2023

Publisher Packt

ISBN-13 9781804617830

Pages 604 pages

Edition 2nd Edition

Languages

Concepts

Application Development

Authors (5):

Marcelo Guerra Hahn

Araks Tigranyan

John Asatryan

Vardan Grigoryan

Shunguang Wu

Table of Contents (24) Chapters

Preface

1. Part 1: Under the Hood of C++ Programming

2. Chapter 1: Building C++ Applications

3. Chapter 2: Beyond Object-Oriented Programming

4. Chapter 3: Understanding and Designing Templates

5. Chapter 4: Template Meta Programming

6. Chapter 5: Memory Management and Smart Pointers

7. Part 2: Designing Robust and Efficient Applications

8. Chapter 6: Digging into Data Structures and Algorithms in STL

9. Chapter 7: Advanced Data Structures

10. Chapter 8: Functional Programming

11. Chapter 9: Concurrency and Multithreading

12. Chapter 10: Designing Concurrent Data Structures

13. Chapter 11: Designing World-Ready Applications

14. Chapter 12: Incorporating Design Patterns in C++ Applications

15. Chapter 13: Networking and Security

16. Chapter 14: Debugging and Testing

17. Chapter 15: Large-Scale Application Design

18. Part 3: C++ in the AI World

19. Chapter 16: Understanding and Using C++ in Machine Learning Tasks

20. Chapter 17: Using C++ in Data Science

21. Chapter 18: Designing and Implementing a Data Analysis Framework

22. Index

23. Other Books You May Enjoy

Using C++ in Data Science

C++ is widely used in many fields, including data science. Data scientists typically choose Python because of its simplicity and breadth of libraries, but C++ offers some advantages that make it an effective tool for data analysis. This chapter explains why C++ is suited to data science work and how it can be applied there.

C++ is fast and efficient. C++ code is compiled into machine code before execution, which typically lets C++ programs run significantly faster than equivalent programs in an interpreted language such as Python. This matters when dealing with large datasets or computationally intensive tasks: C++ algorithms can use low-level memory management and optimized code execution to process data faster.

Additionally, C++ provides extensive support for parallel computing. Libraries and APIs such as OpenMP and MPI allow developers to parallelize their code and use multicore processors and distributed systems. Parallel computing is...
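As a rough illustration of the OpenMP style of parallelism mentioned above, the following sketch computes the mean of a column with a parallel reduction. The function name `parallel_mean` is hypothetical and not taken from the book's sources. It assumes the program is built with g++ and the `-fopenmp` flag; without that flag the pragma is ignored and the loop simply runs serially, giving the same result for this input.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: arithmetic mean of a non-empty column.
// With -fopenmp, loop iterations are divided among threads and the
// per-thread partial sums are combined by the reduction clause.
double parallel_mean(const std::vector<double>& values) {
    assert(!values.empty());  // precondition: at least one value
    const long long n = static_cast<long long>(values.size());
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (long long i = 0; i < n; ++i) {
        total += values[static_cast<std::size_t>(i)];
    }
    return total / static_cast<double>(n);
}
```

A signed loop index is used because it is accepted by every OpenMP version. Note that a parallel floating-point reduction may add values in a different order than a serial loop, so results on real data can differ in the last few bits.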

Technical requirements

The g++ compiler with the -std=c++2a option is used to compile the examples in this chapter. You can find the source files used in this chapter at https://github.com/PacktPublishing/Expert-C-2nd-edition/tree/main/Chapter17.

Introduction to data science

Data science is a set of disciplines that combines statistical analysis, machine learning, and domain knowledge to extract insights and inform decisions from large, complex datasets. It involves collecting, processing, and analyzing data to reveal patterns, trends, and relationships, and to build predictive models that can be used to drive business decisions.

The essence of data science is the process of analyzing and pre-processing data. This involves understanding the structure and quality of the data, identifying missing values, outliers, and anomalies, and transforming the data into a format suitable for analysis. Steps such as data cleaning, feature engineering, and dimensionality reduction make subsequent analytical procedures better and more efficient.
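A first look at data quality can be sketched in a few lines of C++. The example below is only an illustration under stated assumptions: missing values are encoded as NaN, and an outlier is any value more than `threshold` population standard deviations from the mean. `ColumnProfile` and `profile_column` are hypothetical names, and other outlier rules (for example, one based on the interquartile range) may suit a given dataset better.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary of one numeric column.
struct ColumnProfile {
    std::size_t missing = 0;   // number of NaN entries
    std::size_t outliers = 0;  // entries far from the mean
    double mean = 0.0;         // mean of the non-missing entries
    double stddev = 0.0;       // population standard deviation
};

ColumnProfile profile_column(const std::vector<double>& values,
                             double threshold = 2.0) {
    assert(threshold > 0.0);
    ColumnProfile profile;
    double sum = 0.0;
    std::size_t present = 0;
    for (double v : values) {
        if (std::isnan(v)) { ++profile.missing; }
        else { sum += v; ++present; }
    }
    if (present == 0) return profile;  // nothing to summarize

    profile.mean = sum / static_cast<double>(present);
    double squared = 0.0;
    for (double v : values) {
        if (!std::isnan(v)) squared += (v - profile.mean) * (v - profile.mean);
    }
    profile.stddev = std::sqrt(squared / static_cast<double>(present));

    // Count values that lie more than threshold standard deviations away.
    for (double v : values) {
        if (!std::isnan(v) &&
            std::abs(v - profile.mean) > threshold * profile.stddev) {
            ++profile.outliers;
        }
    }
    return profile;
}
```

Because a single extreme value also inflates the mean and standard deviation, this rule is conservative on small samples.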

After pre-processing the data, data scientists use statistical and machine learning techniques to extract insights and build models. They use statistical techniques such as hypothesis testing...

Data capturing and manipulation

Data capturing and manipulation are critical areas of data science. They involve the acquisition, extraction, transformation, and processing of data so that it is useful for analysis and decision-making. These techniques are important in gaining meaningful insights and taking advantage of large, complex datasets. In this section, we will explore the importance of data capture and manipulation and discuss their basic concepts and techniques.

Data capture refers to collecting and retrieving data from various sources. This can include structured data from databases, spreadsheets, or APIs and unstructured data from text, images, and social media sources. The data capture phase involves identifying the right sources, extracting data, and converting it into a format suitable for analysis. Techniques such as web scraping, data extraction tools, and data integration frameworks are often used to capture and aggregate data from various sources...
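For structured sources such as delimited text files, the capture step can be sketched with nothing more than the standard library. The function `read_delimited` below is a hypothetical helper, not code from the book. It assumes a simple format with no quoted fields or embedded delimiters, and it drops a trailing empty field, so a dedicated CSV library would be a better fit for real-world files.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Reads delimiter-separated rows from any input stream (file, string, ...).
// Blank lines are skipped and Windows line endings are tolerated.
std::vector<Row> read_delimited(std::istream& input, char delimiter = ',') {
    assert(delimiter != '\n' && delimiter != '\r');
    std::vector<Row> rows;
    std::string line;
    while (std::getline(input, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;

        Row row;
        std::istringstream fields(line);
        std::string field;
        while (std::getline(fields, field, delimiter)) {
            row.push_back(field);
        }
        rows.push_back(row);
    }
    return rows;
}
```

Taking a `std::istream&` means the same function works with a `std::ifstream` opened on a file or with a `std::istringstream` holding text already in memory. Numeric fields can then be converted with `std::stod` once the rows have been validated.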

Data cleansing and processing

Data cleansing and processing are key steps in data science, where raw data is corrected and transformed to improve its quality, integrity, and usability. These processes play a key role in ensuring that the data used for assessment and decision-making is accurate, precise, and dependable. This section will explore the importance of data cleansing and processing and discuss the basic concepts and techniques behind them.

Data cleansing, also known as data cleaning or data scrubbing, refers to the process of identifying and then correcting or removing errors, inconsistencies, and anomalies in a dataset. Raw data often contains missing values, anomalies, duplicate records, inconsistent formatting, or other abnormalities that, if not dealt with, may introduce bias or produce inaccurate results. Data cleansing aims to address these issues and improve data quality.
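Two of the issues listed above, duplicate records and missing values, can be handled with a short sketch like the following. It is illustrative only: `Record` and `clean_records` are hypothetical names, duplicates are assumed to share an `id` (the first occurrence is kept), missing values are assumed to be encoded as NaN, and mean imputation is just one of several possible strategies.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical record with a key and one numeric measurement.
struct Record {
    int id;
    double value;  // NaN marks a missing measurement
};

// Removes records whose id was already seen, then replaces missing
// values with the mean of the values that are present.
std::vector<Record> clean_records(const std::vector<Record>& records) {
    std::vector<Record> cleaned;
    std::unordered_set<int> seen_ids;
    double sum = 0.0;
    std::size_t present = 0;

    for (const Record& record : records) {
        if (!seen_ids.insert(record.id).second) continue;  // duplicate id
        cleaned.push_back(record);
        if (!std::isnan(record.value)) {
            sum += record.value;
            ++present;
        }
    }

    if (present > 0) {
        const double mean = sum / static_cast<double>(present);
        for (Record& record : cleaned) {
            if (std::isnan(record.value)) record.value = mean;
        }
    }

    assert(cleaned.size() <= records.size());  // sanity check
    return cleaned;
}
```

The input is left untouched and a cleaned copy is returned, which keeps the raw data available for auditing. If every value is missing, the NaN entries are left in place rather than filled with an arbitrary number.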

However, using the information to make the data relevant to analysis...

Applying machine learning algorithms

Machine learning algorithms are central to data science and artificial intelligence. They use mathematical models and statistical techniques to train computers to learn from data and make predictions or take informed actions. Machine learning algorithms enable you to extract insights and patterns from large, complex datasets, inform decisions, automate processing, and improve predictive capabilities. Let us examine and discuss some commonly used algorithms.

Machine learning algorithms can be classified into three categories: supervised learning, unsupervised learning, and reinforcement learning.

Supervised learning algorithms learn from labeled training data, where each data point is associated with a corresponding goal or outcome. These algorithms aim to generalize from the training data and make predictions about unseen data. Commonly used supervised learning algorithms include linear regression, decision trees, support vector...
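Linear regression with a single input feature is small enough to sketch directly. The example below fits a line by ordinary least squares using the closed-form solution, where the slope is the covariance of x and y divided by the variance of x. `LineFit`, `fit_line`, and `predict` are hypothetical names chosen for illustration; a real project would more likely rely on a numerical or machine learning library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: y is approximated by slope * x + intercept.
struct LineFit {
    double slope;
    double intercept;
};

// Ordinary least squares for one feature, using the closed-form solution.
LineFit fit_line(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());

    double mean_x = 0.0, mean_y = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= n;
    mean_y /= n;

    double sxy = 0.0, sxx = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxy += (x[i] - mean_x) * (y[i] - mean_y);
        sxx += (x[i] - mean_x) * (x[i] - mean_x);
    }
    assert(sxx > 0.0);  // the x values must not all be identical

    const double slope = sxy / sxx;
    return {slope, mean_y - slope * mean_x};
}

// Applies the fitted line to an unseen input.
double predict(const LineFit& fit, double x) {
    return fit.slope * x + fit.intercept;
}
```

The labeled training data here is the pair of vectors `x` and `y`, and `predict` is the step that generalizes to unseen inputs.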

Data visualization

Data visualization is the visual representation of data and information. It uses visual forms such as charts, diagrams, and maps to communicate complex information effectively to a broad audience. Data visualization is important in exploratory data analysis, insight presentation, and decision-making processes. This section will examine the importance of data visualization and discuss some key features and techniques.

One of the key benefits of data visualization is the ability to simplify complex data and make it more meaningful and accessible. Patterns, trends, and relationships can be quickly identified by visually presenting information, allowing stakeholders to gain insight and make appropriate decisions. Visual representations also make it easier to identify notable features, differences, and anomalies in the data.

Data visualization can take many forms, including bar charts, line graphs, scatter plots, histograms,...
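Standard C++ has no plotting facilities, so graphical charts require a third-party library. As a dependency-free illustration of one of the chart types named above, the sketch below bins values into a histogram and renders it as text. The names `histogram` and `render_bars` are hypothetical, and the equal-width binning scheme is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Counts how many values fall into each of `bins` equal-width intervals
// covering [low, high]. Values outside the range (and NaN) are ignored.
std::vector<std::size_t> histogram(const std::vector<double>& values,
                                   double low, double high, std::size_t bins) {
    assert(bins > 0 && high > low);
    std::vector<std::size_t> counts(bins, 0);
    const double width = (high - low) / static_cast<double>(bins);
    for (double v : values) {
        if (!(v >= low && v <= high)) continue;  // also rejects NaN
        std::size_t index = static_cast<std::size_t>((v - low) / width);
        if (index >= bins) index = bins - 1;     // v == high goes in the last bin
        ++counts[index];
    }
    return counts;
}

// Renders one line per bin, drawing one '#' per counted value.
std::string render_bars(const std::vector<std::size_t>& counts) {
    std::string output;
    for (std::size_t count : counts) {
        output += std::string(count, '#');
        output += '\n';
    }
    return output;
}
```

Printing the result of `render_bars` to the terminal gives a quick view of a distribution during exploratory analysis, before the data is handed to a full plotting tool.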

Summary

Data science is an interdisciplinary field that utilizes statistical methods, machine learning algorithms, and data visualization to extract insights from large volumes of data. It involves programming skills, mathematical expertise, and domain knowledge to explore, transform, and model data for informed decision-making and predictions.

The first step in the data science pipeline is data capturing and manipulation. This process involves collecting and organizing data from various sources into a structured format. Data scientists work with large datasets, employing efficient methods to manipulate and transform the data. This includes merging datasets, filtering out irrelevant information, and handling missing or inconsistent data, ensuring a solid foundation for analysis.

Data cleansing and processing are crucial to enhancing data quality. Data scientists address anomalies and errors by identifying and handling missing values, outliers, and inconsistencies. They use imputation...

Questions

What are some features of C++ that make it suitable for data analysis and manipulation?
Is there a way to read data from an external source, such as a database or file in C++, and manipulate it?
What are the common methods and libraries in C++ for data cleaning, processing, and normalization tasks?
In C++, how can you implement popular machine learning algorithms such as linear regression?
How can you display and analyze your data effectively with interactive and attractive data visualizations in C++?