You're reading from  Expert C++ - Second Edition

Product type: Book
Published in: Aug 2023
Publisher: Packt
ISBN-13: 9781804617830
Edition: 2nd Edition
Authors (5):
Marcelo Guerra Hahn

With over 18 years of experience in software development and data analysis, Marcelo Guerra Hahn is a seasoned expert in C++, C#, and Azure. He is an Engineering Manager on the Microsoft C++ team and formerly led SoundCommerce's engineering team, and his passion for data and informed decision-making shines through. He shares his knowledge as a lecturer at institutions such as Lake Washington Institute of Technology and the University of Washington. Through this book, Marcelo aims to empower readers with advanced C++ techniques, honed by real-world experience, to become proficient programmers and skilled data analysts.

Araks Tigranyan

Araks Tigranyan is a passionate software engineer at Critical Techworks, with an unwavering love for the world of programming, particularly in C++. Her dedication to crafting efficient and innovative solutions reflects her genuine passion for coding. Committed to excellence and driven by curiosity, Araks continuously explores new technologies, going above and beyond to deliver exceptional work. Beyond programming, Araks finds solace in sports, with football holding a special place in her heart. As an author, Araks aspires to share her profound expertise in C++ and inspire readers to embark on their programming journeys.

John Asatryan

John Asatryan, the Head of Code Republic Lab at Picsart Academy, seamlessly blends his academic background in International Economic Relations from the Armenian State University of Economics with his ventures in technology and education. Driven by a genuine passion for coding, John's commitment to empowering aspiring developers is evident in his expertise in the field. His unwavering dedication to bridging the gap between education and technology inspires others to pursue their coding dreams.

Vardan Grigoryan

Vardan Grigoryan is a senior backend engineer and C++ developer with more than 9 years of experience. Vardan started his career as a C++ developer and then moved to the world of server-side backend development. While being involved in designing scalable backend architectures, he always tries to incorporate the use of C++ in critical sections that require the fastest execution time. Vardan loves tackling computer systems and program structures on a deeper level. He believes that true excellence in programming can be achieved by means of a detailed analysis of existing solutions and by designing complex systems.

Shunguang Wu

Shunguang Wu is a senior professional staff member at Johns Hopkins University Applied Physics Laboratory. He received PhDs in theoretical physics and electrical engineering from Northwestern University (China) and Wright State University (USA), respectively. Early in his career, he published about 50 reviewed journal papers in the areas of nonlinear dynamics, statistical signal processing, and computer vision. His professional C++ experience started with teaching undergraduate courses in the late 1990s. Since then, he has designed and developed a wide range of R&D and end-user application software using C++ in world-class academic and industrial laboratories. These projects span both the Windows and Linux platforms.

Designing and Implementing a Data Analysis Framework

Designing and implementing data analysis programs in C++ requires careful consideration of various factors. C++ is known for its efficiency and functionality, which makes it a strong choice for dealing with large amounts of data. This chapter will explore the basic steps of building a complex data analysis program using C++. Defining the goals and requirements of the program is an important first step. This helps guide the design and implementation process, ensuring that the program meets specific research requirements.

Data governance is critical to the data analysis process. C++ provides robust data structures and libraries for efficient data processing. Choosing an appropriate data structure, such as arrays or vectors, is important for proper data storage and processing. Preliminary data processing and cleaning play an important role in ensuring data quality. C++ provides string manipulation capabilities to handle formatting...

Technical requirements

The g++ compiler with the -std=c++2a option is used to compile the examples in this chapter. You can find the source files used in this chapter at https://github.com/PacktPublishing/Expert-C-2nd-edition/tree/main/Chapter18.

Using and processing statistical data types

In C++, data is typically represented using standard C++ types, including integers, floating-point numbers, and strings. When working with statistical data in C++, it is vital to choose a data type that suits the analysis or calculation at hand.

  • Categorical variables: They represent qualitative data that falls into distinct classes or categories. In C++, these variables are typically represented with strings. Developers can use standard string manipulation functions in C++ to manage this data and perform operations such as frequency counts.
  • Numerical variables: They represent quantitative data and can be continuous or discrete. C++ includes multiple numerical data types, including integer types (short, int, and long) and floating-point types (float, double, and long double). Decimal floating-point types are not part of the standard library and require a third-party library or compiler extension, and std::fixed is a stream manipulator that controls output formatting rather than a type. The built-in numerical types make it possible to create estimates and perform statistical and mathematical...
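
As a minimal sketch of the two kinds of variables described above, the following functions count category frequencies with std::map and compute the mean and sample variance of a numerical sample. The function names (frequency_counts, mean, and sample_variance) are illustrative assumptions and do not come from the book's source files.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Categorical variable: count how often each label occurs.
std::map<std::string, std::size_t> frequency_counts(
    const std::vector<std::string>& labels) {
    std::map<std::string, std::size_t> counts;
    for (const auto& label : labels) {
        ++counts[label];  // operator[] value-initializes a missing count to 0
    }
    return counts;
}

// Numerical variable: arithmetic mean. The sample must not be empty.
double mean(const std::vector<double>& values) {
    assert(!values.empty());
    double sum = 0.0;
    for (double v : values) {
        sum += v;
    }
    return sum / static_cast<double>(values.size());
}

// Numerical variable: unbiased sample variance (divides by n - 1).
// The sample must contain at least two values.
double sample_variance(const std::vector<double>& values) {
    assert(values.size() >= 2);
    const double m = mean(values);
    double squared_deviations = 0.0;
    for (double v : values) {
        squared_deviations += (v - m) * (v - m);
    }
    return squared_deviations / static_cast<double>(values.size() - 1);
}
```

Calling frequency_counts on a vector of labels returns one entry per distinct label, while mean and sample_variance operate on a std::vector<double>. Storing labels as std::string keeps the sketch simple; an enum or integer code per category would be a reasonable alternative when the set of categories is fixed.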

Working with tabular and rectangular data

Working with tabular and rectangular data in C++ is fundamental to data analysis and manipulation. Tabular data, or rectangular data, is structured in rows and columns, resembling a table or spreadsheet. C++ provides various techniques and libraries that enable efficient handling and processing of such data. This section will explore the key concepts and approaches for working with tabular and rectangular data in C++.

To represent tabular data in C++, the most common approach is to use two-dimensional arrays or vectors. Arrays provide a straightforward way to store data in a grid-like structure, where each element represents a specific cell in the table. Alternatively, vectors of vectors can be used to create a more flexible and resizable structure, allowing for dynamic manipulation of the tabular data.
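
The vector-of-vectors approach can be sketched as follows. This is one possible layout, assuming a row-major table of double values; the Table alias and the helper names (is_rectangular, column, and column_sum) are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major table: each inner vector holds one row of cells.
using Table = std::vector<std::vector<double>>;

// Returns true when every row has the same number of columns.
// An empty table counts as rectangular.
bool is_rectangular(const Table& table) {
    for (const auto& row : table) {
        if (row.size() != table.front().size()) {
            return false;
        }
    }
    return true;
}

// Copies one column into its own vector.
// at() throws std::out_of_range if a row is too short.
std::vector<double> column(const Table& table, std::size_t col) {
    std::vector<double> values;
    values.reserve(table.size());
    for (const auto& row : table) {
        values.push_back(row.at(col));
    }
    return values;
}

// Sums one column of a rectangular table.
double column_sum(const Table& table, std::size_t col) {
    assert(is_rectangular(table));
    double sum = 0.0;
    for (const auto& row : table) {
        sum += row.at(col);
    }
    return sum;
}
```

Rows can be appended at runtime with push_back, which is the flexibility the nested-vector layout offers over a fixed-size array. The trade-off is that nothing forces the rows to have equal length, so a check such as is_rectangular is worth running after loading data.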

When working with tabular data, it is essential to consider techniques for data input and output. C++ provides various mechanisms to...

A complete ETL pipeline design strategy

Designing a complete Extract, Transform, and Load (ETL) pipeline in C++ involves careful planning and consideration of various components to ensure efficient and reliable data integration and processing. An ETL pipeline encompasses extracting data from multiple sources, transforming it according to business rules or requirements, and loading it into a target system or database. This section will explore a comprehensive ETL pipeline design strategy in C++.

  1. Data Extraction: The first step in an ETL pipeline is extracting data from diverse sources. C++ offers various techniques for data extraction, including reading from files (such as CSV or JSON), connecting to databases using SQL, or integrating with APIs for real-time data retrieval. Libraries such as Boost.Asio or libcurl can aid in handling network-based data extraction.
  2. Data Transformation: Once the data is extracted, it often requires transformation to ensure its quality, consistency...
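
The extract and transform stages listed above, together with the load stage that completes the pipeline, can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the book's implementation: the Record type, the "name,amount" line format, and the function names extract, transform, and load are all hypothetical, and a std::map stands in for the target database.

```cpp
#include <cstddef>
#include <exception>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for this sketch: one "name,amount" row.
struct Record {
    std::string name;
    double amount;
};

// Extract: read non-empty lines from any input stream (file, string, ...).
std::vector<std::string> extract(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) {
            lines.push_back(line);
        }
    }
    return lines;
}

// Transform: parse each line and drop rows that are malformed.
std::vector<Record> transform(const std::vector<std::string>& lines) {
    std::vector<Record> records;
    for (const auto& line : lines) {
        std::istringstream row(line);
        std::string name;
        std::string field;
        if (!std::getline(row, name, ',') || !std::getline(row, field)) {
            continue;  // no comma, or nothing after it
        }
        try {
            std::size_t used = 0;
            const double amount = std::stod(field, &used);
            if (used != field.size()) {
                continue;  // trailing characters after the number
            }
            records.push_back({name, amount});
        } catch (const std::exception&) {
            // std::stod throws when the field is not a valid number
        }
    }
    return records;
}

// Load: accumulate amounts per name into an in-memory target.
void load(const std::vector<Record>& records,
          std::map<std::string, double>& target) {
    for (const auto& record : records) {
        target[record.name] += record.amount;
    }
}
```

Because extract accepts any std::istream, the same code can read from a std::ifstream in production and from a std::istringstream in tests. Keeping the three stages as separate functions lets each one be replaced independently, for example swapping the in-memory map for a real database client.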

Summary

In C++, you can work with statistical data types, process tabular data, and design a complete ETL pipeline strategy using various libraries and techniques. Here’s a brief overview of how you can accomplish each of these tasks in C++.

Using and processing statistical data types in C++ involves utilizing the built-in data types for numerical data, such as integer types (int and long) and floating-point types (float and double), along with characters (char). These data types allow you to perform basic statistical computations. For more advanced statistical analysis, you can leverage libraries such as Boost, Armadillo, or Eigen. Between them, these libraries supply building blocks such as statistical distributions (Boost.Math) and linear algebra routines (Armadillo and Eigen) that support statistical modeling, regression analysis, hypothesis testing, and data manipulation.

To work with tabular and rectangular data in C++, you can use the container classes provided by the standard library, such as std::vector or std::array. These classes allow...

Questions

  1. What are some advantages of using statistical data types in data analysis, and how can C++ facilitate their processing?
  2. How can C++ effectively handle and process tabular and rectangular data? What are some techniques and libraries available for this purpose?
  3. What are the key components and considerations in designing a complete ETL pipeline? How does C++ enable the implementation of an efficient and reliable ETL pipeline?

Further reading

C++ for Data Science by Cristiano L. Fontana (https://opensource.com/article/20/2/c-data-science)
