Reader small image

You're reading from  Advanced C++

Product typeBook
Published inOct 2019
Reading LevelIntermediate
Publisher
ISBN-139781838821135
Edition1st Edition
Languages
Right arrow
Authors (5):
Gazihan Alankus
Gazihan Alankus
author image
Gazihan Alankus

Gazihan Alankus holds a PhD in computer science from Washington University in St. Louis. Currently, he is an assistant professor at the Izmir University of Economics in Turkey. He teaches and conducts research on game development, mobile application development, and human-computer interaction. He is a Google developer expert in Dart and develops Flutter applications with his students in his company Gbot, which he founded in 2019.
Read more about Gazihan Alankus

Olena Lizina
Olena Lizina
author image
Olena Lizina

Olena Lizina is a software developer with 5 years experience in C++. She has practical knowledge of developing systems for monitoring and managing remote computers with a lot of users for an international product company. For the last 4 years, she has been working for international outsourcing companies on automotive projects for well-known automotive concerns. She has been participating in the development of complex and high-performance applications on different projects, such as HMI (Human Machine Interface), navigation, and applications for work with sensors.
Read more about Olena Lizina

Rakesh Mane
Rakesh Mane
author image
Rakesh Mane

Rakesh Mane has over 18 years of experience in the software industry. He has worked with proficient programmers from a variety of regions such as India, the US, and Singapore. He has mostly worked in C++, Python, shell scripting, and database. In his spare time, he likes to listen to music and travel. Also, he likes to play with, experiment with, and break things using software tools and code.
Read more about Rakesh Mane

Vivek Nagarajan
Vivek Nagarajan
author image
Vivek Nagarajan

Vivek Nagarajan is a self-taught programmer who started out in the 1980s on 8-bit systems. He has worked on a large number of software projects and has 14 years of professional experience with C++. Aside from this, he has worked on a wide variety of languages and frameworks across the years. He is an amateur powerlifter, DIY enthusiast, and motorcycle racer. He currently works as an independent software consultant.
Read more about Vivek Nagarajan

Brian Price
Brian Price
author image
Brian Price

Brian Price has over 30 years experience working in a variety of languages, projects, and industries, including over 20 years' experience in C++. He was worked on power station simulators, SCADA systems, and medical devices. He is currently crafting software in C++, CMake, and Python for a next-generation medical device. He enjoys solving puzzles and the Euler project in a variety of languages.
Read more about Brian Price

View More author details
Right arrow

Chapter 8 - Need for Speed – Performance and Optimization

Activity 1: Optimizing a Spell Check Algorithm

In this activity, we'll be developing a simple spell check demonstration and try to make it faster incrementally. You can use the skeleton file, Speller.cpp, as a starting point. Perform the following steps to implement this activity:

  1. For the first implementation of the spell check (the full code can be found in Speller1.cpp) – create a dictionary set in the getMisspelt() function:

    set<string> setDict(vecDict.begin(), vecDict.end());

  2. Loop over the text words and check for words not in the dictionary with the set::count() method. Add the misspelled words to the result vector:

    vector<int> ret;

    for(int i = 0; i < vecText.size(); ++i)

    {

      const string &s = vecText[i];

      if(!setDict.count(s))

      {

        ret.push_back(i);

      }

    };

  3. Open the terminal. Compile the program and run it as follows:

    $ g++ -O3 Speller1.cpp Timer.cpp

    $ ./a.out

    The following output will be generated:

    Figure 8.60: Example output of the solution for Step 1
    Figure 8.60: Example output of the solution for Step 1
  4. Open the Speller2.cpp file and add the unordered_set header file to the program:

    #include <unordered_set>

  5. Next, change the set type that's used for the dictionary to unordered_set:

    unordered_set<string> setDict(vecDict.begin(), vecDict.end());

  6. Open the Terminal. Compile the program and run it as follows:

    $ g++ -O3 Speller2.cpp Timer.cpp

    $ ./a.out

    The following output will be generated:

    Figure 8.61: Example output of the solution for Step 2
    Figure 8.61: Example output of the solution for Step 2
  7. For the third and final version, that is, Speller3.cpp, we will use a bloom filter. Start by defining a hash function based on the BKDR function. Add the following code to implement this:

    const size_t SIZE = 16777215;

    template<size_t SEED> size_t hasher(const string &s)

    {

      size_t h = 0;

      size_t len = s.size();

      for(size_t i = 0; i < len; i++)

      {

        h = h * SEED + s[i];

      }

      return h & SIZE;

    }

    Here, we used an integer template parameter so that we can create any number of different hash functions with the same code. Notice the use of the 16777215 constant, which is equal to 2^24 – 1. This lets us use the fast bitwise-and operator instead of the modulus operator to keep the hashed integer less than SIZE. If you want to change the size, keep it as one less than a power of two.

  8. Next, let's declare a vector<bool> for a bloom filter in getMisspelt() and populate it with the words in the dictionary. Use three hash functions. The BKDR hash can be seeded with values such as 131, 3131, 31313, and so on. Add the following code to implement this:

    vector<bool> m_Bloom;

    m_Bloom.resize(SIZE);

    for(auto i = vecDict.begin(); i != vecDict.end(); ++i)

    {

      m_Bloom[hasher<131>(*i)] = true;

      m_Bloom[hasher<3131>(*i)] = true;

      m_Bloom[hasher<31313>(*i)] = true;

    }

  9. Write the following code to create a loop that checks the words:

    for(int i = 0; i < vecText.size(); ++i)

    {

      const string &s = vecText[i];

      bool hasNoBloom =

              !m_Bloom[hasher<131>(s)]

          &&  !m_Bloom[hasher<3131>(s)]

          &&  !m_Bloom[hasher<31313>(s)];

        

      if(hasNoBloom)

      {

        ret.push_back(i);

      }

      else if(!setDict.count(s))

      {

        ret.push_back(i);

      }

    }

    The bloom filter is checked first and if it finds the word in the dictionary, we have to verify it, like we did previously.

  10. Open the terminal. Compile the program and run it as follows:

    $ g++ -O3 Speller3.cpp Timer.cpp

    $ ./a.out

    The following output will be generated:

Figure 8.62: Example output of the solution for Step 3

In the preceding activity, we attempted to solve a real-world problem and make it more efficient. Let's consider some points for each of the implementations in the three steps, as follows:

From an implementation perspective, the following guidelines apply:

There are some questions worth probing, given the results we received:

  • Why is the improvement in performance so meager with the Bloom Filter?
  • What is the effect of using a larger or smaller capacity Bloom filter?
  • What happens when fewer or more hash functions are used?
  • Under what conditions would this version be much faster than the one in Speller2.cpp?

Here are the answers to these questions:

  • Why is the improvement in performance so meager with the Bloom Filter?

    std::unordered_set performs one hash operation and perhaps a couple of memory accesses before reaching the value that's stored. The Bloom filter we use performs three hash operations and three memory accesses. So, in essence, the work that's done by the bloom filter is more than the hash table. Since there are only 31,870 words in our dictionary, the cache benefits of the Bloom filter are lost. This is another case where the traditional analysis of data structures does not correspond to real-life results because of caching.

  • What is the effect of using a larger or smaller capacity Bloom filter?

    When a larger capacity is used, the number of hash collisions reduce, along with false positives, but the caching behavior worsens. Conversely, when a smaller capacity is used, the hash collisions and the false positives increase, but the caching behavior improves.

  • What happens when fewer or more hash functions are used?

    The more hash functions are used, the fewer the false positives, and vice versa.

  • Under what conditions would this version be much faster than the one in Speller2.cpp?

    Bloom filters work best when the cost of testing a few bits is less than the cost of accessing the value in the hash table. This only becomes true when the Bloom filter bits fit completely within the cache and the dictionary does not.

lock icon
The rest of the page is locked
Previous PageNext Chapter
You have been reading a chapter from
Advanced C++
Published in: Oct 2019Publisher: ISBN-13: 9781838821135

Authors (5)

author image
Gazihan Alankus

Gazihan Alankus holds a PhD in computer science from Washington University in St. Louis. Currently, he is an assistant professor at the Izmir University of Economics in Turkey. He teaches and conducts research on game development, mobile application development, and human-computer interaction. He is a Google developer expert in Dart and develops Flutter applications with his students in his company Gbot, which he founded in 2019.
Read more about Gazihan Alankus

author image
Olena Lizina

Olena Lizina is a software developer with 5 years experience in C++. She has practical knowledge of developing systems for monitoring and managing remote computers with a lot of users for an international product company. For the last 4 years, she has been working for international outsourcing companies on automotive projects for well-known automotive concerns. She has been participating in the development of complex and high-performance applications on different projects, such as HMI (Human Machine Interface), navigation, and applications for work with sensors.
Read more about Olena Lizina

author image
Rakesh Mane

Rakesh Mane has over 18 years of experience in the software industry. He has worked with proficient programmers from a variety of regions such as India, the US, and Singapore. He has mostly worked in C++, Python, shell scripting, and database. In his spare time, he likes to listen to music and travel. Also, he likes to play with, experiment with, and break things using software tools and code.
Read more about Rakesh Mane

author image
Vivek Nagarajan

Vivek Nagarajan is a self-taught programmer who started out in the 1980s on 8-bit systems. He has worked on a large number of software projects and has 14 years of professional experience with C++. Aside from this, he has worked on a wide variety of languages and frameworks across the years. He is an amateur powerlifter, DIY enthusiast, and motorcycle racer. He currently works as an independent software consultant.
Read more about Vivek Nagarajan

author image
Brian Price

Brian Price has over 30 years experience working in a variety of languages, projects, and industries, including over 20 years' experience in C++. He was worked on power station simulators, SCADA systems, and medical devices. He is currently crafting software in C++, CMake, and Python for a next-generation medical device. He enjoys solving puzzles and the Euler project in a variety of languages.
Read more about Brian Price