Chapter 6. Dealing with the String Data Type
In the previous chapters, we have discussed several simple data structures, such as lists, linked lists, strings, stacks, and queues. Now, starting from this chapter, we are going to discuss non-linear data types. This chapter will discuss the string
data type, including how to construct, use, and solve several problems in the string
data type. The following are topics that will be discussed in this chapter:
- Introducing the
string
data type in C++ - Finding out whether a string is an anagram or palindrome
- Creating a sequence of binary digits as binary string
- Generating a subsequence of a string
- Searching a pattern in a string
To follow along with this chapter, including the source code, we require the following:
String is a data type that stores a collection of characters. This collection can form a word or some information that can be understood by humans, it can also form a sentence from several words. In this section, we are going to discuss how to construct and use the string
data type in C++.
Constructing a string using character array
In C++, a string can be composed by using an array of characters. When we compose a string using an array, we have to reserve a space to store a NULL character (\0) at the last array's element to indicate the end of the string. Suppose we want to create a string
variable containing a name of a person called James; we need an array with at least six elements, since James is composed of five characters. Please take a look at the following diagram:
As we can see in the preceding diagram, we need an array with at least six elements to store a string containing five letters. There are several ways to create a string using character arrays. These are some...
A collection of characters is used to construct a word or a sentence. The position of each character in a word matters, since different character positions can cause the word to have different meanings. For example, when you rearrange the characters in God, you will get Dog.
There are two methods in strings to find another word from a word, or to ensure a word has exactly the same spelling both forward and backward. Let's play with them.
Rearranging a word to create an anagram
An anagram is a word that is produced by rearranging the letters of the word itself. Let's take a look at the word ELBOW. We can say that BELOW is anagram of ELBOW, since BELOW uses all the original letters of ELBOW exactly once. Not only from one word, an anagram can also be created from, and can create, two or more words: SCHOOL MASTER is an anagram of THE CLASSROOM, or FOURTH OF JULY is an anagram of JOYFUL FOURTH.
Checking whether two strings are an anagram or not is quite easy. We just need to sort...
Constructing a string from binary digits
A binary string is a string that represents a number in binary format and only contains 0
and 1
. Supposing we have a number 3
, the binary string will be 11
, or the binary string of 9
is 1001
.
Binary string is usually used to hold non-traditional data, such as pictures. Suppose we have the following black and white image:
If black is represented by 0
and white is represented by 1
, we can create the following binary string to represent the preceding image:
111111111111111111111111111111111111111
111111111111111111111111111111111111111
111111111111111111111111111111111111111
111111111111111000000000011111111111111
111111111111100000000000001111111111111
111111111110000000000000000011111111111
111111111100000000000000000001111111111
111111111000000000000000000000111111111
111111111000000000000000000000011111111
111111110000000000000000000000011111111
111111110000000000000000000000001111111
111111100000000000000000000000001111111
111111100000000000000000000000001111111...
Subsequence string is a string derived from another string by deleting some characters without changing the order of the remaining characters. Suppose we have a string: donut. The subsequences of this word would be—d, o, do, n, dn, on, don, u, du, ou, dou, nu, dnu, onu, donu, t, dt, ot, dot, nt, dnt, ont, dont, ut, dut, out, dout, nut, dnut, onut, and donut.
Generating subsequences from a string
To find out all subsequences of a string, we need to iterate through all characters of the string. We also create a bit counter variable to mark which element position should be considered to take as a subsequence, also known as a power set. The power set of S
is the set of all subsets of S
. Suppose we have three characters in a string, which are xyz. The power set of the string will be 2n
elements, which is as follows:
BIT -> SUBSET
===================
000 -> Empty subset
001 -> "x"
010 -> "y"
011 -> "xy"
100 -> "z"
101 -> "xz"
110 -> "yz"
111 -> "xyz"
By...
Pattern searching is an algorithm to find out the location of a string in another string. This process is usually used in a word processor (such as Notepad or Sublime Text) to find a word position in a document. Let's look at the sentence—the quick brown fox jumps over the lazy dog. If we need to find out the position of the word the, we can use this algorithm and pass the as the pattern.
Note
You might be confused with Regular Expression (regex) and pattern searching, the latter of which we are going to discuss in this section. With regex, we can check if a string satisfies a given pattern, while in this section we'll be discovering how to find a string (called a pattern) in another string. If you're interested in learning about RegEx in C++, you can go to http://www.cplusplus.com/reference/regex/.
To find the position of the pattern in a string, we have to iterate through the string's elements from the beginning until the last possible element. In our preceding example,...
In this chapter, we have discussed how to construct a string in C++ using std::string
. We then used it to solve several problems in the string
data type. We can rearrange a string to create another string, which is called an anagram. We also can detect if a string is a palindrome if it has the exact same spelling both forward and backward. We have discussed binary string, and can now convert a decimal number to binary string and vice versa.
Another string problem we have solved is checking whether a string is a subsequence of another string. Also, we have successfully generated all possible subsequences from a string. Finally, we have learned how a word processor (such as Notepad or Sublime Text) can find a word in a document using pattern searching.
In the next chapter, we are going to discuss another non-linear data structure, which is a tree structure, and the algorithm implementation of this data structure.