Remove Whitespace from a String in C++
In the default "C" locale, six characters count as whitespace: space, line feed, horizontal tab, vertical tab, carriage return, and form feed. C++ offers several ways to remove them from a string, including the std::remove_if function, the std::regex_replace function, and the Boost library.
Characters Considered Whitespace by Default
There are six types of whitespace characters by default:
- Space (' ') - Space is the most common whitespace character. It is usually used to separate two words.
- Line feed ('\n') - The line feed character moves the cursor to the next line. It signifies the end of a line of text.
- Horizontal tab ('\t') - A horizontal tab moves the cursor to the next tab stop. It is usually used for indentation purposes.
- Vertical tab ('\v') - A vertical tab moves the cursor to the next vertical tab stop. It is an outdated whitespace character and is rarely used anymore.
- Carriage return ('\r') - The carriage return character is used to reset the cursor's position to the start of a line of text.
- Form feed ('\f') - A form feed is a page break character. It is used to end the current page and start a new one.
Using std::remove_if Function
The std::remove_if algorithm moves all the non-whitespace characters to the front of the string and returns an iterator pointing one past the last of them. We then pass this iterator to the string's erase() member function to remove the leftover characters at the end.
std::remove_if takes three arguments: an iterator to the beginning of the string, an iterator to the end of the string, and a predicate (a function or function object) that returns true for each character that should be removed.
Let us look at the different ways to specify this predicate:
1. ::isspace
To use ::isspace, we need to include the <cctype> header file in our program. ::isspace() checks whether a character is whitespace according to the current C locale, which is the "C" locale unless the program changes it with setlocale().
For example:
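Note that the leading :: selects the isspace function declared in the global namespace (the C library version). After `using namespace std;` we can write plain isspace as long as only <cctype> is included; if <locale> is also included, the name becomes ambiguous, because std::isspace is then overloaded with a two-argument template.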
In the above example, we used ::isspace and erase() to remove all whitespace from the string s. The erase() overload used here takes two iterators: the position from which characters are removed and the position where removal stops.
To get the starting position, we used the std::remove_if() function. remove_if() moves all non-whitespace characters to the beginning of the string and returns an iterator one past the last of them; the characters after that point are leftovers with unspecified values. This iterator is used as the starting position in erase(). To get the end position, we used s.end(), which returns an iterator pointing one past the last character of the string.
With these arguments, erase() discards the leftover characters, so s no longer contains any whitespace.
2. std::isspace
If we do not want to rely on the whitespace classification of the global C locale, we can use std::isspace from the <locale> header file. This overload classifies characters using the std::ctype facet of the locale we pass to it. The following example uses these concepts:
- std::isspace - This function template takes two arguments, a character and a locale, and returns true if the character is classified as whitespace by that locale's ctype facet.
- std::bind() - Declared in <functional>, std::bind() creates a function object that calls a given function with some of its arguments fixed in advance. Arguments that should instead be supplied at call time are marked with placeholders.
- std::placeholders - std::placeholders is a namespace containing the objects _1, _2, .... Each placeholder marks where an argument passed to the bound function object is forwarded: _1 receives the first argument, _2 the second, and so on.
- std::locale::classic() - The classic() function returns the classic "C" locale, the same locale a C program starts with.
For example:
In the above example, we used std::isspace<char> to classify the whitespace characters. Because std::isspace takes two arguments (a character and a locale), we used bind() to fix the locale argument to std::locale::classic() and to leave the character argument open with the placeholder std::placeholders::_1. When remove_if() called the resulting function object, each character of s in turn was passed in place of _1. Based on the result of std::isspace (true or false), remove_if() moved all non-whitespace characters to the start of s, and erase() then removed the leftover characters at the end.
3. Custom Predicate
Instead of relying on ::isspace or std::isspace, we can write a custom predicate that returns true if a character is whitespace and false otherwise. This also lets us decide exactly which characters should be removed. Let us take a look at the different ways in which we can write custom predicates in C++:
a. Unary Function
In this example, we defined a function isWhitespace() that returns true if a character is whitespace and false otherwise. We passed this function to std::remove_if() to move all non-whitespace characters to the front of the string, and then removed the leftover characters with the erase() function.
b. An object of a class implementing the () operator
In this example, we created a class isWhitespace with an overloaded operator() that returns true if a character is whitespace and false otherwise. We passed an object of this class to std::remove_if() in order to remove all whitespace from the string s.
c. Lambda
A lambda expression, commonly called a lambda, defines an anonymous function object inline, right where it is used. Lambdas are usually used for short, single-use pieces of code such as predicates.
In this example, we used a lambda function as an argument of the std::remove_if function to check whether a character was whitespace. The lambda function returned true if a character was whitespace and false otherwise.
Using std::regex_replace Function
In C++11 and later, we can use the std::regex_replace function from the <regex> header to remove whitespace from a string. This function replaces every part of a string that matches a regular expression with a replacement string; using an empty replacement string deletes the matches.
For example:
In the above example, we used the regular expression "\\s+" to remove all the whitespace. A regular expression describes a pattern that a sequence of characters can match. Inside a C++ string literal, "\\s" becomes \s, which matches a single whitespace character; the + after it matches one or more consecutive whitespace characters. std::regex_replace replaces every such match in the string s with the empty string, which removes all whitespace from s.
Using Boost Library
The boost::algorithm::erase_all function removes all occurrences of a given substring from a string. Passing a string that contains a single space removes every space character from the input.
For example:
In the above example, we used the Boost Library's erase_all() function to remove whitespace from the string s. Because we asked it to remove only the space (" ") character, other whitespace characters such as tabs and newlines were not erased.
Conclusion
- To remove whitespace from a string in C++, we can use the std::remove_if function, the std::regex_replace function, or the Boost Library in C++.
- The erase_all() function in the Boost library removes every occurrence of a given substring, so it can remove one specific type of whitespace character at a time.
- The std::remove_if function moves all non-whitespace characters to the front and returns an iterator pointing to the end of the non-whitespace characters.
- We can specify the types of whitespace characters in three ways: using ::isspace, using std::isspace, and using a custom predicate.
- The regular expression "\\s+" can be passed to the std::regex_replace function, with an empty replacement string, to remove all the whitespace characters from a string.