Python String Module
Overview
Let's break down the title of this article, Python string module, starting with the word "module". A module is a file of predefined code that performs certain tasks and can be imported and reused in other Python programs. The "string" here is not the built-in str class you may have read about. It is the string module, i.e. predefined code that helps us work with strings in Python. Along the way, we will also cover the basics of the str type itself.
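To see the difference in practice, here is a minimal sketch that imports the standard-library string module and uses a few of its real constants and its capwords() helper. The variable name title is only illustrative.

```python
import string  # the standard-library string module, not the built-in str class

# A few of the predefined constants the module provides
print(string.ascii_lowercase)  # abcdefghijklmnopqrstuvwxyz
print(string.digits)           # 0123456789

# capwords() capitalizes each whitespace-separated word
title = string.capwords("python string module")
print(title)                   # Python String Module
```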
What is String in Python?
Any sequence of characters in Python is a string. Hold on, it must also be enclosed in quotes, like 'word' or "word" or even '''word'''.
A string is an immutable data type, which means it can't be changed once it has been created in a program. Any sequence of characters enclosed in single, double, or triple quotes denotes a string.
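The sketch below shows that the three quoting styles produce the same value, and that trying to change a character in place fails because strings are immutable. The variable names are illustrative.

```python
# All three quoting styles produce the same str value
single = 'word'
double = "word"
triple = '''word'''
print(single == double == triple)  # True

# Strings are immutable: assigning to an index raises TypeError
try:
    single[0] = "W"
    immutable = False
except TypeError:
    immutable = True  # str objects do not support item assignment
print(immutable)  # True
```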
Creating a String in Python
We know what a string is by definition, but how do we create one in Python? It's as easy as enclosing a sequence of characters in single or double quotes, like the "word" above. Next, we store the string in a variable so that we can use it later.
We can later access these variables by name, for example by passing them to print().
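A minimal sketch of both steps, creating strings and then using them by name; the variable names greeting and name are only examples.

```python
# Store strings in variables so they can be reused later
greeting = "Hello"
name = 'World'

# Access the variables by name, e.g. pass them to print()
print(greeting)  # Hello
print(name)      # World
```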
How to Access Characters in a String?
When writing programs, you might want to access certain characters in a string: for example, to find out which letter your string starts or ends with, or simply to look at every character in it. Accessing a character of a string in Python is very simple, but first you need to understand indexing.
Indices make it easy to access any character in a string. An index is the position of a specific character in the string.
Each character can be reached through two indices: a positive index that counts from 0 at the start of the string, and a negative index that counts from -1 at the end. To access a character, write either index in square brackets after the string, for example word[0] for the first character or word[-1] for the last.
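Since the original index table isn't available, the sketch below assumes the word "Python" and prints each character alongside its positive and negative index, then uses both kinds of index to read characters.

```python
word = "Python"

# Print each character with its positive and negative index
for pos, char in enumerate(word):
    neg = pos - len(word)   # e.g. 0 - 6 = -6 for the first character
    print(char, pos, neg)

first = word[0]    # 'P'
last = word[-1]    # 'n'
third = word[2]    # 't'
same = word[2] == word[-4]  # True: both indices point at 't'
```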
Changing or Deleting Characters From a String in Python
We read that strings in Python are immutable, i.e. they cannot be changed. So how can we change or delete characters in a string? Let's take a look.
To change (in other words, replace) or delete characters in a string, we can use the replace() method. Its syntax is:
string.replace(old_str, new_str[, optional_max])
Here, old_str is the substring you want to change, and new_str is the substring to replace it with. The third parameter, optional_max (called count in the Python documentation), is optional and sets the maximum number of replacements to make; without it, every occurrence is replaced. replace() returns a copy of the original string with the changes applied instead of modifying the original. Why? Because strings are immutable.
Let's look at an example of replace() and use it to change and delete characters in a string.
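The original example code isn't preserved, so here is a sketch assuming the string "mississippi" (the same word used later in this article). It replaces p with *, deletes s by replacing it with an empty string, and limits the number of replacements with the optional third argument.

```python
text = "mississippi"

# Replace every 'p' with '*'
starred = text.replace("p", "*")
print(starred)    # mississi**i

# Replace 's' with an empty string, which deletes it
no_s = text.replace("s", "")
print(no_s)       # miiippi

# The optional third argument limits the number of replacements
first_two = text.replace("s", "", 2)
print(first_two)  # miissippi

# The original string is unchanged
print(text)       # mississippi
```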
In the example, we replaced every occurrence of the character p with *, and we also replaced the character s with an empty string, which effectively deletes it from the string. Notice, however, that when we printed the original string, it was unchanged. This confirms that strings are immutable: the result of any string operation is a new, modified copy of the original string.
String Operations in Python
There are many operations you can perform on strings. Let's look at some of them.
Concatenation of Two or More Strings
Concatenation means linking things together; for strings, it means joining one string to the end of another.
To concatenate two strings in Python, we can use the + operator between them.
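A short sketch of concatenation with +; note that + adds no space on its own and only works between strings, so other types need str() first. The variable names are illustrative.

```python
first = "Hello"
second = "World"

# + joins the strings end to end; add a space explicitly if you want one
joined = first + " " + second
print(joined)  # Hello World

# + only works between strings; convert other types first with str()
label = "Version " + str(3)
print(label)   # Version 3
```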
To repeat the same string multiple times, we can use the * operator with an integer.
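A minimal sketch of repetition with *, including the edge case of multiplying by zero.

```python
laugh = "ha" * 3
print(laugh)   # hahaha

# Handy for building simple separators
line = "-" * 10
print(line)    # ----------

# Multiplying by zero (or a negative number) gives an empty string
empty = "abc" * 0
```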
Iteration through Strings
Strings are iterable, so another operation we can perform on them is iteration, i.e. looping over their characters one at a time.
We can use a for loop to iterate over a string. For example, to count the occurrences of the letter 's' in "mississippi", we can check each character and increase a counter whenever it matches.
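A sketch of that loop, with the built-in str.count() method shown alongside as a cross-check.

```python
word = "mississippi"
count = 0

# Visit each character in turn and count the matches
for char in word:
    if char == "s":
        count += 1

print(count)             # 4

# The built-in str.count() method gives the same result
print(word.count("s"))   # 4
```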
In the same way, you can iterate over a string and perform the required operations.
String Membership Test
A string membership test checks whether a substring is present in a given string, much like finding a needle in a haystack.
To check whether a substring is present in a string, we can use the Python keyword in. The expression evaluates to a Boolean: True if the substring is present in the string, and False otherwise. The opposite check is written with not in.
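A short sketch of membership tests with in and not in, using "mississippi" as an example haystack; note that the check is case-sensitive.

```python
haystack = "mississippi"

has_ssi = "ssi" in haystack          # True
has_xyz = "xyz" in haystack          # False
lacks_xyz = "xyz" not in haystack    # True

# The test is case-sensitive
has_miss = "Miss" in haystack        # False

print(has_ssi, has_xyz, lacks_xyz, has_miss)  # True False True False
```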
Python String Module: Formatting
The Python string module has two classes: the Template class and the Formatter class.
Let's talk about formatting and the Formatter class. This class of the string module lets you create and customize your own string formatting behaviors. By formatting, we mean building a string by inserting values into it and controlling how they appear. Formatter uses the same implementation as the built-in str.format() method, so the examples of format() below also show what Formatter does.
This class has two main public methods:
- format(format_string, *args, **kwargs)
- vformat(format_string, args, kwargs), which does the actual work; format() simply passes its arguments on to it
The Formatter class is most useful as a base class: by subclassing it and overriding methods such as get_value(), format_field(), or parse(), you can define your own formatting behavior, or even your own format string syntax.
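A minimal sketch of such a subclass. DefaultFormatter and its missing parameter are hypothetical names chosen for this example; get_value() is a real Formatter method that subclasses are meant to override. Here it substitutes a placeholder for any named field that wasn't supplied, instead of raising KeyError as str.format() would.

```python
import string

class DefaultFormatter(string.Formatter):
    """Hypothetical subclass: missing named fields become a placeholder."""

    def __init__(self, missing="N/A"):
        super().__init__()
        self.missing = missing

    def get_value(self, key, args, kwargs):
        # Named fields ({name}) arrive as str keys; positional ones as ints
        if isinstance(key, str):
            return kwargs.get(key, self.missing)
        return super().get_value(key, args, kwargs)


fmt = DefaultFormatter()
result = fmt.format("{name} is {age} years old", name="Ana")
print(result)  # Ana is N/A years old
```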
The format() Method
The format() method of Python strings is a versatile and powerful way to format strings. A format string is a string that contains curly braces, {}, which act as placeholders (replacement fields) that get replaced by the specified values or variables. To control which value goes into which pair of braces, we can number the fields with positional indices such as {0} and {1}, or name them and pass keyword arguments.
Let's look at an example.
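The sketch below shows the common ways to fill replacement fields: in order, by position number, and by keyword, plus a format spec after a colon. The names and values are illustrative.

```python
# Empty braces are filled in order
in_order = "{} loves {}".format("Ana", "Python")
print(in_order)   # Ana loves Python

# Numbered fields choose which positional argument goes where
swapped = "{1} loves {0}".format("Ana", "Python")
print(swapped)    # Python loves Ana

# Keyword fields are matched by name
named = "{name} is {age}".format(name="Ana", age=30)
print(named)      # Ana is 30

# A format spec after ':' controls presentation (two decimal places here)
price = "Total: {:.2f}".format(3.14159)
print(price)      # Total: 3.14
```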
Since the Formatter class uses the same implementation as the format() method, let's look at an example that compares the two.
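The original comparison code isn't preserved, so this is a sketch assuming a simple template string. It formats the same template with str.format(), with string.Formatter().format(), and with vformat(), which takes the arguments as a tuple and a dict.

```python
import string

template = "{0} has {count} new messages"

# Built-in str.format()
via_method = template.format("Ana", count=3)

# string.Formatter uses the same implementation
formatter = string.Formatter()
via_class = formatter.format(template, "Ana", count=3)

# vformat() takes the arguments as a tuple and a dict instead
via_vformat = formatter.vformat(template, ("Ana",), {"count": 3})

print(via_method)                              # Ana has 3 new messages
print(via_method == via_class == via_vformat)  # True
```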
Escape Sequence
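Say, for example, you want to print the words He said, "What's there?". How would you do it? Probably like this: print("He said, "What's there?"")
Were you able to spot the mistake? Enclosing these words in single or double quotes won't work, because the text itself contains both kinds of quotation marks: Python treats the first matching quote inside the text as the end of the string, and what follows is no longer valid code. This gives us a SyntaxError.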
There are two ways to solve this problem: one is to use triple quotation marks (''' '''), and the other is to use escape sequences.
Escape sequences prevent errors in cases like this. An escape sequence starts with a backslash, \, followed by a character, and Python interprets the pair as a single special character rather than as those two characters. For example, \" inside a double-quoted string is a literal double quote that does not end the string.
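A sketch of both fixes for the He said, "What's there?" example: escaping the inner quotes with a backslash, or wrapping the text in triple quotes. All three spellings produce the same string.

```python
# Escaping the inner double quotes with a backslash
escaped = "He said, \"What's there?\""
print(escaped)  # He said, "What's there?"

# With single outer quotes, escape the apostrophe instead
escaped_single = 'He said, "What\'s there?"'

# Triple quotes avoid escaping altogether
tripled = '''He said, "What's there?"'''

print(escaped == escaped_single == tripled)  # True
```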
Well, this time we didn't get an error! Let's also look at some of the common escape sequences in Python.
Escape Sequence | Description |
---|---|
\n | adds a newline |
\\ | for the backslash |
\' | for a single quotation |
\" | for the double quotation |
\b | backspace |
\t | horizontal tab |
\v | vertical tab |
\r | carriage return |
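A quick sketch showing that each escape sequence in the table is stored as a single character, and that repr() displays the escape rather than its effect.

```python
# Each escape sequence is stored as a single character
tab = "a\tb"
print(len(tab))          # 3

newline = "line 1\nline 2"
print(newline)           # prints on two lines
lines = newline.split("\n")

backslash = "C:\\folder"
print(backslash)         # C:\folder

# repr() shows the escape sequence instead of its effect
print(repr(tab))         # 'a\tb'
```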
Raw Strings to Ignore Escape Sequence
Sometimes you may want Python to ignore the escape sequences in a string. To do that, you can make it a "raw" string by adding the prefix r or R before the opening quote.
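A sketch contrasting a normal and a raw string literal; the Windows-style path is just an example of text that happens to contain \n.

```python
normal = "C:\new_folder"   # \n here becomes a newline character
raw = r"C:\new_folder"     # the backslash and 'n' are kept as-is

print(normal)  # C: and ew_folder on separate lines
print(raw)     # C:\new_folder

print(len("\n"), len(r"\n"))  # 1 2
```

One limitation worth knowing: a raw string still cannot end with an odd number of backslashes, so r"C:\folder\" is a syntax error.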
Take a look at the following regex code. Here we're using the \b special sequence (a word boundary) to check whether a word starts with the characters thr.
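The original snippet isn't preserved; this is a reconstruction sketch assuming re.match() is applied to the word "throw" with a pattern written as an ordinary (non-raw) string.

```python
import re

# Without the r prefix, "\b" is a backspace character (\x08), not a regex word boundary
pattern = "\bthr"
word = "throw"

if re.match(pattern, word):
    result = "Match"
else:
    result = "Not a match"
print(result)  # Not a match
```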
It prints "Not a match", even though the word throw clearly starts with thr and we're using the correct special sequence for pattern detection.
The problem is that Python and the regex engine read \b differently. In an ordinary string literal, Python turns \b into a backspace character before the regex engine ever sees the pattern, so the pattern no longer contains the word-boundary special sequence. As a result, no match is found.
We fix this by adding an r or R prefix before the pattern, which makes it a raw string. In a normal string, \b means a backspace; in a raw string, \b stays as a backslash followed by b, which the regex engine interprets as the special sequence that checks whether a word starts with the specified characters.
Correct code:
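The same sketch with a raw-string pattern. It also checks a hypothetical word, "anthrax", where thr appears mid-word, to show that \b really enforces a word boundary.

```python
import re

# The r prefix keeps \b as backslash + 'b', which re reads as a word boundary
pattern = r"\bthr"
word = "throw"

if re.match(pattern, word):
    result = "Match"
else:
    result = "Not a match"
print(result)  # Match

# 'thr' in the middle of a word is not at a boundary, so there is no match
mid_word = re.search(pattern, "anthrax")
print(mid_word)  # None
```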
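In both string literals and regular expressions, the backslash is used to escape special characters (in regular expressions, this includes metacharacters such as . and *). Using the prefix r or R makes Python treat \ as an ordinary character, so the backslash is passed unchanged to the regex engine.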
Common Methods of Python Strings
Python strings come with a large set of built-in methods for string manipulation. Note that these are methods of the str type, not functions of the string module. Let's take a look at some of the commonly used ones.
Method | Description |
---|---|
capitalize() | Returns a copy of the string with the first character in upper case and the rest in lower case |
casefold() | Returns a lowercase version of the string; more aggressive than lower() and intended for caseless comparisons |
center() | Returns the string centered in a field of the given width, padded with a fill character (a space by default) |
count() | Returns the number of non-overlapping occurrences of the specified substring in the string |
encode() | Returns an encoded version of the string as a bytes object (UTF-8 by default) |
endswith() | As the name suggests, returns True if the string ends with the specified substring |
expandtabs() | Returns a copy of the string with tab characters replaced by spaces, using the given tab size (8 by default) |
find() | Searches the string for a specified substring and returns the lowest index where it was found, or -1 if it isn't found |
format() | The method we studied above; formats the specified values into the string |
format_map() | Works like format(), but takes the values from a single mapping (such as a dict) instead of keyword arguments |
index() | Like find(), but raises a ValueError if the substring isn't found |
isalnum() | Returns True if all characters in the string are alphanumeric, i.e. letters or digits with no special characters |
isalpha() | Returns True if all characters in the string are letters |
isascii() | Returns True if all characters in the string are ASCII characters |
isdecimal() | Returns True if all characters in the string are decimal characters, such as 0 to 9 |
isdigit() | Returns True if all characters in the string are digits |
isidentifier() | Returns True if the string is a valid Python identifier |
islower() | Returns True if all cased characters in the string are lowercase |
isnumeric() | Returns True if all characters in the string are numeric |
isspace() | Returns True if all characters in the string are whitespace |
isupper() | Returns True if all cased characters in the string are uppercase |
join() | Joins the strings in an iterable into one string, using the string it is called on as the separator |
lower() | Returns a copy of the string converted to lowercase |
replace() | Returns a copy of the string in which a specified substring is replaced with another |
rfind() | Searches the string for a specified value and returns the highest index where it was found, or -1 if it isn't found |
rindex() | Like rfind(), but raises a ValueError if the value isn't found |
rstrip() | Returns a copy with trailing characters removed (whitespace by default) |
split() | Splits the string at the specified separator (whitespace by default) and returns a list |
splitlines() | Splits the string at line breaks and returns a list |
startswith() | Returns True if the string starts with the specified value |
strip() | Returns a copy with leading and trailing characters removed (whitespace by default) |
upper() | Returns a copy of the string converted to uppercase |
Examples for Better Understanding
Let's look at some examples of the methods above.
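The original examples aren't preserved, so here is a sketch exercising several methods from the table on an illustrative string. It also shows the practical difference between find() and index() when a substring is missing.

```python
text = "  hello, python world  "

trimmed = text.strip()                 # 'hello, python world'
capitalized = trimmed.capitalize()     # 'Hello, python world'
shouted = trimmed.upper()              # 'HELLO, PYTHON WORLD'
o_count = trimmed.count("o")           # 3
pos = trimmed.find("python")           # 7
missing = trimmed.find("java")         # -1: find() signals "not found" with -1

# index() behaves like find() but raises ValueError when the substring is absent
try:
    trimmed.index("java")
    index_raised = False
except ValueError:
    index_raised = True

parts = trimmed.split(", ")            # ['hello', 'python world']
dashed = "-".join(["a", "b", "c"])     # 'a-b-c'
digits_ok = "42".isdigit()             # True
digits_bad = "4x".isdigit()            # False
boxed = "hi".center(6, "*")            # '**hi**'
starts = trimmed.startswith("hello")   # True

print(trimmed, capitalized, shouted, sep="\n")
print(o_count, pos, missing, index_raised)  # 3 7 -1 True
print(parts, dashed, boxed)                 # ['hello', 'python world'] a-b-c **hi**
print(digits_ok, digits_bad, starts)        # True False True
```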
Conclusion
- Any sequence of characters enclosed in single, double, or triple quotes is a string in Python. Strings are immutable: they can't be changed once they have been created.
- To reuse a string later, store it in a variable, like word = "hello".
- To access any character from the string we use indexing.
- To replace or delete characters in a string, use the replace() method, which returns a modified copy rather than changing the original. Its syntax is: string.replace(old_str, new_str[, optional_max])
- Some of the string operations are:
- Concatenation of two or more strings
- Iteration through strings
- String membership test
- The format() method builds strings from format strings, which contain curly braces, {}, that act as placeholders (replacement fields) and get replaced by the specified values or variables.
- Escape sequences start with a backslash, \, followed by a character, and Python interprets the pair as a single special character, such as a newline or a literal quotation mark.
- The backslash is used to escape special characters, including regex metacharacters, but the prefix r or R makes Python treat \ as an ordinary character, which is why regex patterns are usually written as raw strings.
Explore Scaler Topics Python Tutorial and enhance your Python skills with Reading Tracks and Challenges.