String Split Function in Python
Overview
We can split a string in Python into a list of strings by using the split() function, which breaks the string at each occurrence of a specified separator. Both the separator and maxsplit are optional and can be passed as parameters when calling split().
string split() Function in Python Syntax
str.split(separator, maxsplit)
str: the variable containing the input string. Datatype: string.
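The call shape can be sketched as follows. The sample string and variable names are hypothetical and chosen only for illustration. Note that in Python's own signature the first parameter is named sep, so it is normally passed positionally here.

```python
# Hypothetical sample string, used only for illustration.
text = "red,green,blue"

# Separator only: split at every comma.
all_parts = text.split(",")

# Separator and maxsplit: perform at most one split.
limited = text.split(",", 1)

print(all_parts)  # ['red', 'green', 'blue']
print(limited)    # ['red', 'green,blue']
```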
string split() Function in Python Parameter Values
- separator: This is the delimiter at which the string is split. It is optional. If no separator is specified (or None is passed), the string is split on runs of whitespace (spaces, tabs, newlines), and leading and trailing whitespace is ignored. In Python's own signature this parameter is named sep.
- maxsplit: This specifies the maximum number of splits to perform, so the result contains at most maxsplit + 1 elements. It is optional. The default value is -1, i.e., no limit on the number of splits. Any negative value behaves the same as the default.
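A minimal sketch of how the two parameters behave, using hypothetical input strings. It contrasts omitting the separator with passing a single space explicitly, and shows that a negative maxsplit means no limit.

```python
# Hypothetical input with a double space between "a" and "b".
text = "a  b c"

# No separator: runs of whitespace count as one delimiter.
default_split = text.split()

# Explicit single space: every space is a split point,
# so an empty string appears between the two adjacent spaces.
space_split = text.split(" ")

# maxsplit of -1 (the default) or any other negative value means no limit.
no_limit = "a,b,c,d".split(",", -1)
also_no_limit = "a,b,c,d".split(",", -5)

# maxsplit of 2 performs at most two splits, giving at most three items.
two_splits = "a,b,c,d".split(",", 2)

print(default_split)  # ['a', 'b', 'c']
print(space_split)    # ['a', '', 'b', 'c']
print(no_limit)       # ['a', 'b', 'c', 'd']
print(two_splits)     # ['a', 'b', 'c,d']
```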
string split() Function in Python Return Value
The split() function returns a list of strings.
Corner Cases
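- The split() function described here belongs to string objects. Calling it on a value of a type that has no split() method, such as an integer, raises an AttributeError at run time.
- If we specify the maxsplit count but leave the separator position empty, the interpreter raises a SyntaxError. Example: print(str.split(,5)). To limit the number of splits while keeping the default separator, pass None as the separator or pass maxsplit as a keyword argument, e.g., str.split(maxsplit=5).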
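The two corner cases can be sketched as below. The values and variable names are hypothetical. The example shows the run-time error for a non-string value, the conversion that avoids it, and two valid ways to set maxsplit while keeping the default whitespace separator.

```python
number = 12345  # hypothetical non-string value

# int has no split() method, so the call fails at run time.
error_name = None
try:
    number.split()
except AttributeError as exc:
    error_name = type(exc).__name__
print(error_name)  # AttributeError

# Converting to str first makes the call valid.
digits = str(number).split("3")
print(digits)  # ['12', '45']

# Limit the splits without naming a separator:
# pass maxsplit by keyword, or pass None as the separator.
text = "a b c d"
by_keyword = text.split(maxsplit=2)
by_none = text.split(None, 2)
print(by_keyword)  # ['a', 'b', 'c d']
print(by_none)     # ['a', 'b', 'c d']
```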
string split() Function in Python Examples
Example 1:
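A minimal sketch of such a call, assuming a hypothetical comma-separated input string (the variable names and contents are illustrative only), with a comma as the separator and a maxsplit count of 4:

```python
# Hypothetical input string; the contents are illustrative only.
text = "a,b,c,d,e,f,g"

# Split at commas, but perform at most four splits.
result = text.split(",", 4)

# Four splits give five items; the remainder stays unsplit in the last item.
print(result)  # ['a', 'b', 'c', 'd', 'e,f,g']
```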
The separator has been specified in the above case, i.e., (",") comma. Because the maxsplit count is 4, at most four splits are performed, so the result has at most five elements and any remaining commas stay in the last element.
Output:
Example 2:
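A minimal sketch of such a call, assuming a hypothetical sentence in which the substring "is a" occurs twice (the variable names and contents are illustrative only):

```python
# Hypothetical input string; the contents are illustrative only.
text = "Python is a language and Java is a language too"

# A multi-character separator is matched as a whole substring.
result = text.split("is a")

# The surrounding spaces are kept because they are not part of the separator.
print(result)  # ['Python ', ' language and Java ', ' language too']
```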
In the above case, the separator is ("is a"), which is a sequence of characters matched as a whole. Since there is no maxsplit count, the string is split at every point where the ("is a") substring occurs.
Output:
How Does split() Work in Python?
In the picture given above, we see that the input string is split into three parts, i.e., "Apple", "Mango", and "Orange", and it is divided at those points where there is a semicolon present.
Just as punctuation marks separate two sections of a sentence, delimiters separate two regions in a stream of data. Characters such as ',', ';', '@', '&', ':', '(' and '>' can all serve as delimiters. Here, the semicolon is the delimiter used to split the given string.
Python's string split() function splits a given string at a delimiter (separator) and returns a list of strings. As in the figure given above, after splitting the string, we get a list of words -> ['Apple', 'Mango', 'Orange']
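The semicolon example described above can be sketched as follows. The exact input string is an assumption based on the three parts named in the text.

```python
# Input assumed from the description: three fruit names joined by semicolons.
text = "Apple;Mango;Orange"

# Split at every semicolon.
fruits = text.split(";")

print(fruits)  # ['Apple', 'Mango', 'Orange']
```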
More Examples
How does split() work when maxsplit is specified?
Case 1:
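A minimal sketch of this case, assuming a hypothetical input string and an integer passed as the only argument. The error is caught here so the snippet runs to completion.

```python
text = "one two three"  # hypothetical input string

# A single positional argument is taken as the separator.
# An integer is not a valid separator, so the call raises TypeError.
error_name = None
try:
    text.split(5)
except TypeError as exc:
    error_name = type(exc).__name__
    print(f"{error_name}: {exc}")

# Passing the number as maxsplit by keyword is valid.
fixed = text.split(maxsplit=5)
print(fixed)  # ['one', 'two', 'three']
```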
Since only one argument has been passed in the above case, the interpreter treats it as the delimiter. The delimiter must be a string (or None), so passing an integer raises a TypeError when we run the code.
Output:
We get a type error -> TypeError: must be str or None, not int.
Case 2:
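Because a syntax error prevents a file from running at all, the sketch below stores the invalid call in a string and passes it to the built-in compile() so the error can be observed. The source string is a hypothetical stand-in for this case.

```python
# Hypothetical source text with an empty separator slot before maxsplit.
source = 'print("one two three".split(,2))'

# compile() parses the text; invalid syntax is reported before anything runs.
error_name = None
try:
    compile(source, "<example>", "exec")
except SyntaxError as exc:
    error_name = type(exc).__name__

print(error_name)  # SyntaxError
```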
In the above case, the separator position is left empty, and the maxsplit count is 2. One might expect the default delimiter, i.e., whitespace, to be used. However, an empty argument slot is not valid Python, so the code is rejected with a syntax error before split() is ever called.
Output:
We get a syntax error -> SyntaxError: invalid syntax
Case 3:
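A minimal sketch of this case, assuming a hypothetical input string in which "a and" occurs twice (the variable names and contents are illustrative only):

```python
# Hypothetical input string; "a and" occurs twice.
text = "x a and y a and z"

# maxsplit is larger than the number of occurrences of the separator.
limited = text.split("a and", 5)

# With no maxsplit at all, the result is the same.
unlimited = text.split("a and")

print(limited)    # ['x ', ' y ', ' z']
print(unlimited)  # ['x ', ' y ', ' z']
```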
In this scenario, the separator is a set of words, specifically "a and", with a maxsplit count of 5. Since the maxsplit count surpasses the number of delimiter occurrences, the string is split at every instance of the delimiter. This behaviour is akin to when the maxsplit count is -1 or unspecified.
Output:
Split the string using a comma, followed by a space as a separator
Consider a scenario where we have a string of fruit names separated by a comma and a space, and we want to extract each fruit as a separate element in a list.
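A minimal sketch of this scenario. The name sample_string is taken from the explanation that follows, while its contents are hypothetical.

```python
# Hypothetical contents; items are separated by a comma followed by a space.
sample_string = "apple, banana, cherry, mango"

# Using ", " as the separator removes both the comma and the space.
fruits = sample_string.split(", ")

print(fruits)  # ['apple', 'banana', 'cherry', 'mango']
```

Splitting on "," alone would leave a leading space on every item after the first.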
The split() method is applied to the sample_string with ", " as the separator in this example.
Output:
Use a hash character as a separator
Let's consider a scenario where we have a string containing multiple hashtags, and we want to extract each hashtag as a separate element in a list.
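A minimal sketch of this scenario. The name sample_string is taken from the explanation that follows, while its contents are hypothetical.

```python
# Hypothetical contents; every hashtag starts with '#'.
sample_string = "#python#coding#tutorial"

# Split at every hash character.
parts = sample_string.split("#")

# The string starts with '#', so the first element is an empty string.
print(parts)  # ['', 'python', 'coding', 'tutorial']

# Drop empty strings to keep only the tag names.
tags = [part for part in parts if part]
print(tags)  # ['python', 'coding', 'tutorial']
```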
In this example, the split() method is called on the sample_string with '#' as the separator. The result is a list containing each substring between the hash characters. If the string begins with '#', the first element of the list is an empty string.
Output:
Conclusion
- We learned about the split() function and how it works.
- We also learned about maxsplit and separator parameters and their functionality.
- We saw that both parameters of the split() function are optional. If they are not specified, the default values are used: whitespace for the separator and -1 (no limit) for maxsplit.
- We realized that the split() function works only on string variables. If we have a non-string object, we must first convert it to a string type and then use it.