Python Split Function Overview

22 May 2023

The Split function in Python is a powerful tool for dividing strings into smaller components based on a specified delimiter. It allows you to split a string into substrings (tokens) and retrieve meaningful information from text data. This article provides a comprehensive overview of the Split function, covering its syntax, practical examples, and advanced use cases.

Glossary comes first

Prior to delving deeper into the Split function of Python, we need to define it, alongside some other closely related terms and phenomena:

  • String: any text fragment in Python, enclosed in either single quotes ('') or double quotes (""), for example:
    • message = "String splitting with ZorbasMedia"
    • name = 'ZorbasMedia'
    • sentence = "I'm held captive to write the articles, please help, thank you"
    • multiline = """Triple quotes (three single quotes work too) keep the text exactly the way it is written in the code, line breaks included"""
  • Split: the division of a string into smaller components, e.g., 'Everybody loves ZorbasMedia' becomes 'Everybody', 'loves', 'ZorbasMedia'
  • Token: a.k.a. substring, the product of string splitting, a component or chunk of information, e.g., 'ZorbasMedia'; the number of tokens is equal to the number of splits + 1
  • Data parsing: a meaningful operation on strings of data, such as identifying and extracting pieces of information based on certain criteria; basically, the next step after string tokenization
  • import re: a Python statement that imports the re module. The re module provides functions for working with regular expressions, which are powerful tools for pattern matching and string manipulation. By importing re, you gain access to its functionality, such as searching, matching, and splitting strings based on patterns defined with regular expressions.
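
The "splits + 1" rule from the glossary can be checked with a minimal sketch. It assumes an explicit single-space delimiter and reuses the sample phrase from the Split entry; the variable names are illustrative only.

```python
# Sample phrase from the glossary entry above.
text = "Everybody loves ZorbasMedia"

# With an explicit delimiter, every occurrence of it is one split point.
splits = text.count(" ")
tokens = text.split(" ")

print(tokens)               # ['Everybody', 'loves', 'ZorbasMedia']
print(splits, len(tokens))  # 2 3
```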

Python’s split function syntax

In Python, split() is a built-in string method that splits a string into a list of substrings (tokens) based on a specified delimiter. It takes the delimiter as an optional argument and returns a list containing the substrings.

Here's the syntax of the split() function in Python:

string.split(separator, maxsplit)

  • separator (optional): specifies the delimiter, or separator, at which the string should be split. If no separator is provided, split() uses whitespace (spaces, tabs, and newlines) as the default delimiter and treats consecutive whitespace characters as a single delimiter.
  • maxsplit (optional): specifies the maximum number of splits to be performed. If provided, the string will be split at most maxsplit times, producing at most maxsplit + 1 tokens. The default value is -1, which means no limit: the string is split at every occurrence of the delimiter.
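
Both parameters can be tried out in a short sketch. The sample strings below are made up for illustration. The sketch shows that the default separator collapses runs of whitespace, and that maxsplit caps the number of splits, so the result holds at most maxsplit + 1 items.

```python
# Illustrative input mixing two spaces, a tab, and a newline.
line = "alpha  beta\tgamma\ndelta"

# No separator: any run of whitespace counts as one delimiter.
default_tokens = line.split()
print(default_tokens)  # ['alpha', 'beta', 'gamma', 'delta']

# Illustrative comma-separated input.
fields = "a,b,c,d"

# No maxsplit: split at every comma.
all_fields = fields.split(",")
print(all_fields)      # ['a', 'b', 'c', 'd']

# maxsplit=2: two splits at most, so three items at most.
two_splits = fields.split(",", 2)
print(two_splits)      # ['a', 'b', 'c,d']
```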

The ‘split()’ function in Python is useful for tasks such as parsing data, tokenizing strings, and extracting individual components from a larger string. By splitting a string into substrings, you can easily process and manipulate the data based on your specific requirements.

Splitting function in Python put to practice

The split function is easier to understand in action. Here are some lines of code that use it to split a given sentence:

Splitting with the delimiter only

Example 1:
    sentence = "Splitting is not limited to tests only, try splitting the strings with ZorbasMedia"
    tokens = sentence.split()
    print(tokens)
Output:
    ['Splitting', 'is', 'not', 'limited', 'to', 'tests', 'only,', 'try', 'splitting', 'the', 'strings', 'with', 'ZorbasMedia']
Explanation: in the first example, split() is called without any arguments, so no delimiter is specified, and it splits the sentence into individual words based on the whitespace between them.
Example 2:
    sentence = "Splitting is not limited to tests only, try splitting the strings with ZorbasMedia"
    tokens = sentence.split(',')
    print(tokens)
Output:
    ['Splitting is not limited to tests only', ' try splitting the strings with ZorbasMedia']
Explanation: in the second example, split(',') splits the sentence using the comma (',') as a delimiter. It separates the sentence into tokens wherever it encounters a comma. The space after the comma is not removed, so the second token starts with a space.
Example 3:
    sentence = "Splitting is not limited to tests only, try splitting the strings with ZorbasMedia"
    tokens = sentence.split(' ')
    print(tokens)
Output:
    ['Splitting', 'is', 'not', 'limited', 'to', 'tests', 'only,', 'try', 'splitting', 'the', 'strings', 'with', 'ZorbasMedia']
Explanation: the third example is similar to Example 1, but here we explicitly specify the space (' ') as the delimiter. It splits the sentence into individual words. The result is identical to Example 1 only because the words here are separated by single spaces.
Example 4:
    sentence = "Splitting is not limited to tests only, try splitting the strings with ZorbasMedia"
    tokens = sentence.split('splitting')
    print(tokens)
Output:
    ['Splitting is not limited to tests only, try ', ' the strings with ZorbasMedia']
Explanation: in this example, split('splitting') splits the sentence wherever it encounters the word 'splitting'. The match is case-sensitive, so the capitalized 'Splitting' at the start of the sentence is left untouched.
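
Two details of the examples above are easy to miss: an explicit space delimiter does not behave like the default when words are separated by more than one space, and delimiter matching is case-sensitive. A small sketch, using illustrative strings that are not part of the examples above:

```python
# Illustrative string with a double and a triple space.
text = "Split  this   Text"

# Default: runs of whitespace are collapsed into one delimiter.
default_tokens = text.split()
print(default_tokens)  # ['Split', 'this', 'Text']

# Explicit ' ': every single space is a split point, leaving empty strings.
space_tokens = text.split(" ")
print(space_tokens)    # ['Split', '', 'this', '', '', 'Text']

# Case sensitivity: only the lowercase occurrence matches.
phrase = "Splitting means splitting strings"
exact = phrase.split("splitting")
print(exact)           # ['Splitting means ', ' strings']

# Lowercasing first makes both occurrences match.
lowered = phrase.lower().split("splitting")
print(lowered)         # ['', ' means ', ' strings']
```

The empty string at the start of the last result appears because the lowercased phrase begins with the delimiter itself.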

 

These examples demonstrate different uses of the split function in Python, allowing you to split strings based on different delimiters or patterns as needed. They don’t use the ‘maxsplit’ argument, but the next section of the article will have some examples of the Python function with the delimiter and ‘maxsplit’ argument included.

Splitting with the ‘maxsplit’ argument

Example 1:
    sentence = "Splitting is not limited to tests only, try splitting the strings with ZorbasMedia"
    tokens = sentence.split(maxsplit=2)
    print(tokens)
Output:
    ['Splitting', 'is', 'not limited to tests only, try splitting the strings with ZorbasMedia']
Explanation: in this example, maxsplit=2 is passed to split() with no specified delimiter. Python performs at most 2 splits, so the sentence yields at most 3 tokens. The first two words are separated as individual tokens, and the rest of the sentence stays together as the third.
Example 2:
    sentence = "Splitting is not limited to tests only, try splitting the strings with ZorbasMedia"
    tokens = sentence.split(',', maxsplit=1)
    print(tokens)
Output:
    ['Splitting is not limited to tests only', ' try splitting the strings with ZorbasMedia']
Explanation: in this example, maxsplit=1 is passed together with the comma (',') delimiter. Python splits the sentence at the first occurrence of a comma and creates two tokens. The first token contains the words before the comma, and the second token contains the remaining sentence.
Example 3:
    sentence = "Splitting is not limited to tests only, try splitting the strings with ZorbasMedia"
    tokens = sentence.split(' ', maxsplit=4)
    print(tokens)
Output:
    ['Splitting', 'is', 'not', 'limited', 'to tests only, try splitting the strings with ZorbasMedia']
Explanation: in this example, maxsplit=4 is passed together with the space (' ') delimiter. Python performs at most 4 splits, so the sentence yields at most 5 tokens. The first four words are separated as individual tokens, and the rest of the sentence forms the fifth.

 

These examples demonstrate the use of the ‘maxsplit’ argument with the split function in Python. The ‘maxsplit’ argument allows you to control the maximum number of splits performed on the string, resulting in a specific number of tokens.
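
One common use of maxsplit is sketched below with a hypothetical key=value line (the URL is a placeholder). Splitting only at the first delimiter keeps the value intact even when it contains the delimiter again.

```python
# Hypothetical configuration entry; the value itself contains '=' characters.
entry = "url=https://example.com/?a=1&b=2"

# Split only at the first '=' so the value stays in one piece.
key, value = entry.split("=", maxsplit=1)
print(key)    # url
print(value)  # https://example.com/?a=1&b=2

# Without maxsplit, the value is broken apart at every '='.
unlimited = entry.split("=")
print(unlimited)  # ['url', 'https://example.com/?a', '1&b', '2']
```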

Advanced splitting

The split function in Python can be used in many creative ways. Remember the import re statement from the glossary? We're going to need it now.

Example 1: splitting on multiple delimiters. The string method split() accepts only one delimiter at a time, so to split on several different delimiters you can use re.split() with a regular expression.
    import re
    sentence = "Hello;world,test|split,example"
    # The pipe is escaped because | means "or" in a regular expression.
    result = re.split(r';|,|\|', sentence)
    print(result)
Output:
    ['Hello', 'world', 'test', 'split', 'example']
Example 2: splitting and preserving delimiters. By using a capturing group in the regular expression, you can split a string while keeping the delimiters in the result.
    import re
    sentence = "Hello,world,test,split,example"
    result = re.split('([,])', sentence)
    print(result)
Output:
    ['Hello', ',', 'world', ',', 'test', ',', 'split', ',', 'example']
Example 3: splitting and discarding empty strings. When an explicit delimiter is given, the split function includes empty strings in the result wherever delimiters are consecutive. To discard them, you can use a list comprehension or filter the result.
    sentence = "Hello,,world,,,test"
    result = [word for word in sentence.split(',') if word]
    print(result)
Output:
    ['Hello', 'world', 'test']
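
The techniques above can be combined. The following is only a sketch: the helper name split_on_any and its default delimiter set are hypothetical, chosen for illustration. It splits on any of several single-character delimiters with a regular-expression character class and drops empty tokens.

```python
import re

def split_on_any(text, delimiters=";,|"):
    """Split text on any single character in delimiters, dropping empty tokens.

    Hypothetical helper for illustration; delimiters must not be empty.
    """
    # re.escape guards characters such as '|' that are special in a regex.
    pattern = "[" + re.escape(delimiters) + "]"
    return [token for token in re.split(pattern, text) if token]

result = split_on_any("Hello;world,,test|split,example")
print(result)  # ['Hello', 'world', 'test', 'split', 'example']
```

A character class keeps the pattern short as the number of delimiters grows; delimiters longer than one character would need the alternation form (a|b) instead.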

 

These are just a few examples of unusual or peculiar cases when using the split function in Python. The flexibility of the split function allows for creative use cases depending on your specific needs.

Conclusion

Python’s split function is a versatile tool for dividing strings into smaller components based on a specified delimiter. It enables you to extract meaningful information from text data by splitting a string into substrings (tokens) based on various delimiters. This article provided a comprehensive overview of the split function, covering its syntax, practical examples, and advanced use cases. By mastering the split function, you can manipulate and parse strings effectively in Python.
