How to Use the Pandas explode() Function?

The pandas explode() method transforms each element of a list-like structure (such as a Python list, tuple, set, or NumPy array) into a separate row. In practice, it can be applied to a pandas Series or to one or more columns of a dataframe when that Series or those columns contain list-likes as their values.

The explode() method always returns a new object and leaves the original one unchanged, so the result has to be assigned to a variable (possibly the same one) to be kept. By default, the index value of each original row is repeated for every new row produced from it. It's usually good practice to reset the index after applying this method.

Two nuances to be noted here:

  • If a Series object or dataframe column(s) on which the method is applied contains empty lists among the values, such lists will be replaced with np.nan. Scalar values, by contrast, are returned unchanged.
  • The data type of the exploded Series or dataframe column is object, regardless of the types of the individual elements.
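Both nuances can be checked on a tiny Series. The following is a minimal sketch; the Series s and its values are made up for illustration and are not taken from the article's dataset:

```python
import pandas as pd

# Illustrative values: a list, a tuple, an empty list, and a scalar string.
s = pd.Series([[1, 2], (3, 4), [], "text"])

exploded = s.explode()
print(exploded)

# The empty list became NaN, the scalar string is untouched,
# and the result keeps the object dtype.
print(exploded.index.tolist())  # [0, 0, 1, 1, 2, 3]
print(exploded.dtype)           # object
```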

Syntax

The syntax of the explode() method applied to a pandas dataframe is very simple and has two parameters: DataFrame.explode(column, ignore_index=False)

If we want to use this method on a pandas Series, the syntax has only one parameter: Series.explode(ignore_index=False)

Parameters

  • column – a column name provided as a string (or column names provided as a list of strings) to explode. This is a required parameter of the pandas explode() method applied to a dataframe (the same method for a Series object doesn't have this parameter). The option of exploding on multiple columns was added in pandas 1.3.0.
  • ignore_index – an optional parameter that can be either True or False:
    • True – the initial index is ignored and replaced by the default 0-based integer index.
    • False (the default value) – the initial index value is repeated for each new row.

Return Value

A new dataframe with the exploded lists transformed into rows is returned (the original dataframe remains untouched). If we want to use this dataframe afterward, we have to assign it to a variable.

Keep in mind that it's usually advisable to reset the index of the resulting dataframe, either by passing ignore_index=True to the pandas explode() method or by chaining this method with reset_index(drop=True). In this way, we'll avoid duplicated index values in the dataframe.

Errors or Exceptions Raised with the explode() Method in Pandas

Apart from the common SyntaxError (e.g., if we forgot a parenthesis) or NameError (e.g., if we made a typo in the dataframe name), there are some specific errors related to the pandas explode() method. This method raises an error in the following cases:

  1. When applying the explode() method to a dataframe:
    • If we don't pass in any argument to it (a column name or a list of column names).
  2. When exploding on multiple columns of a dataframe:
    • If we provide an empty list instead of a list of column names.
    • If we provide a list of column names where some of the column names are duplicated.
    • If the row-wise list-likes in the specified columns have non-matching lengths.

Examples

Now, we'll take a look at some examples of using the pandas explode() method. First, let's import the pandas and NumPy libraries and create a dataframe to experiment with.

Code:
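A minimal sketch of such a dataframe follows. The column name group and all of the values are hypothetical and chosen only so that the animals and diet columns hold lists of matching lengths:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the group names, animals, and diets are illustrative only.
df = pd.DataFrame({
    "group": ["monotremes", "big cats", "marsupials"],
    "animals": [["echidna", "platypus"], ["lion", "tiger"], ["koala"]],
    "diet": [["ants", "shrimp"], ["meat", "meat"], ["leaves"]],
})
print(df)
```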

Output:

We see that the columns animals and diet contain lists. Let's try different ways of exploding those lists and observe the results.

Use the explode() Function with a Pandas Dataframe

Let's explode the dataframe on the animals column.

Code:
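As a sketch, assuming the same kind of small hypothetical dataframe (the values are made up), exploding on a single column could look like this:

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "group": ["monotremes", "big cats", "marsupials"],
    "animals": [["echidna", "platypus"], ["lion", "tiger"], ["koala"]],
    "diet": [["ants", "shrimp"], ["meat", "meat"], ["leaves"]],
})

# Each element of every list in "animals" gets its own row.
exploded = df.explode("animals")
print(exploded)
print(exploded.index.tolist())  # [0, 0, 1, 1, 2]

# The original dataframe still has three rows.
print(df.shape)  # (3, 3)
```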

Output:

We can make the following observations:

  • The initial lists of animals (one for each row) are now transformed into separate rows.
  • The animals column now contains strings instead of lists.
  • The index for each new row was inherited from the original index and hence contains duplicated values.
  • The values from the remaining columns were duplicated for each new row. As a result, the information in the diet column no longer makes much sense (we're going to fix this later in this article).

Meanwhile, the original dataframe hasn't been modified:

Code:

Output:

We can reassign the result of applying the pandas explode() method to a new variable to be able to use it afterward.

Code:

Output:

To fix the issue with the dataframe index, we can pass in the optional parameter ignore_index set to True.

Code:
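A minimal sketch with the same hypothetical dataframe (illustrative values only):

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "group": ["monotremes", "big cats", "marsupials"],
    "animals": [["echidna", "platypus"], ["lion", "tiger"], ["koala"]],
    "diet": [["ants", "shrimp"], ["meat", "meat"], ["leaves"]],
})

# ignore_index=True replaces the repeated labels with 0, 1, 2, ...
exploded = df.explode("animals", ignore_index=True)
print(exploded)
print(exploded.index.tolist())  # [0, 1, 2, 3, 4]
```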

Output:

Alternatively, we can chain the pandas explode() method with reset_index(drop=True).

Code:

Output:

Let's confirm that these two approaches are equivalent using the pandas equals() method.

Code:
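One way to run this check, sketched on the same hypothetical dataframe (the variable names via_parameter and via_reset are introduced here for illustration):

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "group": ["monotremes", "big cats", "marsupials"],
    "animals": [["echidna", "platypus"], ["lion", "tiger"], ["koala"]],
    "diet": [["ants", "shrimp"], ["meat", "meat"], ["leaves"]],
})

via_parameter = df.explode("animals", ignore_index=True)
via_reset = df.explode("animals").reset_index(drop=True)

# equals() compares the index, the columns, and every value.
same = via_parameter.equals(via_reset)
print(same)  # True
```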

Output:

Now, let's see how the pandas explode() method works on a pandas Series object by applying it to a single column of the dataframe.

Code:
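A sketch of the Series variant, again assuming the hypothetical dataframe with made-up values:

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "group": ["monotremes", "big cats", "marsupials"],
    "animals": [["echidna", "platypus"], ["lion", "tiger"], ["koala"]],
    "diet": [["ants", "shrimp"], ["meat", "meat"], ["leaves"]],
})

# Selecting one column gives a Series; explode() then needs no arguments.
animals = df["animals"].explode()
print(animals)
print(type(animals).__name__)  # Series
```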

Output:

Note that this time, we didn't have to provide any arguments since this method doesn't have any required parameters when used on a Series.

Finally, let's check the two peculiarities mentioned at the beginning of this article:

  • What happens if a Series or dataframe column(-s) of interest contains empty lists among the values.
  • The data type of the resulting Series or dataframe columns.

Code:
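A minimal sketch of such a check. The dataframe df_scores, its columns, and its values are hypothetical and introduced only for this example:

```python
import pandas as pd

# Hypothetical data: the second row holds an empty list.
df_scores = pd.DataFrame({
    "id": ["a", "b", "c"],
    "scores": [[1.5, 2.5], [], [3.5]],
})

exploded = df_scores.explode("scores", ignore_index=True)
print(exploded)
print(exploded["scores"].dtype)  # object

# If a numeric dtype is needed, the column can be converted explicitly.
as_float = exploded["scores"].astype(float)
print(as_float.dtype)  # float64
```

The explicit astype(float) call is one possible follow-up step; whether it is appropriate depends on what the column actually contains.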

Output:

We can observe the following:

  • The empty list was replaced with np.nan.
  • The data type of the resulting dataframe column is object rather than float, which one might have expected since np.nan is a float.

Explode Two Columns Simultaneously

Next, we're going to use the pandas explode() method on two columns of our initial dataframe df simultaneously.

Code:
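A sketch on the hypothetical dataframe (illustrative values). Passing a list of columns requires pandas 1.3.0 or later:

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "group": ["monotremes", "big cats", "marsupials"],
    "animals": [["echidna", "platypus"], ["lion", "tiger"], ["koala"]],
    "diet": [["ants", "shrimp"], ["meat", "meat"], ["leaves"]],
})

# Lists in the same row are exploded in parallel, element by element.
exploded = df.explode(["animals", "diet"])
print(exploded)
```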

Output:

We see that now both the animals and diet columns that originally contained lists as their values are exploded simultaneously and provide more meaningful information than in the case of exploding only one of them.

Naturally, we can explode more than two columns of a dataframe if necessary.

Resolve the Error

We mentioned earlier that there are certain cases when the pandas explode() method throws an error. We're going to consider them one by one, but first, let's refresh what our main dataframe df looks like.

Code:

Output:

Case 1: If we don't pass in any argument to the method when applying it to a dataframe.

Code:
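A sketch that triggers and catches this error on the hypothetical dataframe (illustrative values; the variable caught is introduced here only to keep the exception for inspection):

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "group": ["monotremes", "big cats", "marsupials"],
    "animals": [["echidna", "platypus"], ["lion", "tiger"], ["koala"]],
    "diet": [["ants", "shrimp"], ["meat", "meat"], ["leaves"]],
})

caught = None
try:
    df.explode()  # the required "column" argument is missing
except TypeError as err:
    caught = err
    print(f"TypeError: {err}")
```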

Output:

Solution of Case 1: Provide the column argument (a column name as a string or a list of column names).

Case 2: If we provide an empty list as the column argument when exploding on multiple columns of a dataframe.

Code:
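A sketch of this case on the hypothetical dataframe (illustrative values; caught is a helper variable introduced here):

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "group": ["monotremes", "big cats", "marsupials"],
    "animals": [["echidna", "platypus"], ["lion", "tiger"], ["koala"]],
    "diet": [["ants", "shrimp"], ["meat", "meat"], ["leaves"]],
})

caught = None
try:
    df.explode([])  # an empty list of column names
except ValueError as err:
    caught = err
    print(f"ValueError: {err}")
```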

Output:

Solution of Case 2: Pass a non-empty list of the column names of interest as the column argument.

Case 3: If we provide a list of column names containing duplicates when exploding on multiple columns of a dataframe.

Code:
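A sketch of this case on the hypothetical dataframe (illustrative values; caught is a helper variable introduced here):

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "group": ["monotremes", "big cats", "marsupials"],
    "animals": [["echidna", "platypus"], ["lion", "tiger"], ["koala"]],
    "diet": [["ants", "shrimp"], ["meat", "meat"], ["leaves"]],
})

caught = None
try:
    df.explode(["animals", "diet", "animals"])  # "animals" appears twice
except ValueError as err:
    caught = err
    print(f"ValueError: {err}")
```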

Output:

Solution of Case 3: Pass a list of unique column names as the column argument.

Case 4: If the row-wise list-likes in the specified columns have non-matching lengths when exploding on multiple columns of a dataframe. To see this case in action, let's create a new dataframe df_new identical to df, only that this time, we'll "forget" to specify the diet for the echidna.

Code:

Output:

Now, we'll try to explode this dataframe on the animals and diet columns simultaneously.

Code:
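A sketch of the failing call. The dataframe df_new below is a hypothetical stand-in with made-up values, in which the first row lists two animals but only one diet:

```python
import pandas as pd

# Hypothetical data: row 0 has two animals but a single diet entry.
df_new = pd.DataFrame({
    "group": ["monotremes", "big cats", "marsupials"],
    "animals": [["echidna", "platypus"], ["lion", "tiger"], ["koala"]],
    "diet": [["shrimp"], ["meat", "meat"], ["leaves"]],
})

caught = None
try:
    df_new.explode(["animals", "diet"])
except ValueError as err:
    caught = err
    print(f"ValueError: {err}")
```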

Output:

Solution of Case 4: since here we have a very small dataframe, we can just take a look at it and spot the row where the lists in the specified columns have non-matching lengths. However, for large real-world dataframes, we need a more universal approach to identifying the rows to be fixed.

Code:
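One possible approach, sketched on a hypothetical df_new with made-up values, is to compare the list lengths row by row. It assumes that every cell in both columns holds a list-like; a scalar such as np.nan would make len fail:

```python
import pandas as pd

# Hypothetical data: row 0 has two animals but a single diet entry.
df_new = pd.DataFrame({
    "group": ["monotremes", "big cats", "marsupials"],
    "animals": [["echidna", "platypus"], ["lion", "tiger"], ["koala"]],
    "diet": [["shrimp"], ["meat", "meat"], ["leaves"]],
})

# True for rows where the two lists differ in length.
mask = df_new["animals"].map(len) != df_new["diet"].map(len)
mismatched = df_new[mask]
print(mismatched)
print(mismatched.index.tolist())  # [0]
```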

Output:

Hence, in our case, the row with the index 0 has lists of non-matching lengths in the specified columns, and this has to be fixed.

Conclusion

  • The pandas explode() method is used to transform each element of a list-like to a separate row. It can be applied to a Series or one or more columns of a dataframe if those objects contain list-likes as their values.
  • The method always returns a new object while the original one remains unchanged. We can assign this new object to a variable for further usage.
  • When applying this method to a dataframe, it's necessary to provide a column name or column names to explode.
  • By default, the index values of the resulting object are replicated for each new row from the original index. It's recommended to reset the index after applying the explode() method.
  • We can explode a dataframe on one or more columns.
  • To avoid a TypeError, we always need to provide the required column parameter when exploding a dataframe. To avoid a ValueError when exploding on multiple columns, this parameter should be a non-empty list of unique column names, and the row-wise list-likes in those columns have to have identical lengths.