Pandas Multiindex, Transpose and Stack
Overview
Pandas is a powerful and efficient tool for data analysis and processing. Sometimes we need to restructure data so that it is represented properly and is easier to analyze. Pandas allows us to work with data in multiple dimensions by using hierarchical indexes on both rows and columns. We can do this using Pandas MultiIndex.
Introduction
Pandas MultiIndex lets you use more than one level of labels as the index of a DataFrame, for rows, columns, or both. It helps us view and process data in multiple dimensions, which makes sophisticated analysis easier. We can create a Pandas MultiIndex object directly, or we can reshape an existing DataFrame so that it has a MultiIndex using the Pandas stack() method. The Pandas transpose() method swaps rows and columns, which lets us look at the same hierarchical data from the other axis.
What is Multi-Indexing in Pandas?
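A Pandas MultiIndex is an index object whose entries are tuples, with one value per level, so that several labels together identify each row (or column). The entries of a MultiIndex are usually unique, but pandas does not require it: if two rows carry the same combination of labels, the MultiIndex simply contains that tuple twice. A Pandas MultiIndex can be created from tuples, arrays, the product of iterables, and DataFrames.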
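As a quick illustration of the points above, the following sketch builds the same two-level index in three ways and shows that its entries are tuples and need not be unique. The labels ("A", "B", 1, 2) and the level names are made up purely for the example.

```python
import pandas as pd

# The same two-level index built three ways (labels and names are illustrative)
from_tuples = pd.MultiIndex.from_tuples(
    [("A", 1), ("A", 2), ("B", 1), ("B", 2)], names=["letter", "number"]
)
from_arrays = pd.MultiIndex.from_arrays(
    [["A", "A", "B", "B"], [1, 2, 1, 2]], names=["letter", "number"]
)
from_product = pd.MultiIndex.from_product(
    [["A", "B"], [1, 2]], names=["letter", "number"]
)

print(from_tuples.equals(from_arrays), from_tuples.equals(from_product))  # True True

# Iterating over a MultiIndex yields one tuple per entry
print(list(from_tuples))

# Entries do not have to be unique
dup = pd.MultiIndex.from_tuples([("A", 1), ("A", 1)])
print(dup.is_unique)  # False
```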
Syntax
We can create a Pandas MultiIndex from different objects, such as:
- From a list of tuples using MultiIndex.from_tuples()
- From a DataFrame using MultiIndex.from_frame()
- From a list of arrays using MultiIndex.from_arrays()
- From a crossed set of iterables using MultiIndex.from_product()
Let us create a Pandas MultiIndex object using a publicly available dataset called 'Iris' from Seaborn.
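A minimal sketch of the starting data follows. The full dataset is normally loaded with seaborn.load_dataset("iris"), which downloads it over the network; to keep this sketch runnable offline, it types in only the first three rows of Iris by hand, using the column names of the seaborn version of the dataset.

```python
import pandas as pd

# Full dataset: seaborn.load_dataset("iris") (requires network access).
# Here only the first three rows are typed in by hand.
iris = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 4.7],
        "sepal_width": [3.5, 3.0, 3.2],
        "petal_length": [1.4, 1.4, 1.3],
        "petal_width": [0.2, 0.2, 0.2],
        "species": ["setosa", "setosa", "setosa"],
    }
)
print(iris)
```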
Now we will create a Pandas MultiIndex object from this DataFrame using MultiIndex.from_frame().
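A sketch of this step, again using a hand-typed excerpt of Iris (first three rows only) so that it runs without downloading anything. MultiIndex.from_frame() turns every column into one level and every row into one tuple.

```python
import pandas as pd

# Hand-typed excerpt of Iris (first three rows), standing in for the full dataset
iris = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 4.7],
        "sepal_width": [3.5, 3.0, 3.2],
        "petal_length": [1.4, 1.4, 1.3],
        "petal_width": [0.2, 0.2, 0.2],
        "species": ["setosa", "setosa", "setosa"],
    }
)

# One level per column, one tuple per row; level names come from the column names
mi = pd.MultiIndex.from_frame(iris)
print(mi.nlevels)      # 5
print(list(mi.names))  # the five column names
print(mi[0])           # the first row as a tuple
```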
This returns a Pandas MultiIndex object with one level per column of the DataFrame we provided, which can then be assigned as the index of a DataFrame.
Here, we can see that we have got a MultiIndex that uses all the columns: each row of the table becomes one tuple in the index. The tuples are only as unique as the rows themselves, so identical rows produce identical index entries.
Examples
Create Multi-Index Pandas Dataframe
Suppose we want to create a DataFrame and assign a MultiIndex to it. We can do that by creating a Pandas MultiIndex and passing it as the index of a Pandas DataFrame. Here we create a Pandas MultiIndex with two levels: the student's department and their roll number. The DataFrame has single-level columns, one per subject, and the values are the students' marks in each subject.
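A minimal sketch of such a DataFrame. The original data is not available, so the departments ("CSE", "ECE"), the level names "Department" and "RollNo", and all the marks below are hypothetical.

```python
import pandas as pd

# Hypothetical (department, roll number) pairs as a two-level row index
index = pd.MultiIndex.from_tuples(
    [("CSE", 1), ("CSE", 2), ("ECE", 1), ("ECE", 2)],
    names=["Department", "RollNo"],
)

# Made-up marks: one single-level column per subject
df = pd.DataFrame(
    {"Maths": [88, 92, 75, 81], "English": [79, 85, 90, 68], "Science": [91, 77, 84, 73]},
    index=index,
)
print(df)

# Selecting one department label returns all of its students
print(df.loc["CSE"])
```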
Here we can see that with a MultiIndex the rows are grouped by department, which makes the DataFrame easier to analyze; for example, selecting a single department label with .loc returns all the students of that department.
Pandas MultiIndex to Columns
A MultiIndex is convenient for analysis, but sometimes we need its levels back as regular columns. We can do that easily using reset_index().
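A sketch of reset_index() on the hypothetical student DataFrame (rebuilt here so the snippet runs on its own): both index levels become ordinary columns and a default integer index takes their place.

```python
import pandas as pd

# Hypothetical student data: (Department, RollNo) index, subjects as columns
index = pd.MultiIndex.from_tuples(
    [("CSE", 1), ("CSE", 2), ("ECE", 1), ("ECE", 2)], names=["Department", "RollNo"]
)
df = pd.DataFrame(
    {"Maths": [88, 92, 75, 81], "English": [79, 85, 90, 68], "Science": [91, 77, 84, 73]},
    index=index,
)

# Move every index level into a column; a RangeIndex replaces the MultiIndex
flat = df.reset_index()
print(flat.columns.tolist())  # ['Department', 'RollNo', 'Maths', 'English', 'Science']
print(flat.index.tolist())    # [0, 1, 2, 3]
```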
We can see that the MultiIndex levels are now columns, and a new default integer index (0, 1, 2, ...) has been assigned to the DataFrame. If a level of the MultiIndex has the same name as an existing column, reset_index() raises a ValueError, because it cannot insert a second column with that name. It is therefore advisable to rename the MultiIndex levels first (for example with rename_axis()) before applying this method.
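A small sketch of the name clash and one way around it. The frame below is hypothetical: it deliberately already has a column called "Department", the same name as one of its index levels. Renaming the levels with rename_axis() before calling reset_index() avoids the error.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([("CSE", 1), ("ECE", 1)], names=["Department", "RollNo"])
# Hypothetical frame with a column that shares its name with an index level
df = pd.DataFrame(
    {"Department": ["Computer Science", "Electronics"], "Maths": [88, 75]}, index=index
)

raised = False
try:
    df.reset_index()
except ValueError as err:
    raised = True
    print("reset_index failed:", err)

# Rename the index levels first, then move them into columns
fixed = df.rename_axis(["DeptCode", "RollNo"]).reset_index()
print(fixed.columns.tolist())  # ['DeptCode', 'RollNo', 'Department', 'Maths']
```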
MultiIndex to Single Index
Suppose we want to convert the Pandas MultiIndex to a single index: for example, we want to keep only the roll numbers as the index and move the corresponding departments into a column. We can do that with reset_index() by passing the level parameter. The level parameter selects, by zero-based position or by name, which levels to move out of the index into columns; the level that is not selected stays behind as the new index of the DataFrame.
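A sketch with the hypothetical student DataFrame: reset_index(level=0) moves the "Department" level into a column and leaves "RollNo" as the index. Because the made-up roll numbers repeat across departments, the resulting single index contains duplicates, which pandas allows.

```python
import pandas as pd

# Hypothetical student data: (Department, RollNo) index, subjects as columns
index = pd.MultiIndex.from_tuples(
    [("CSE", 1), ("CSE", 2), ("ECE", 1), ("ECE", 2)], names=["Department", "RollNo"]
)
df = pd.DataFrame(
    {"Maths": [88, 92, 75, 81], "English": [79, 85, 90, 68], "Science": [91, 77, 84, 73]},
    index=index,
)

# level=0 (or level="Department") is moved out of the index into a column
by_roll = df.reset_index(level=0)
print(by_roll.index.name)          # RollNo
print(by_roll.columns.tolist())    # ['Department', 'Maths', 'English', 'Science']
```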
We can also choose to discard the levels we remove from the MultiIndex instead of turning them into columns of the new DataFrame. We can do that by adding the parameter drop=True to the method.
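A sketch of the same call with drop=True, again on the hypothetical student DataFrame: the "Department" level is removed from the index and is not kept as a column.

```python
import pandas as pd

# Hypothetical student data: (Department, RollNo) index, subjects as columns
index = pd.MultiIndex.from_tuples(
    [("CSE", 1), ("CSE", 2), ("ECE", 1), ("ECE", 2)], names=["Department", "RollNo"]
)
df = pd.DataFrame(
    {"Maths": [88, 92, 75, 81], "English": [79, 85, 90, 68], "Science": [91, 77, 84, 73]},
    index=index,
)

# drop=True discards the removed level instead of inserting it as a column
marks_only = df.reset_index(level="Department", drop=True)
print(marks_only.columns.tolist())  # ['Maths', 'English', 'Science']
print(marks_only.index.name)        # RollNo
```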
Drop Multilevel Index
If we have a multi-level index, we can drop some of its levels using the droplevel() method. We cannot drop all the levels, because at least one level must remain. The axis parameter chooses where to drop from: axis=0 (the default) removes levels from the row index, and axis=1 removes them from the column MultiIndex. The first argument, level, accepts an int, a str, or a list of these, so we can drop a single level or several levels at once.
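A sketch of droplevel() on the rows, on the columns, and the error raised when every level would be dropped. The student data and the "Term"/"Subject" column levels are hypothetical.

```python
import pandas as pd

# Hypothetical student data: (Department, RollNo) index, subjects as columns
index = pd.MultiIndex.from_tuples(
    [("CSE", 1), ("CSE", 2), ("ECE", 1), ("ECE", 2)], names=["Department", "RollNo"]
)
df = pd.DataFrame(
    {"Maths": [88, 92, 75, 81], "English": [79, 85, 90, 68], "Science": [91, 77, 84, 73]},
    index=index,
)

# axis=0 (default): drop a level from the row index, by name or by position
by_roll = df.droplevel("Department")  # same as df.droplevel(0)
print(by_roll.index.name)  # RollNo

# axis=1: drop a level from a column MultiIndex
cols = pd.MultiIndex.from_product([["Term1"], ["Maths", "English"]], names=["Term", "Subject"])
wide = pd.DataFrame([[88, 79], [75, 90]], columns=cols)
dropped_cols = wide.droplevel("Term", axis=1).columns.tolist()
print(dropped_cols)  # ['Maths', 'English']

# Dropping every level is an error: at least one level must remain
raised = False
try:
    df.droplevel(["Department", "RollNo"])
except ValueError as err:
    raised = True
    print(err)
```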
What is stack() method in Pandas?
The Pandas stack() method reshapes a DataFrame by moving a level of its columns into the row index, which adds a new innermost level to the index. If the columns have a single level, stacking them produces a Series. If the columns have multiple levels, the stacked level (by default the innermost one) is removed from the columns and added to the row MultiIndex, and the result is still a DataFrame.
Syntax
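Pandas stack() is a method of DataFrame with the following syntax: DataFrame.stack(level=-1, dropna=True). Newer pandas versions (2.1 and later) extend this signature with a future_stack parameter that selects a new implementation of stack().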
Parameters
- level
  a. Type: int, str, or a list of these
  b. Default: -1 (the innermost column level)
  c. Defines which level(s) should be stacked from the column axis to the index axis.
- dropna
  a. Type: bool
  b. Default: True
  c. Stacking a column level into the rows can create rows that have no values. dropna controls whether such rows, whose values are all missing, are dropped or kept.
Return Types
- Series: if the DataFrame has single-level columns, the output is a Series.
- DataFrame: if the DataFrame has multi-level columns and one level is stacked, the output is a DataFrame.
Examples
Single-Level Column
When a DataFrame has single-level columns, applying the Pandas stack() method returns a Series. We can use the student DataFrame from the previous example for this.
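A sketch of stack() on the hypothetical student DataFrame (rebuilt so the snippet is self-contained). The subject columns become a third, unnamed index level and the result is a Series. Depending on the pandas version, this call may print a FutureWarning about the new stack() implementation; for this data the result is the same either way.

```python
import pandas as pd

# Hypothetical student data: (Department, RollNo) index, subjects as columns
index = pd.MultiIndex.from_tuples(
    [("CSE", 1), ("CSE", 2), ("ECE", 1), ("ECE", 2)], names=["Department", "RollNo"]
)
df = pd.DataFrame(
    {"Maths": [88, 92, 75, 81], "English": [79, 85, 90, 68], "Science": [91, 77, 84, 73]},
    index=index,
)

# Single-level columns: stacking them yields a Series with a 3-level index
stacked = df.stack()
print(type(stacked).__name__)             # Series
print(stacked.index.nlevels)              # 3
print(stacked.loc[("CSE", 1, "Maths")])   # 88
```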
Here, df had single-level columns, so running the Pandas stack() method returns a Pandas Series. The subjects (Maths, English, and Science), which were previously the column labels, now form the innermost level of the MultiIndex.
Multi-Level Column
If a DataFrame has multi-level columns, running the Pandas stack() method still returns a DataFrame, now with a MultiIndex on the rows. First, let us create a DataFrame with multi-level columns.
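A sketch of a DataFrame with two column levels, named df1 to match the text. The exam terms, subjects, student names, and marks are all hypothetical.

```python
import pandas as pd

# Hypothetical two-level columns: exam term (outer) and subject (inner)
columns = pd.MultiIndex.from_product(
    [["Term1", "Term2"], ["Maths", "English"]], names=["Term", "Subject"]
)
# Made-up marks for two hypothetical students
df1 = pd.DataFrame(
    [[88, 79, 91, 83], [75, 90, 70, 94]],
    index=pd.Index(["Asha", "Ben"], name="Student"),
    columns=columns,
)
print(df1)
```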
Here we have created a DataFrame with multi-level columns. If we now stack it, the output is a DataFrame with a row MultiIndex, and the stacked level disappears from the columns.
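A sketch of stacking the hypothetical df1: the innermost column level, "Subject", moves into the row index, leaving one column per term. Recent pandas versions may emit a FutureWarning here; the values are the same.

```python
import pandas as pd

# Hypothetical df1 with (Term, Subject) columns, as in the previous sketch
columns = pd.MultiIndex.from_product(
    [["Term1", "Term2"], ["Maths", "English"]], names=["Term", "Subject"]
)
df1 = pd.DataFrame(
    [[88, 79, 91, 83], [75, 90, 70, 94]],
    index=pd.Index(["Asha", "Ben"], name="Student"),
    columns=columns,
)

# Stack the innermost column level ("Subject") into the row index
df1stacked = df1.stack()
print(type(df1stacked).__name__)                     # DataFrame
print(list(df1stacked.index.names))                  # ['Student', 'Subject']
print(df1stacked.columns.tolist())                   # ['Term1', 'Term2']
print(df1stacked.loc[("Ben", "English"), "Term2"])   # 94
```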
What is the transpose() Method in Pandas on a Multi-Index Data Frame?
The Pandas transpose() method (also available as the .T attribute) returns the transpose of the DataFrame: it reflects the DataFrame over its main diagonal, turning rows into columns and columns into rows. It works the same way on a DataFrame with a MultiIndex: the row MultiIndex becomes the column MultiIndex, and the column MultiIndex becomes the row index.
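A sketch of transpose() on the hypothetical df1: its two column levels become a two-level row index, the student names become the columns, and transposing twice gives back the original.

```python
import pandas as pd

# Hypothetical df1 with (Term, Subject) columns
columns = pd.MultiIndex.from_product(
    [["Term1", "Term2"], ["Maths", "English"]], names=["Term", "Subject"]
)
df1 = pd.DataFrame(
    [[88, 79, 91, 83], [75, 90, 70, 94]],
    index=pd.Index(["Asha", "Ben"], name="Student"),
    columns=columns,
)

transposed = df1.T  # equivalent to df1.transpose()
print(transposed.index.nlevels)       # 2: the former column MultiIndex
print(list(transposed.index.names))   # ['Term', 'Subject']
print(transposed.columns.tolist())    # ['Asha', 'Ben']
print(transposed.T.equals(df1))       # True
```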
Here we can see that our initial dataframe df1 was a multilevel column dataframe. After running the Pandas transpose() method, the multilevel column became a multilevel index in the resultant dataframe.
How to Unstack a Dataframe in Pandas?
We can unstack a DataFrame using its unstack() method, which moves a level of the row index into the columns. We can choose which level to unstack: by default level=-1, which unstacks the innermost level of the MultiIndex. Just like with stack(), we can pass the level parameter to unstack one or several levels. Here df1stacked is the stacked DataFrame from the stack() example.
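A sketch of the round trip on the hypothetical df1. Both axes are sorted before the comparison because stack() and unstack() may return the labels in a different order; the comparison uses == so it checks values regardless of dtype.

```python
import pandas as pd

# Hypothetical df1 with (Term, Subject) columns
columns = pd.MultiIndex.from_product(
    [["Term1", "Term2"], ["Maths", "English"]], names=["Term", "Subject"]
)
df1 = pd.DataFrame(
    [[88, 79, 91, 83], [75, 90, 70, 94]],
    index=pd.Index(["Asha", "Ben"], name="Student"),
    columns=columns,
)

df1stacked = df1.stack()          # "Subject" moves into the row index
restored = df1stacked.unstack()   # level=-1: "Subject" moves back into the columns
print(list(restored.columns.names))  # ['Term', 'Subject']

# Align label order on both axes, then compare cell by cell
a = restored.sort_index().sort_index(axis=1)
b = df1.sort_index().sort_index(axis=1)
same = bool((a == b).all().all())
print(same)  # True
```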
Here we started with df1, a DataFrame with multi-level columns. The stack() method moved one column level into the row index, producing df1stacked. Running unstack() on df1stacked moved that same level back into the columns, converting df1stacked back to the data of df1 (the labels may come back in a different, sorted order).
Conclusion
- Pandas MultiIndex allows us to use multiple levels of labels as the index of the rows, the columns, or both.
- It lets us analyze a DataFrame along multiple dimensions, which makes complex analysis easier.
- DataFrame's stack() method can be used to pivot columns to a multi-index.
- DataFrame's transpose() method lets us take the transpose of the Dataframe.
- The unstack() method moves a level of the row MultiIndex back into the columns, creating multi-level columns.