Pandas Index and Pandas Reindex
Overview
A pandas DataFrame labels every row and every column. The row labels are called the index, and the column labels are usually called column names.
Indexing means selecting a subset of the data in a DataFrame.
If we only need part of a DataFrame, we can select that part first instead of running an operation on the whole DataFrame, which reduces the amount of data processed. The pandas index and the reindex method help us do that.
Introduction
A pandas index is the set of labels attached to the rows of a DataFrame (columns are labelled by their column names), which lets us look up data by label quickly. Labels are usually unique, although pandas does not require this.
Let us see what indexing a DataFrame looks like. We will be using a publicly available dataset called [Iris](https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv) from Seaborn.
Code :
Output :
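The original listing is not reproduced here. As a minimal sketch of loading the data, the block below reads a few CSV rows from an in-memory string so that it runs without network access. The rows are illustrative placeholders with the same column names as the Iris file, not the real dataset; passing the dataset URL to pd.read_csv instead should load the full file.

```python
import io
import pandas as pd

# Illustrative stand-in for the Iris CSV: same column names, only six rows.
# Assumption: these values are placeholders, not a copy of the dataset.
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""

# read_csv accepts a file path, a URL, or a file-like object such as StringIO.
iris = pd.read_csv(io.StringIO(csv_text))

# head() shows the first rows together with the default integer index.
print(iris.head())
```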
If we read the index property of this DataFrame, we get a RangeIndex object, which describes the index in the form RangeIndex(start, stop, step). The stop value is not included.
Using Index Property of Dataframe
Code :
Output :
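As a small sketch of what the index property returns, the block below uses a four-row stand-in frame (an assumption, so stop is 4 rather than 150) and also builds a RangeIndex with a step of 2 to show the effect of the step.

```python
import pandas as pd

# Small stand-in frame; the real Iris data has 150 rows.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4],
    "species": ["setosa", "setosa", "versicolor", "versicolor"],
})

# With no explicit row labels, pandas uses a RangeIndex.
idx = iris.index
print(idx)
print(idx.start, idx.stop, idx.step)

# A RangeIndex with step 2 starting at 0 contains only even labels.
every_other = pd.RangeIndex(start=0, stop=10, step=2)
print(list(every_other))
```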
So, we can see that the index runs from 0 to 149 with a step of 1, meaning consecutive labels differ by 1.
If the step were 2, starting from 0, the labels would be the even numbers 0, 2, 4, and so on. Now, to index the data, we select specific rows from the dataset as follows.
Indexing the Dataframe
Code :
Here we select the rows at index 1, 2, and 3.
Output :
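One way to select the rows at index 1, 2, and 3 is sketched below with .iloc[] on a small stand-in frame. The original listing may have used a different indexer, so treat this as an assumption.

```python
import pandas as pd

# Stand-in frame with the default RangeIndex (0, 1, 2, ...).
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3],
})

# Pass a list of integer positions to pick those rows.
rows = iris.iloc[[1, 2, 3]]
print(rows)
```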
Pandas indexing lets us select exactly the data points we need, and this flexibility matters more as the dataset grows.
How to Use Pandas Dataframe Index?
The index property returns the index labels of the DataFrame.
If the DataFrame has no explicit labels, it returns a RangeIndex object.
Code :
Output :
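To contrast the two cases, the sketch below reads the index of an unlabelled frame and of the same frame after one column is used as row labels. The frame is a small stand-in, and using "species" as the label column is an assumption chosen for illustration.

```python
import pandas as pd

iris = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3],
    "species": ["setosa", "versicolor", "virginica"],
})

# No explicit labels: the index is a RangeIndex.
default_index = iris.index

# After set_index, the index holds the values of the chosen column.
labelled = iris.set_index("species")
label_index = labelled.index

print(default_index)
print(label_index)
```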
Examples of Pandas DataFrame Index
Selecting Some Rows and Some Columns
If we need to select some rows and some columns, we can use either the .loc[] indexer or the .iloc[] indexer, depending on whether we want to use labels or the integer positions of the rows and columns.
Let us see an example of .loc[], where we use the labels of rows and columns to select the data we need.
For this, we will set the "species" column of the dataset as the row label.
Code :
Now, to select "sepal_length" and "sepal_width" for the species "setosa" and "virginica", we can run the following code.
Code :
Output :
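The two steps described above can be sketched as follows on a small stand-in frame (placeholder values, same column names as the Iris file). Because several rows share a species label, .loc[] returns every row that carries one of the requested labels.

```python
import pandas as pd

iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Step 1: use the "species" column as the row labels.
by_species = iris.set_index("species")

# Step 2: select rows by label and columns by name.
subset = by_species.loc[["setosa", "virginica"],
                        ["sepal_length", "sepal_width"]]
print(subset)
```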
Selecting Some Columns and All Rows
If we want only some columns for all the data, we select all rows and only the required columns.
Let us see how we can do it with .iloc[].
With .iloc[], we pass the integer positions of rows and columns instead of the labels used with .loc[]. Here we select all rows and the columns at positions 1 and 3 (positions start at 0, so these are the second and fourth columns). To select all rows, we pass : in place of a list of rows. The same works for columns.
Code :
Output :
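A minimal sketch of this selection, assuming a stand-in frame whose columns are in the same order as the Iris file:

```python
import pandas as pd

iris = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3],
    "sepal_width": [3.5, 3.2, 3.3],
    "petal_length": [1.4, 4.7, 6.0],
    "petal_width": [0.2, 1.4, 2.5],
    "species": ["setosa", "versicolor", "virginica"],
})

# ":" keeps every row; [1, 3] keeps the columns at positions 1 and 3.
selected = iris.iloc[:, [1, 3]]
print(selected)
```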
Pandas Indexing Using [ ], .loc[], .iloc[ ], .ix[ ]
- Using [] :
  We can select a single column by passing its name, or several columns by passing a list of names.
  df[['column1','column2']]
  Example : To select only the columns named 'sepal_width' and 'sepal_length'
  iris[['sepal_width','sepal_length']]
- Using .loc[] :
  We can select rows and columns by their labels.
  df.loc[['row1','row2'],['column1','column2']]
  Example : To select certain rows and columns using their labels (with "species" set as the row index)
  iris.loc[['setosa','virginica'],['sepal_length','sepal_width']]
- Using .iloc[] :
  .iloc[] is very similar to .loc[]. The only difference is that it uses the integer positions of the rows and columns, whereas .loc[] uses their labels.
  Example : To select data based on row and column positions
  iris.iloc[[1,2,3],[1,2]]
- Using .ix[] :
  .ix[] combined .loc[] and .iloc[]: it accepted integer-based slicing as well as labels.
  It was deprecated in pandas 0.20.0 and removed in pandas 1.0, so it raises an AttributeError in current versions. Use .loc[] or .iloc[] instead.
  Example : To slice the data up to index 4 for the columns "sepal_length" and "sepal_width"
  iris.ix[:4,['sepal_width','sepal_length']]
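Since .ix[] is not available in current pandas, the sketch below shows how the same kind of slice might be written with .loc[] and .iloc[] on a small stand-in frame. The main difference to keep in mind is that a .loc[] slice includes its end label, while an .iloc[] slice excludes its end position.

```python
import pandas as pd

iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})
cols = ["sepal_width", "sepal_length"]

# [] with a list of names selects several columns.
two_columns = iris[cols]

# .loc slices by label and includes the end label: rows 0 through 4.
by_label = iris.loc[:4, cols]

# .iloc slices by position and excludes the end position: rows 0 through 3.
by_position = iris.iloc[:4, [1, 0]]

print(len(by_label), len(by_position))
```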
Methods for Indexing in Dataframe
We can select rows, columns, or both from a DataFrame using the pandas index. The selection can be made either by integer position or by row and column labels, so we can use whichever form suits the task.
How to Use dataframe.reindex()?
Syntax
Parameters
- method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}. Method to use for filling holes in the reindexed DataFrame. For labels that have no data after reindexing, we can ask pandas to propagate the previous value forward or the next value backward. Values that were already missing in the original data are not filled. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.
- copy : bool, default True. Return a new object, even if the passed indexes are the same.
- level : int or name. Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_value : scalar, default np.nan. Value to use for missing values. Defaults to NaN, but can be any "compatible" value.
- limit : int, default None. The maximum number of consecutive elements to forward or backward fill.
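The effect of method, limit, and fill_value can be sketched on a tiny frame with a monotonically increasing index. The column name "price" and its values are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Index 0, 3, 6 is monotonically increasing, as required by `method`.
df = pd.DataFrame({"price": [10.0, 12.0, 15.0]}, index=[0, 3, 6])
new_index = range(7)

# Forward fill: each new label takes the value of the previous known label.
filled = df.reindex(new_index, method="ffill")

# limit=1 fills at most one consecutive new label after each known one.
limited = df.reindex(new_index, method="ffill", limit=1)

# fill_value puts a fixed value at every new label instead.
zeros = df.reindex(new_index, fill_value=0)

print(filled["price"].tolist())
print(limited["price"].tolist())
print(zeros["price"].tolist())
```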
Return Value
Series/DataFrame with changed index.
Example
Let us try to reindex a dataframe using the Pandas Reindex method. First of all, let us create a dataframe with labeled indexes.
Code :
Output :
It shows a marks report that we created.
Now, we will reindex this using the pandas reindex method. Reindexing is useful if we have made some mistakes, e.g. the rows for Hardik and Ganesh have been swapped. At the same time, we want to add another student and initialize the value to 0.
Code :
Output :
Here, we can see that by reindexing the DataFrame we were able to fix the swapped rows as well as add a new student, whose value we initialized using the fill_value parameter. If we don't provide fill_value, the new row is filled with np.nan. Note that reindex moves each row together with its label; it does not reassign values between existing labels.
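The original listing is not reproduced here, so the following is only a sketch of the idea. Hardik and Ganesh come from the text above; the other student names and all mark values are hypothetical.

```python
import pandas as pd

# Hypothetical marks report with student names as row labels.
marks = pd.DataFrame(
    {"marks": [85, 72, 90]},
    index=["Ganesh", "Hardik", "Ravi"],
)

# Desired order, plus one label ("Meera") that is not in the original.
new_order = ["Hardik", "Ganesh", "Ravi", "Meera"]

# Existing labels keep their values; the new label gets fill_value.
report = marks.reindex(new_order, fill_value=0)

# Without fill_value, the new label is filled with NaN instead.
report_nan = marks.reindex(new_order)

print(report)
print(report_nan)
```

Each existing row keeps its own value after reindexing, so only the row order changes and the new student is the only row that receives the fill value.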
Conclusion
- Dataframe Indexing is subsetting the data according to our needs.
- We can subset the data based on rows and columns.
- We can use different indexers for subsetting the data in pandas, such as [], .loc[], and .iloc[]. The older .ix[] indexer has been removed from current versions.
- The df.reindex() method can be used to reindex the DataFrame.