How to Use the Pandas rank() Function?
The DataFrame.rank() function of Pandas computes the rank of each value in the data. The values are sorted (in ascending order by default), and each value's position in that order determines the rank that is returned. If the data contains equal values, each of them is assigned the average of the ranks they occupy by default.
Syntax
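The general form of the call, with the default values used by recent pandas releases, is DataFrame.rank(axis=0, method='average', numeric_only=False, na_option='keep', ascending=True, pct=False).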
Parameters
Main Parameters (each has a default value):
- axis:
- It accepts 0 or 'index', or 1 or 'columns'. It is 0 by default.
- It sets the direction of the ranking: if axis is 0, values are ranked down each column (along the index), and if axis is 1, values are ranked across each row (along the columns).
- This parameter is unused for a Pandas Series, so it defaults to 0.
- method:
- It is the way to rank a group of records that have the same value (ties). The default is average.
- average: the average rank of the group
- min: the lowest rank in the group
- max: the highest rank in the group
- first: ranks are assigned in the order the values appear in the array
- dense: similar to min, but the rank always increases by 1 between groups
- na_option:
- It is the way to rank NaN (null) values.
- It accepts one of three strings: keep, top, or bottom. The default is keep.
- keep: assign a NaN rank to NaN values
- top: assign the lowest ranks to NaN values
- bottom: assign the highest ranks to NaN values
- ascending:
- It is of bool type and is True by default.
- It controls whether the elements are ranked in ascending order.
- pct:
- It is of bool type and is False by default.
- It controls whether the returned rankings are displayed in percentile form.
Optional Parameter:
- numeric_only:
- It is of the bool type.
- When True, only the numeric columns of a DataFrame are ranked.
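As a quick illustration of the parameters listed above, the sketch below writes out every argument with the default value found in recent pandas releases and checks that the result matches a bare .rank() call. It then sets axis to 1 to rank across each row. The frame scores and its column names a and b are made up for this example.

```python
import pandas as pd

# Hypothetical data, invented for illustration
scores = pd.DataFrame({"a": [1, 3, 2], "b": [3, 1, 2]})

# Every parameter written out with its default value
explicit = scores.rank(
    axis=0,
    method="average",
    numeric_only=False,
    na_option="keep",
    ascending=True,
    pct=False,
)
implicit = scores.rank()
same_as_default = explicit.equals(implicit)

# axis=1 compares the values within each row instead of within each column
across_rows = scores.rank(axis=1)

print(same_as_default)
print(across_rows)
```

In the last row both values are equal, so with the average method each of them receives the rank 1.5.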
Return Type
It returns a Series or DataFrame containing the ranks of the data, with the same index as the caller. A DataFrame is returned if .rank() is called on a DataFrame, and a Series (or a column) is returned if it is called on a Series. This is because the method is designed to return the same type as the object it is called on.
Examples
- Basic Ranking of Your Pandas DataFrame
The pandas DataFrame.rank() function can be applied to the entire DataFrame with all the default arguments.
Let’s see how to do this with our DataFrame, df, in the examples below:
Code:
Output:
Explanation: In the above code example, pandas is imported as pd, and a dictionary of data is created and stored in the data variable. A Pandas DataFrame is then created from that dictionary, and the .rank() function is applied to the entire DataFrame. Equal values are ranked using the average method, i.e. the ranks of the tied values are averaged and that average is assigned to each of them. String values are ranked alphabetically in ascending order. Missing values are ignored in the ranking and receive a rank of NaN.
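A minimal sketch of this kind of whole-DataFrame ranking follows, assuming a small hypothetical DataFrame. The Author and Sales values are invented for illustration and are not the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for illustration
data = {
    "Author": ["Carol", "Alice", "Bob", "Alice"],
    "Sales": [250.0, 100.0, np.nan, 100.0],
}
df = pd.DataFrame(data)

# Rank every column with the default arguments
ranked = df.rank()
print(ranked)
```

Here the two "Alice" entries tie for ranks 1 and 2, so each receives 1.5, and the missing Sales value keeps a rank of NaN.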
Now, let’s see how we can rank only a single column in the example given below.
Code:
Output:
Explanation: In the above example, a new column named Ranked_Author is created in which ranks of the Author column are assigned using the pandas rank function.
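Ranking a single column might look like the sketch below. The DataFrame contents are assumed for illustration, and only the column names Author and Ranked_Author come from the explanation above.

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({
    "Author": ["Carol", "Alice", "Bob", "Alice"],
    "Sales": [250, 100, 175, 100],
})

# Rank only the Author column and store the result in a new column
df["Ranked_Author"] = df["Author"].rank()
print(df)
```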
- Pandas Rank DataFrame with Reverse Sort Order
By default, the DataFrame.rank() function of pandas ranks the values in ascending order, i.e. the lowest value receives rank 1 and higher values receive higher ranks. To rank in descending order instead, so that the highest value receives rank 1, set the ascending parameter to False (ascending=False).
Let's see how to rank the same columns of the data in a different order by using the pandas rank function with the help of the following examples:
Code:
Output:
Explanation: In the above code example, we set the ascending parameter to False to rank in reverse order.
Code:
Output:
Explanation: Here, we created a new column Ranked_Author and stored the ranks of the Author column in reverse order by setting the ascending parameter to False.
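Both reverse-order cases can be sketched as follows, again with assumed data: first the whole DataFrame, then a single column stored as Ranked_Author.

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({
    "Author": ["Carol", "Alice", "Bob", "Alice"],
    "Sales": [250, 100, 175, 100],
})

# Descending ranks for every column: the largest value gets rank 1
ranked_desc = df.rank(ascending=False)

# Descending ranks for one column, stored alongside the original data
df["Ranked_Author"] = df["Author"].rank(ascending=False)

print(ranked_desc)
print(df)
```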
- Pandas Rank DataFrame with Different Methods
Having data with identical values is not uncommon. Normally, this doesn't cause any problems, but when you rank your data with Pandas, you must decide how tied values are ranked. The method= argument is used in this situation, and it accepts several options. Let's see the following examples for a better understanding of the method= argument.
Code:
Output:
Explanation: In the above example, we created one ranking per method by passing different values to the method parameter.
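To make the difference between the methods concrete, the sketch below ranks one small column containing a tie with each option in turn. The Stocks values and the generated column names are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical data with one tie (the two 20s)
df = pd.DataFrame({"Stocks": [10, 20, 20, 30]})

# One new column per tie-breaking method
for method in ["average", "min", "max", "first", "dense"]:
    df[f"{method}_rank"] = df["Stocks"].rank(method=method)

print(df)
```

The tied values receive 2.5 with average, 2 with min, 3 with max, and 2 then 3 with first. With dense they receive 2 and the next value receives 3 instead of 4.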
- Pandas Rank DataFrame with a Groupby (Grouped Rankings)
You can apply the pandas .rank() function within groups. For example, you can find the highest or lowest value on each day by first grouping the data with the .groupby() function. The pandas .groupby() function splits the data into groups based on different criteria, after which you can perform different operations on each group.
Let's see how to use the .groupby() function with pandas .rank():
Code:
Output:
Explanation: To rank our Stocks within each date, we created a new column, Stocks Ranked by Date. The data is first grouped by Date, and then the Stocks column is selected. After that, the values in each group are ranked in descending order by setting the ascending parameter to False.
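The grouped ranking described above can be sketched as follows. The dates, the Name column and the Stocks values are hypothetical, and only the column names Date, Stocks and Stocks Ranked by Date come from the explanation.

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"],
    "Name": ["A", "B", "C", "A", "B"],
    "Stocks": [50, 80, 65, 90, 40],
})

# Rank Stocks separately within each Date, highest value first
df["Stocks Ranked by Date"] = df.groupby("Date")["Stocks"].rank(ascending=False)
print(df)
```

Each date starts again at rank 1, so the ranks are comparable within a day but not across days.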
Code:
Output:
Explanation: In the above example, the Stocks with a rank of 1 in the Stocks Ranked by Date column are selected.
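One plausible way to keep only the top-ranked row per date is a boolean filter on the rank column, sketched below with the same kind of hypothetical data. This is an assumed reading of the step, not necessarily the original code.

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"],
    "Name": ["A", "B", "C", "A", "B"],
    "Stocks": [50, 80, 65, 90, 40],
})
df["Stocks Ranked by Date"] = df.groupby("Date")["Stocks"].rank(ascending=False)

# Keep only the rows ranked first within their date
top_per_date = df[df["Stocks Ranked by Date"] == 1]
print(top_per_date)
```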
- Pandas Rank DataFrame with Percentages
Another useful feature of the .rank() method is that it can normalize rankings to values between 0 and 1. Although this may appear insignificant, it lets us compare the minimum and maximum ranks across several columns, regardless of the number of unique values in each column.
With the use of the pct parameter, we may use this normalized form of ranking. Let's look at how we can use this in Python and Pandas:
Code:
Output:
Explanation: Here, setting the pct parameter to True returns the ranks of the DataFrame as percentages, i.e. values between 0 and 1.
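A small sketch of percentage ranking, assuming a hypothetical frame with columns Stocks and Volume: each rank is divided by the number of ranked values in its column, so the largest value in every column ends up at 1.0.

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({
    "Stocks": [10, 20, 20, 30],
    "Volume": [5, 1, 3, 2],
})

# pct=True scales each rank by the number of ranked values in the column
pct_ranked = df.rank(pct=True)
print(pct_ranked)
```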
- How the method behaves with the parameters
- Default_rank
Code:
Output:
Explanation: In the above code example, pandas is imported as pd, and the DataFrame df is created from a dictionary. The data in the Stocks column is ranked using the .rank() function, and the ranks are stored in a new column named default_rank.
- Max_rank
Code:
Output:
Explanation: In the above example, setting the method parameter to max assigns each group of tied Stocks the highest rank in that group.
- NA_bottom
Code:
Output:
Explanation: In the above example, na_option='bottom' is used so that null values receive the highest ranks, placing them at the bottom of the ranking.
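To compare the three na_option settings side by side, the sketch below ranks a short Series containing a single NaN. The values are assumed for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value
s = pd.Series([3.0, np.nan, 1.0])

keep_rank = s.rank(na_option="keep")      # NaN stays NaN
top_rank = s.rank(na_option="top")        # NaN gets the lowest rank (1)
bottom_rank = s.rank(na_option="bottom")  # NaN gets the highest rank (3)

print(pd.DataFrame({"keep": keep_rank, "top": top_rank, "bottom": bottom_rank}))
```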
- Pct_rank
Code:
Output:
Explanation: In the above example, the pct parameter is used to express the rank of each element as a value between 0 and 1.
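The four variants in this part (default_rank, max_rank, NA_bottom and pct_rank) can be sketched together on one column. The Stocks values, including the missing one, are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a tie (the two 4.0s) and one missing value
df = pd.DataFrame({"Stocks": [4.0, 2.0, 4.0, 8.0, np.nan]})

df["default_rank"] = df["Stocks"].rank()                  # average method, NaN kept
df["max_rank"] = df["Stocks"].rank(method="max")          # ties take the highest rank
df["NA_bottom"] = df["Stocks"].rank(na_option="bottom")   # NaN ranked last
df["pct_rank"] = df["Stocks"].rank(pct=True)              # ranks scaled to (0, 1]

print(df)
```

Only NA_bottom assigns a rank to the missing value. In the other three columns it stays NaN.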
- Ranking Column with Unique Values
We can rank a column with unique values without using the method argument. Let's see with the help of the following examples:
Code:
Output:
Explanation: In the above code example, pandas is imported as pd, and the DataFrame df is created from a dictionary. We first rank the Stocks and store the ranks in the new column Rank, then print the DataFrame to check how it looks before sorting. Finally, we sort by the ranks using .sort_values() with the inplace parameter set to True.
Note: If inplace is set to True, Pandas will overwrite your data instead of creating a new DataFrame as an output.
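The rank-then-sort steps might look like the sketch below, assuming a hypothetical DataFrame with a Name column and unique Stocks values.

```python
import pandas as pd

# Hypothetical data with no repeated values
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Stocks": [300, 120, 450, 210],
})

# With unique values every method gives the same ranks, so none is specified
df["Rank"] = df["Stocks"].rank()
print(df)  # before sorting

# inplace=True reorders df itself instead of returning a new DataFrame
df.sort_values("Rank", inplace=True)
print(df)  # after sorting
```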
- Sorting Column with some similar values
Code:
Output:
Explanation: Here, we do the same as in the previous code example, but rank the Stocks by setting the method parameter to average.
Code:
Output:
Explanation: In the above code example, the Stocks are sorted, and the ranked Stocks are stored in the new column Rank.
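For a column with repeated values, the same rank-then-sort pattern can be sketched as follows, with assumed data and the average method written out explicitly.

```python
import pandas as pd

# Hypothetical data in which 300 appears twice
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Stocks": [300, 120, 300, 210],
})

# The tied values share the average of the ranks they occupy (3 and 4)
df["Rank"] = df["Stocks"].rank(method="average")

# Sort by rank; the relative order of the two tied rows is not guaranteed
df.sort_values("Rank", inplace=True)
print(df)
```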
Conclusion
- The pandas rank function, i.e. .rank(), is used to calculate the numerical ranks of the data and returns a Series or DataFrame of ranks with the same index as the input.
- If the data contains equal values, then by default the average of their ranks is assigned to each of the tied values.
- We can apply the pandas rank function on the entire DataFrame or individual columns and can also sort the data accordingly.
- There are different methods like min, max, first etc. in the .rank() function, which are used to rank the DataFrame.
- The .groupby() function is used to split the data into groups based on specific conditions and can be used together with the .rank() function.
- We can normalize our rankings to values between 0 and 1 using the pct parameter of the .rank() function.