How to Use the Pandas DataFrame agg() Method?

Like the aggregate functions in SQL, the pandas agg function runs over a group of data (rows or columns) and reduces it to a summary value. We pass it one or more functions to perform along an axis of a dataframe, and each function returns a single value for every row or column it is applied to.

Syntax

The aggregate function can be used on Series, DataFrame, and GroupBy objects. The name agg is an alias for aggregate, so the two can be used interchangeably.

The syntax for using the pandas agg function is as follows.

DataFrame.agg(func=None, axis=0, *args, **kwargs)

Series.agg(func=None, axis=0, *args, **kwargs)

DataFrameGroupBy.aggregate(func=None, *args, engine=None, engine_kwargs=None, **kwargs)

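Here func is the aggregation function that we are using for this operation. The axis argument tells whether the function runs down the columns or across the rows. The default value is 0, which applies the function to each column, and similarly, axis value 1 applies it to each row. Series.agg accepts axis only for compatibility with DataFrame.agg, since a series has a single axis.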

Parameters

  • func This is the function used for aggregating the data. It can be a function, a string function name, a list of functions or names, or a dictionary that maps row or column labels to functions. Some commonly used aggregation functions, given here by their string names, are as follows.
    1. 'count' - It returns the number of non-null values in each group.
    2. 'sum' - It returns the sum of each group.
    3. 'mean' - It returns the average of each group.
    4. 'min' - It returns the minimum value in each group.
  • axis This defines the axis along which the pandas agg function runs. The value 0 (or the string 'index') applies the function to each column, and the value 1 (or the string 'columns') applies it to each row.
  • args Positional arguments to be passed to the func.
  • kwargs Keyword arguments to be passed to the func.
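As a rough illustration of the forms func can take, the sketch below passes a string name, a list of names, and a user-defined function that receives an extra keyword argument. The marks table and the helper count_above are made up for this example and are not part of pandas.

```python
import pandas as pd

# Hypothetical marks table, used only for illustration.
df = pd.DataFrame(
    {"Maths": [90, 70, 80], "English": [85, 75, 65], "Science": [95, 60, 70]},
    index=["Asha", "Ben", "Chen"],
)


def count_above(column, threshold):
    """Hypothetical helper: count the values in a column above a threshold."""
    return int((column > threshold).sum())


by_name = df.agg("sum")                          # a string function name
by_list = df.agg(["min", "max"])                 # a list of function names
by_callable = df.agg(count_above, threshold=75)  # keyword is forwarded to func

print(by_name, by_list, by_callable, sep="\n\n")
```

The keyword threshold is not an argument of agg itself. It is collected in kwargs and handed to count_above for every column.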

Return Value

The function returns a scalar, series, or a dataframe depending on how the agg function was used.

  • It returns a scalar when Series.agg is used with a single function.
  • It returns a series when DataFrame.agg is used with a single function.
  • It returns a dataframe when DataFrame.agg is used with a list of functions.

The dataframe or series returned is a new object, and it doesn't affect the original dataframe object.
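The three return types can be checked with a small sketch like the one below. The data is assumed for illustration, and the final line confirms that the original dataframe is left unchanged.

```python
import pandas as pd

# Hypothetical marks table, used only for illustration.
df = pd.DataFrame(
    {"Maths": [90, 70, 80], "English": [85, 75, 65], "Science": [95, 60, 70]},
    index=["Asha", "Ben", "Chen"],
)
original = df.copy()

scalar = df["Maths"].agg("mean")  # Series.agg with one function
series = df.agg("mean")           # DataFrame.agg with one function
frame = df.agg(["mean", "max"])   # DataFrame.agg with a list of functions

print(type(scalar).__name__, type(series).__name__, type(frame).__name__)
print(df.equals(original))  # True if agg left the source data untouched
```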

Examples

How to Use Pandas agg() Method on DataFrame

Here we will see the different ways in which we can use the agg() method on a Pandas DataFrame. For example, let us take the marks of 3 different individuals in 3 different subjects as our DataFrame, with one row per student and one column per subject.
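A minimal sketch of such a table follows. The student names and the marks are assumed for illustration. Only the subject names 'Maths', 'English', and 'Science' come from the text.

```python
import pandas as pd

# Assumed data: three students (rows) and three subjects (columns).
df = pd.DataFrame(
    {
        "Maths": [90, 70, 80],
        "English": [85, 75, 65],
        "Science": [95, 60, 70],
    },
    index=["Asha", "Ben", "Chen"],
)
print(df)
```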

Using Over the Rows

Now let us find the total score of each student across all the subjects. Each row holds the marks of one student, so we need to apply the sum function across every row, which means passing axis=1.
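One way this might look, using an assumed marks table with hypothetical student names:

```python
import pandas as pd

# Hypothetical marks table: rows are students, columns are subjects.
df = pd.DataFrame(
    {"Maths": [90, 70, 80], "English": [85, 75, 65], "Science": [95, 60, 70]},
    index=["Asha", "Ben", "Chen"],
)

# axis=1 applies the function to each row, giving one total per student.
totals = df.agg("sum", axis=1)
print(totals)  # Asha 270, Ben 205, Chen 215
```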

So we can see that we were able to sum all the groups along the provided axis.

Using Over the Columns

Now let us get the average marks scored in each subject, taken over all the students. This time the function runs down every column, which is the default axis=0.
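A short sketch under the same assumptions (hypothetical names and marks), reading the task as one average per subject:

```python
import pandas as pd

# Hypothetical marks table: rows are students, columns are subjects.
df = pd.DataFrame(
    {"Maths": [90, 70, 80], "English": [85, 75, 65], "Science": [95, 60, 70]},
    index=["Asha", "Ben", "Chen"],
)

# The default axis=0 applies the function to each column (subject).
subject_means = df.agg("mean")
print(subject_means)  # Maths 80.0, English 75.0, Science 75.0
```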

So we're able to extract important data by using the agg function along a different axis. Here, we were using a single function to get our data, but we can use multiple agg functions over rows and columns to get a dataframe of important data.

DataFrame with the List of Functions Over the Rows and Columns

We can run several statistical analyses in one go using a list of functions. Using the same data, suppose we want the average, minimum, and maximum marks for each subject. Instead of computing these one by one, we can pass a list of functions, and agg will return a DataFrame with one row per function and one column per subject.
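A possible version of this call, again on assumed data:

```python
import pandas as pd

# Hypothetical marks table: rows are students, columns are subjects.
df = pd.DataFrame(
    {"Maths": [90, 70, 80], "English": [85, 75, 65], "Science": [95, 60, 70]},
    index=["Asha", "Ben", "Chen"],
)

# Each function becomes a row of the result; each subject stays a column.
per_subject = df.agg(["mean", "min", "max"])
print(per_subject)
```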

Similarly, we can run this to get similar data for each student.
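Switching to axis=1 is the only change needed. The sketch below uses the same assumed marks table:

```python
import pandas as pd

# Hypothetical marks table: rows are students, columns are subjects.
df = pd.DataFrame(
    {"Maths": [90, 70, 80], "English": [85, 75, 65], "Science": [95, 60, 70]},
    index=["Asha", "Ben", "Chen"],
)

# With axis=1 the students stay as rows and each function becomes a column.
per_student = df.agg(["mean", "min", "max"], axis=1)
print(per_student)
```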

This gives us the average, minimum, and maximum marks for each student, because the functions now run across each student's row. Previously they ran down each subject's column.

How to Use agg() over the Columns and Rename the Index of the Resulting DataFrame

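We can choose the index labels of the result when running agg on columns. To do this, we pass each aggregation as a keyword argument: the keyword becomes the index label, and its value is a (column, function) pair.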
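The sketch below shows one way to do this, using the keyword form of DataFrame.agg. The label Average_Score and the column 'Maths' come from the text. The marks themselves are assumed.

```python
import pandas as pd

# Hypothetical marks table: rows are students, columns are subjects.
df = pd.DataFrame(
    {"Maths": [90, 70, 80], "English": [85, 75, 65], "Science": [95, 60, 70]},
    index=["Asha", "Ben", "Chen"],
)

# The keyword name becomes the index label of the result;
# the tuple names the column and the function to apply to it.
renamed = df.agg(Average_Score=("Maths", "mean"))
print(renamed)
```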

Here we took the average on the column 'Maths' and assigned it to the index named as Average_Score. We can use this to structure the data properly, which makes it easier to understand.

Return the Sum of Each Row

Here in our example, the sum of each row would be the total score of the students. We can perform this by using the agg function sum over the rows.
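The row-wise sum can be spelled in more than one way. As a quick check on assumed data, the sketch below compares axis=1, axis="columns", and the dedicated DataFrame.sum method.

```python
import pandas as pd

# Hypothetical marks table: rows are students, columns are subjects.
df = pd.DataFrame(
    {"Maths": [90, 70, 80], "English": [85, 75, 65], "Science": [95, 60, 70]},
    index=["Asha", "Ben", "Chen"],
)

totals = df.agg("sum", axis=1)
totals_by_name = df.agg("sum", axis="columns")  # string form of axis=1
totals_direct = df.sum(axis=1)                  # dedicated method

# All three should describe the same per-student totals.
print(totals.equals(totals_by_name) and totals.equals(totals_direct))
```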

Different Aggregation Per Column

We can run a different aggregation on each column, based on what data we require. For example, suppose we need the average score in 'Maths', the minimum score in 'English', and the maximum score in 'Science'. We can do that by passing a dictionary that maps each column name to its aggregation function.
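A minimal sketch of the dictionary form, with assumed marks:

```python
import pandas as pd

# Hypothetical marks table: rows are students, columns are subjects.
df = pd.DataFrame(
    {"Maths": [90, 70, 80], "English": [85, 75, 65], "Science": [95, 60, 70]},
    index=["Asha", "Ben", "Chen"],
)

# Keys are column names; values are the function to apply to that column.
mixed = df.agg({"Maths": "mean", "English": "min", "Science": "max"})
print(mixed)  # Maths 80, English 65, Science 95
```

Because each column gets exactly one function, the result is a series indexed by column name.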

How to Use Pandas agg() Method in Series

When we call agg() on a dataframe with a single function, we get a series as the output. We can also call agg() directly on a series. A series has only one axis, so a single function reduces it to a single value.
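For instance, taking the 'Maths' column of an assumed marks table as the series:

```python
import pandas as pd

# Hypothetical Maths marks for three students.
maths = pd.Series([90, 70, 80], index=["Asha", "Ben", "Chen"], name="Maths")

average = maths.agg("mean")         # one function gives a scalar
spread = maths.agg(["min", "max"])  # a list of functions gives a series

print(average)  # 80.0
print(spread)
```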

Here, we can see that the average score of the students is returned not as a series or a dataframe, but as a scalar float value.

How to Use agg() Method Per Group

If a column contains repeated values, we can group the rows by that column and then run the aggregate function on each group. Let us take a dataframe in which column 'A' has repeated values. For each distinct value in column 'A', we would like to see the corresponding minimum and maximum value in 'C'.
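A sketch of this grouping follows. The column names 'A' and 'C' come from the text, while column 'B' and all the values are assumed for illustration.

```python
import pandas as pd

# Assumed data: column 'A' has repeated values that define the groups.
df = pd.DataFrame(
    {
        "A": ["x", "x", "y", "y"],
        "B": [1, 2, 3, 4],
        "C": [10, 30, 20, 5],
    }
)

# Group the rows by 'A', then take the min and max of 'C' within each group.
grouped = df.groupby("A")["C"].agg(["min", "max"])
print(grouped)
```

The result has one row per distinct value of 'A' and one column per function.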

Here the data is first grouped by 'A', and then the minimum and maximum values of column 'C' are computed within each group. Combining groupby with agg in this way lets us build complex summaries with a lot of flexibility.

Conclusion

  • Pandas agg() function helps to run a function over a pre-defined axis or a group of data.
  • It can be applied to series, dataframe, and groupby objects.
  • The return value can be a scalar, a series, or a dataframe.
  • agg() function offers a lot of flexibility for structuring and manipulating the data according to our needs, and it is a very powerful tool for data analytics.