How to use Pandas transform() Function?


Before learning about the Pandas transform() function, let us first get a brief introduction to the Pandas module.

Pandas is an open-source (free to use) library built on top of NumPy, another very useful Python library. It is widely used in data science, machine learning, and data analytics because it simplifies importing and analyzing data. Pandas offers a wide variety of data structures and operations that make it easy to manipulate (add, update, delete) numerical data as well as time series.

Let us now learn about the Pandas transform() function. It applies a given function to a DataFrame (column by column by default) and returns a result with the same shape as the original. Now, a question arises here: what is a DataFrame? A DataFrame is a two-dimensional, tabular data structure with labeled rows and columns, in which each column is a Series.

Pandas often deals with large datasets, and these are usually held in DataFrames. Suppose we want to perform the same task on all the values, such as incrementing every value by a fixed amount. We could use loops to iterate over each value, but explicit Python loops are slow, and the execution time grows quickly as the DataFrame gets larger. In these scenarios, we can use the Pandas transform() function, which transforms the values of the DataFrame according to the provided function. Please refer to the Examples section for examples.

Syntax

The syntax of the Pandas transform() function is quite simple: DataFrame.transform(func, axis=0, *args, **kwargs). It takes one required argument (func) and three optional ones (axis, *args, and **kwargs).
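
A minimal sketch of a call in the form DataFrame.transform(func, axis=0, *args, **kwargs). The DataFrame and its column names (a and b) are made up purely for illustration.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the call form.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# General form: DataFrame.transform(func, axis=0, *args, **kwargs)
# Here func is a callable that receives one column at a time (axis=0).
result = df.transform(lambda col: col + 1)

print(result)
```

The result has the same rows and columns as df, with every value incremented by 1.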

Please refer to the next section for more details on the parameters of the Pandas transform() function.

Parameters

Let us briefly discuss the parameters of the Pandas transform() function.

  • func: A required parameter. It is the function that we want to apply to the DataFrame. Besides a callable, we can pass a function name as a string, a list of functions or function names, or a dict that maps axis labels to functions.
  • axis: An optional parameter with a default value of 0. It defines the axis along which the function (func) is applied. The accepted values are 0 or 'index' (apply the function to each column) and 1 or 'columns' (apply the function to each row).
  • *args: An optional parameter. It holds the values that we want to pass to func as positional arguments.
  • **kwargs: An optional parameter. It holds the values that we want to pass to func as keyword arguments.
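
The parameters above can be sketched as follows. The DataFrame, its column names, and the helper scale_and_shift are hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"a": [1, 4, 9], "b": [16, 25, 36]})

# func as a string naming a DataFrame method.
running_total = df.transform("cumsum")

# func as a list of functions: one output column per (column, function) pair.
multi = df.transform([np.sqrt, np.exp])

# func as a dict mapping column labels to functions.
per_column = df.transform({"a": np.sqrt, "b": lambda col: col * 2})

# axis=1 (or "columns") passes each row to func instead of each column.
row_share = df.transform(lambda row: row / row.sum(), axis=1)


# Hypothetical helper: extra arguments after axis are forwarded to func.
def scale_and_shift(col, factor, offset=0):
    return col * factor + offset


# axis is given positionally (0) so that 10 can follow as *args.
scaled = df.transform(scale_and_shift, 0, 10, offset=1)

print(per_column)
print(scaled)
```

Note that positional arguments for func can only follow axis, so axis has to be supplied positionally whenever *args is used.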

Returns

The Pandas transform() function returns a Series or a DataFrame that has the same length as the object it was called on (self).

If the function produces a result whose length differs from that of self (an aggregation such as a sum, for example), a ValueError is raised. Note that transform() does not modify the original DataFrame it is called on; it returns a new, transformed object.
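
The following sketch illustrates both points on a small hypothetical DataFrame: the result keeps the shape of the original, the original is left untouched, and an aggregating function is rejected. The exact wording of the error message varies between pandas versions, so it is printed rather than assumed.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

doubled = df.transform(lambda col: col * 2)

# Same shape as the original, and the original is unchanged.
print(doubled.shape == df.shape)
print(df["a"].tolist())

# "sum" collapses each column to a single value, so it is not a
# valid transformation and pandas raises a ValueError.
try:
    df.transform("sum")
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```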

Examples

So far we have discussed a lot about the Pandas transform() function. Let us take several examples for more clarity and to understand its use cases.

Transforming values

Let us take an example where we multiply every value of a DataFrame by 10 using the Pandas transform() function.
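
A minimal sketch of this, using a small made-up DataFrame (the column names x and y are illustrative only):

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

# The lambda receives one column at a time; multiplying a column
# by 10 multiplies every value in it.
result = df.transform(lambda col: col * 10)

print(result)
# Prints:
#     x   y
# 0  10  40
# 1  20  50
# 2  30  60
```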

Combining groupby() results

To understand how we can use the Pandas transform() function with the groupby() function, let us briefly discuss groupby() first. The DataFrame.groupby() method splits a DataFrame into groups of rows that share the same value in a given column. We pass the column name as a parameter to groupby() to group the data by that column.

For example, in the data of a shopping application's customers, the same user can appear multiple times because a single user can buy various items. In such scenarios, we can use DataFrame.groupby() to group all the products of the same customer. To group by customer, we pass the column name (here, customer_name) as the parameter to DataFrame.groupby().

Let us take an example to see how the transform() function works with the groupby() function.
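
A minimal sketch, assuming a hypothetical sales table with city and sales columns. The values are made up; the idea is that transform() computes one value per group and broadcasts it back to every row of that group.

```python
import pandas as pd

# Hypothetical sales records; several rows share the same city.
df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Delhi", "Pune", "Mumbai"],
    "sales": [100, 200, 150, 50, 300],
})

# Sum the sales within each city and attach the total to every row.
df["city_total"] = df.groupby("city")["sales"].transform("sum")

# Each row's share of its city's total sales.
df["share"] = df["sales"] / df["city_total"]

print(df["city_total"].tolist())
# Prints: [250, 500, 250, 50, 500]
```

Because the transformed column has the same index as df, it can be assigned directly as a new column.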

With groupby() and transform(), the group-level result is computed and broadcast back to every row in a single line, so we do not need a separate aggregation step followed by a merge.

Filtering data

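The Pandas transform() function can also be used to filter data. Because the transformed result is aligned with the original rows, it can be compared against a threshold to build a boolean mask. Let us take the same example (as above) and keep only the cities whose total sales are greater than a certain number.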
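
A minimal sketch of such a filter. The city and sales data and the threshold of 200 are hypothetical.

```python
import pandas as pd

# Hypothetical sales records.
df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Delhi", "Pune", "Mumbai"],
    "sales": [100, 200, 150, 50, 300],
})

threshold = 200  # hypothetical cutoff for total sales per city

# Total sales per city, repeated on every row of that city.
city_total = df.groupby("city")["sales"].transform("sum")

# Keep only the rows that belong to cities above the threshold.
filtered = df[city_total > threshold]

print(filtered["city"].unique().tolist())
# Prints: ['Delhi', 'Mumbai']
```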

Handling Missing Values at the Group Level

We can even handle the missing values using the Pandas transform() function.

Let us take a sample DataFrame and insert some NaN values using the NumPy constant numpy.nan. We can then replace these NaN values with a value of our choice, such as one computed from the rest of the group. Let us see how.

Conclusion

  • The Pandas transform() function applies a function to a DataFrame and returns a result of the same length.
  • The first parameter (func) is required. It is the function that we want to apply to the DataFrame. Besides a callable, we can pass a function name as a string, a list of functions or function names, or a dict that maps axis labels to functions.
  • The second parameter (axis) is optional, with a default value of 0. It defines the axis along which the function is applied. The accepted values are 0 or 'index' and 1 or 'columns'.
  • The third parameter (*args) is also optional. It holds the values that we want to pass to func as positional arguments.
  • The last parameter (**kwargs) is also optional. It holds the values that we want to pass to func as keyword arguments.
  • This function returns a Series or a DataFrame that has the same length as self. If the function produces a result of a different length, a ValueError is raised.