Drop One or Multiple Columns in Pandas DataFrame

In this article, we look at how to remove one or more columns from a Pandas DataFrame. Whether you are new to data analysis or an experienced analyst, this is a core skill for trimming a dataset down to the columns you actually need.

Create a DataFrame in Pandas

Pandas is an open-source data manipulation and analysis library for the Python programming language.

Pandas has two basic data structures: Series and DataFrame. A Series is a one-dimensional labelled array that can hold any data type, while a DataFrame is a two-dimensional labelled table whose columns can each hold a different data type.

Let us now see how to create a DataFrame in Pandas.

Before creating a DataFrame, make sure Pandas is installed. If it is not, you can install it with pip by running pip install pandas.

Creating a DataFrame from Lists

The easiest way to create a DataFrame is from lists. Each list corresponds to one column, and all the lists must be the same length.

Output:

Printing the DataFrame shows a table with the columns Name, Age, and City, populated with the provided data.
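The code for this step is not shown above. A minimal sketch of it might look like the following; the sample values and the variable names names, ages, and cities are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: one list per column, all the same length.
names = ["Alice", "Bob", "Charlie"]
ages = [25, 30, 35]
cities = ["New York", "London", "Paris"]

# Map each column name to its list and build the DataFrame.
df = pd.DataFrame({"Name": names, "Age": ages, "City": cities})

print(df)
```

Passing a dictionary keeps each column name next to its data, which makes the mapping easy to read.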

Methods to Drop One or Multiple Columns in Pandas DataFrame

Let us look at the various ways to drop column(s) from a DataFrame in Pandas.

Using df.drop() Method

The df.drop() method removes rows or columns by label. By default it returns a new DataFrame rather than modifying the original.

Example:

Output:

In this example, the df.drop() method is used to drop columns specified in the columns_to_drop list. The axis=1 parameter indicates that columns are being dropped. The original DataFrame remains unchanged, and the new DataFrame (df_dropped) is created with the specified columns removed.
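The example described above can be sketched roughly as follows. The names columns_to_drop and df_dropped come from the text; the sample data, including the Salary column, is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "London", "Paris"],
    "Salary": [50000, 60000, 70000],
})

columns_to_drop = ["Age", "Salary"]

# axis=1 tells drop() that the labels are column names, not row labels.
df_dropped = df.drop(columns_to_drop, axis=1)

print(df_dropped.columns.tolist())  # ['Name', 'City']
print(df.columns.tolist())          # the original still has all four columns
```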

Using .dropna() Method

The .dropna() method is mainly used to remove missing values, but it can also be used to drop columns. This is useful when you need to remove every column that contains a NaN (Not a Number) value.

Example:

Output:

In this example, df.dropna() removes every row that contains at least one NaN value and returns the result as a new DataFrame (df_dropped).
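A small sketch of the row-wise behaviour described above, using made-up data with a few missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical data: columns A and B each contain one NaN, column C has none.
df = pd.DataFrame({
    "A": [1.0, 2.0, np.nan, 4.0],
    "B": [5.0, np.nan, 7.0, 8.0],
    "C": [9, 10, 11, 12],
})

# Default axis=0: drop every row that contains at least one NaN.
df_dropped = df.dropna()

print(df_dropped)  # only the rows with index 0 and 3 remain
```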

If you want to drop columns with NaN values instead, pass axis=1 (or axis='columns'):
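As a sketch of the column-wise variant, again with made-up data (the name df_dropped_columns is chosen here for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: columns A and B each contain one NaN, column C has none.
df = pd.DataFrame({
    "A": [1.0, 2.0, np.nan, 4.0],
    "B": [5.0, np.nan, 7.0, 8.0],
    "C": [9, 10, 11, 12],
})

# axis=1: drop every column that contains at least one NaN.
df_dropped_columns = df.dropna(axis=1)

print(df_dropped_columns.columns.tolist())  # ['C']
```

All four rows are kept; only the columns containing NaN are removed.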

Output:

Using iloc[] Method

The iloc[] indexer selects rows and columns by their integer positions. It does not delete anything itself, but selecting only the columns you want to keep has the same effect as dropping the rest.

Example:

Output:

In this example:

  • iloc[1:3, 0:2] selects rows with integer positions 1 to 2 (excluding 3) and columns with integer positions 0 to 1 (excluding 2).
  • The first argument inside iloc[] specifies the rows to select, and the second argument specifies the columns to select.
  • This will create a new DataFrame (selected_rows_and_columns) based on the specified integer positions.
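The selection described in the bullets above might look like the sketch below. The sample data is made up, and the second half (keep_positions, df_without_age) is an added illustration of how positional selection can stand in for dropping a column.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "City": ["New York", "London", "Paris", "Tokyo"],
})

# Rows at positions 1 and 2, columns at positions 0 and 1.
selected_rows_and_columns = df.iloc[1:3, 0:2]
print(selected_rows_and_columns)

# "Dropping" by position: keep all rows and every column except position 1.
keep_positions = [i for i in range(df.shape[1]) if i != 1]
df_without_age = df.iloc[:, keep_positions]
print(df_without_age.columns.tolist())  # ['Name', 'City']
```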

Using df.ix[] Indexer

The .ix indexer was deprecated in Pandas 0.20.0 and removed in Pandas 1.0, so it is not available in current versions. Use the more explicit loc[] (label-based) or iloc[] (position-based) indexers instead.

Using df.loc[] Method

The loc[] method in Pandas is used for label-based indexing, allowing you to select specific rows and columns by their labels (row and column names). Here's an example of how you can use loc[] to select specific rows and columns from a DataFrame:

Output:

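This works much like iloc[] discussed above, with two differences: loc[] uses labels rather than integer positions, and a label slice includes its end label.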
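A minimal sketch of label-based selection with loc[], assuming made-up data and a string index. The names selected and df_without_age are illustrative only.

```python
import pandas as pd

# Hypothetical sample data with string row labels.
df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie", "David"],
        "Age": [25, 30, 35, 40],
        "City": ["New York", "London", "Paris", "Tokyo"],
    },
    index=["a", "b", "c", "d"],
)

# Rows labelled 'b' through 'c' (end label included), two named columns.
selected = df.loc["b":"c", ["Name", "City"]]
print(selected)

# "Dropping" by label: keep every column whose name is not 'Age'.
df_without_age = df.loc[:, df.columns != "Age"]
print(df_without_age.columns.tolist())  # ['Name', 'City']
```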

Using an Iterative Approach

We can also go through the DataFrame and remove each column one at a time using a loop.

The loop goes through each name in columns_to_drop and removes that column using the drop method with axis=1. The inplace=True parameter applies each change directly to the original DataFrame. In practice, passing the whole list to a single drop() call gives the same result with less code.
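The loop described above could be sketched like this. The name columns_to_drop comes from the text; the sample data is an assumption.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "London", "Paris"],
    "Salary": [50000, 60000, 70000],
})

columns_to_drop = ["Age", "Salary"]

# Remove one column per iteration, modifying df directly.
for column in columns_to_drop:
    df.drop(column, axis=1, inplace=True)

print(df.columns.tolist())  # ['Name', 'City']
```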

Using DataFrame.pop() Method

The DataFrame.pop() method removes a single column from the DataFrame and returns it as a Series.

Output:

Keep in mind that using pop() modifies the original DataFrame and returns the popped column. If you don't need the popped column, you can directly use the del statement to remove a column without returning it:

Both pop() and del modify the DataFrame in place, so use them carefully based on your specific requirements.
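A short sketch of both approaches on made-up data (popped_column is an illustrative name):

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "London", "Paris"],
})

# pop() removes the column from df and hands it back as a Series.
popped_column = df.pop("Age")
print(popped_column.tolist())  # [25, 30, 35]

# del removes the column without returning anything.
del df["City"]
print(df.columns.tolist())  # ['Name']
```

Because both operations change df directly, work on a copy (df.copy()) if the original still needs to be available afterwards.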

Drop One Column in Pandas DataFrame

To drop (remove) one column from a pandas DataFrame, you can use the DataFrame.drop() method. Here's an example:

Output:

In this example:

  • df.drop(column_to_drop, axis=1) drops 'Column2' from the DataFrame.
  • The axis=1 parameter indicates that a column, not a row, is being dropped. The result is assigned back to the variable df.
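The single-column case described in the bullets can be sketched as follows. The name column_to_drop and the 'Column2' label come from the text; the remaining columns and values are assumptions.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Column1": [1, 2, 3],
    "Column2": [4, 5, 6],
    "Column3": [7, 8, 9],
})

column_to_drop = "Column2"

# drop() returns a new DataFrame; reassigning replaces the old one.
df = df.drop(column_to_drop, axis=1)

print(df.columns.tolist())  # ['Column1', 'Column3']
```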

If you want to modify the DataFrame in place without reassigning it, you can use the inplace=True parameter:

Both versions achieve the same result, but the first one creates a new DataFrame with the specified column dropped, while the second one modifies the original DataFrame in place. Choose the method that best fits your requirements.
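For comparison, a sketch of the in-place version on the same kind of made-up data. Note that with inplace=True the call returns None, so its result should not be assigned back to df.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Column1": [1, 2, 3],
    "Column2": [4, 5, 6],
    "Column3": [7, 8, 9],
})

column_to_drop = "Column2"

# Modifies df directly; the return value is None.
result = df.drop(column_to_drop, axis=1, inplace=True)

print(result)               # None
print(df.columns.tolist())  # ['Column1', 'Column3']
```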

Drop Multiple Columns in Pandas DataFrame

To drop multiple columns from a pandas DataFrame, you can use the DataFrame.drop() method and specify a list of column names to be dropped. Here's an example:

Output:

In this example:

  • df.drop(columns=columns_to_drop, axis=1) drops 'Column2' and 'Column4' from the DataFrame.
  • The columns= keyword already says that columns are being dropped, so axis=1 is redundant here; it is only needed when the labels are passed positionally. The result is assigned back to the variable df.
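A sketch of the multi-column case, using the column names mentioned in the bullets and made-up values. It uses the columns= keyword on its own, without axis=1, because the keyword already identifies the labels as column names.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Column1": [1, 2, 3],
    "Column2": [4, 5, 6],
    "Column3": [7, 8, 9],
    "Column4": [10, 11, 12],
})

columns_to_drop = ["Column2", "Column4"]

# Drop both columns in one call and keep the result.
df = df.drop(columns=columns_to_drop)

print(df.columns.tolist())  # ['Column1', 'Column3']
```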

Removing columns you do not need is a routine part of managing a dataset, whether you are cleaning up messy data or preparing features for a machine learning model. Pandas makes this straightforward: a single drop() call can remove several columns at once. Prefer the simplest form that does the job, since simpler code is usually easier to read and maintain.

Conclusion

  • Dropping columns lets you keep only the information that matters for the task at hand, which simplifies analysis and reduces clutter in the dataset.
  • To drop (remove) one column from a pandas DataFrame, you can use the DataFrame.drop() method.
  • To drop multiple columns from a pandas DataFrame, you can use the DataFrame.drop() method and specify a list of column names to be dropped.
  • The DataFrame.pop() method removes a single column from the DataFrame and returns it.
  • The iloc[] indexer selects rows and columns by integer position, and the loc[] indexer selects them by label (row and column names). Selecting only the columns you want to keep has the same effect as dropping the others.
  • The .ix indexer was deprecated in Pandas 0.20.0 and removed in Pandas 1.0. Use the more explicit iloc[] or loc[] indexers instead.
  • The .dropna() method is mainly used to remove missing values, but with axis=1 it drops every column that contains a NaN (Not a Number) value.
  • By default, drop() and dropna() are non-destructive: they return a new DataFrame and leave the original intact for later reference. In contrast, pop(), del, and inplace=True modify the DataFrame in place.