Remove Duplicates From List Python
Duplicates in a Python list can be removed using various methods; the right choice depends on the type of elements in the list, the size of the list, whether the order of the elements must be preserved, and how efficient the removal needs to be.
These methods can be iterative, use built-in functions for implementation, or import modules for their functionality.
Methods to Remove Duplicates from a List
Some ways to remove duplicates from a list in Python are:
- Naive Method
- Using List Comprehension
- Using List Comprehension + enumerate()
- Using list.count() + list.remove()
- Set Method
- The reduce() Function
- Using the NumPy unique() Method
- Using a Pandas DataFrame
Example: Remove Duplicates from List in Python
To better understand all the different methods of removing duplicates from a list, let's consider an integer list a that contains 20 numbers, only ten of which are distinct.
After removing all the duplicates from the list, we will be left with a list containing those ten unique/distinct elements.
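For illustration, one such list could look like the following (the exact values here are assumed, chosen only so that the list has 20 numbers with ten distinct values):

```python
# a sample list of 20 integers containing only 10 distinct values
# (the values are assumed for illustration)
a = [1, 2, 1, 3, 4, 2, 5, 3, 6, 4, 7, 5, 8, 6, 9, 7, 10, 8, 9, 10]

# after removing duplicates, ten unique elements remain:
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```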
Now, let's explore various methods to achieve the above result.
Method 1: Naive Method (Iterative or Temporary Method)
The basic approach to removing duplicates from a list in Python is to iterate through the elements of the list and store the first occurrence of an element in a temporary list while ignoring any other occurrence of that particular element.
In the naive method, the basic approach is implemented by:
- Traversing the list using a for loop.
- Adding each element to a temporary list if it is not already present there.
- Assigning the temporary list back to the main list.
Now, let's look at the implementation of the Naive Method:
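A minimal sketch of the naive method, using an assumed sample list of 20 numbers:

```python
a = [1, 2, 1, 3, 4, 2, 5, 3, 6, 4, 7, 5, 8, 6, 9, 7, 10, 8, 9, 10]  # assumed sample list

temp = []  # temporary list holding the unique elements
for element in a:
    # keep only the first occurrence of each element
    if element not in temp:
        temp.append(element)

a = temp  # assign the temporary list back to the main list
print(a)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```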
In this method, we are creating a temporary list to store unique elements. Hence, the naive method requires extra space while removing duplicates from the list.
Highlights:
(1) The main list is traversed, and unique elements are added to a temporary list.
(2) The in keyword is used to determine the first occurrence of the elements.
(3) It requires extra space to store unique elements.
Method 2: Using List Comprehension
Instead of using an explicit for loop to implement the naive method, we can use Python's list comprehension to do the same in a single line of code.
Now, let's look at the implementation of the Naive Method using List comprehension:
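The same logic as a one-line list comprehension, again using an assumed sample list:

```python
a = [1, 2, 1, 3, 4, 2, 5, 3, 6, 4, 7, 5, 8, 6, 9, 7, 10, 8, 9, 10]  # assumed sample list

temp = []
# append each element to temp only if it has not been seen before
[temp.append(element) for element in a if element not in temp]

a = temp
print(a)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```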
Here, we initialize a temp variable to store the unique elements.
Then, we use list comprehension to extract the unique elements from the input list. Hence, similar to the naive method, we require extra space to store the unique elements in a temp variable.
Let's take a look at the list comprehension statement from the above example: [temp.append(element) for element in a if element not in temp]. This statement indicates that:
- A for loop will iterate over the input list a and extract elements that are not present in the temp list with the help of the if condition.
- The extracted elements will be added to the temp list with the help of the List's built-in append(element) function.
NOTE:
List comprehension is a functionality of Python that is used to create new sequences from other iterables like tuples, strings, lists, etc. It shortens the code and makes it easier to read and maintain.
Syntax: [expression for item in iterable if condition]
Example: a = [x for x in range(10) if x > 5] gives a = [6, 7, 8, 9].
Highlights:
(1) One-liner shorthand of the Naive Method.
(2) The code is easier to read and maintain.
Method 3: Using List Comprehension + enumerate()
With the plain list comprehension method, we find the distinct elements and store them in a temporary list. When list comprehension is combined with the enumerate() function, the program instead checks the portion of the list that has already been traversed and skips elements that have occurred before.
The enumerate() function takes an iterable as an argument and returns an enumerate object of (index, element) pairs, i.e., it adds a counter to each element of the iterable.
Now, let's look at the implementation of the List comprehension + enumerate() method:
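A sketch of this approach, using the same assumed sample list:

```python
a = [1, 2, 1, 3, 4, 2, 5, 3, 6, 4, 7, 5, 8, 6, 9, 7, 10, 8, 9, 10]  # assumed sample list

# keep an element only if it does not appear among the elements
# already traversed, i.e., in the slice a[:index]
a = [element for index, element in enumerate(a) if element not in a[:index]]
print(a)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```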
Here, a[:index] is used to access already occurred elements.
Expanding the list comprehension into an equivalent explicit loop, we can notice that:
- The for loop is accessing every element from the input list along with its index (as provided by enumerate function).
- We are checking whether the particular element is present in the already accessed elements list, i.e., in the list a[ : index].
- For example, for the second element, i.e., the element having an index equal to 1, the if condition will check whether that element is present in the list a[ : 1] i.e., it is the same as the first element or not. If not, then it is unique and is stored in the temp variable.
Highlights:
(1) Similar to the List comprehension method.
(2) It checks for already occurred elements and skips adding them.
Method 4: Using list.count() + list.remove()
Duplicates in the list can also be removed with the help of Python List's in-built functions such as count() and remove():
- list.count(element) - Returns the number of occurrences of an element in the list.
- list.remove(element) - Removes the first occurrence of an element from the list.
Let's understand the count() + remove() method with the help of an example:
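A sketch of the count() + remove() method with an assumed sample list. Note that because the first occurrence of each repeated element is removed, this method effectively keeps the last occurrence of each element:

```python
a = [1, 2, 1, 3, 4, 2, 5, 3, 6, 4, 7, 5, 8, 6, 9, 7, 10, 8, 9, 10]  # assumed sample list

# iterate over a copy (a[:]) so that in-place removal does not
# disturb the iteration
for element in a[:]:
    if a.count(element) > 1:  # the element is still repeated
        a.remove(element)     # remove its first occurrence in-place

print(a)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```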
Here, a copy (a[:]) of the main list is traversed, and the number of occurrences of each element is calculated with the help of the count() function. If an element is repeated, i.e., its count > 1, its first occurrence is removed from the main list in-place with the help of the remove() function.
NOTE:
- Because of the in-place element removal functionality of the remove() function, this method is better than the naive method as it requires no extra space to store the unique elements.
- Here, we traverse a copy of the main list because removing elements from the same list we are iterating over can lead to unwanted results.
Highlights:
(1) In-place removal of duplicate elements.
(2) Uses the in-built functions list.count(element) and list.remove(element).
Method 5: Set Method
All the methods that we have discussed so far are very simple to understand and implement. But, they are not very efficient when working with a list having a large number of items.
To overcome this issue, we can use the Set data structure of Python.
By definition, Sets cannot contain duplicate elements. Hence, by converting a list having duplicate elements to a set, we can easily remove duplicate items from the list and create a new list from an unordered Set.
Let's understand the Set method with the help of an example:
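A sketch of the set method, using the same assumed sample list:

```python
a = [1, 2, 1, 3, 4, 2, 5, 3, 6, 4, 7, 5, 8, 6, 9, 7, 10, 8, 9, 10]  # assumed sample list

# a set cannot contain duplicates; converting back to a list
# yields the unique elements, but in no guaranteed order
a = list(set(a))
print(a)
```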
Here, the main drawback of this method is that the original List order is not maintained as we are creating a new list from an unordered set.
Highlights:
(1) Most Popular, Simple, and Fast method suitable for a list of any size.
(2) Uses Python's Set data structure.
(3) Drawback - Order is not preserved in this method.
Method 6: The reduce() Function
We can remove duplicates from a list in Python using the reduce() function provided by the functools module.
The reduce(function, sequence) method is used to cumulatively apply a particular function having two arguments to the elements of the sequence by:
- Traversing the sequence from Left to Right, and
- Applying the given function to the first two elements and storing the result, then
- Applying the same function to the previously stored result along with the next element in the sequence, and
- Repeating it until there are no elements left in the sequence.
Let's look at an example to understand how the reduce() function can be used to remove duplicates from a list:
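One way to sketch this (using an assumed sample list) is to accumulate a (list, set) tuple, passed to reduce() as the initializer:

```python
from functools import reduce

a = [1, 2, 1, 3, 4, 2, 5, 3, 6, 4, 7, 5, 8, 6, 9, 7, 10, 8, 9, 10]  # assumed sample list

# the accumulator is a (unique_list, seen_set) tuple: the set acts as a
# fast look-up table, while the list preserves first-occurrence order
unique, seen = reduce(
    lambda acc, element: acc if element in acc[1]
    else (acc[0] + [element], acc[1] | {element}),
    a,
    ([], set()),  # initializer: empty list and empty look-up set
)

a = unique
print(a)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```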
The reduce() function has an optional argument known as the initializer. If it is present in the function call, the reduce function will call the particular function with the value of the initializer and the first item of the sequence to perform the first partial computation.
Then, it will cumulatively call the function with the partial computation and the next element in the sequence.
Items are added to the list and the set in the accumulator tuple only if they are not already present. The first occurrence of each element is appended to the list, while the set acts as a fast look-up table for the reduce function.
The reduce() function is widely used to process iterables without writing Python for loops as the internal functionality of the reduce() function is written in C instead of Python, i.e., its internal loop is faster than that of explicit Python for-loop.
Also, this method preserves the original order of the elements while keeping membership checks fast via the look-up set. Hence, it is an efficient, order-preserving way to remove duplicates from lists in Python.
Highlights:
(1) Efficient, order-preserving method to remove duplicates from a list in Python.
(2) Uses reduce() function provided by the functools module.
Method 7: Using the NumPy unique() Method
np.unique() from NumPy efficiently finds the unique elements and returns them in sorted order, so the original order of the list is not preserved. This method is well suited to numerical data and large datasets, offering high performance.
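A sketch of the NumPy approach, using the same assumed sample list:

```python
import numpy as np

a = [1, 2, 1, 3, 4, 2, 5, 3, 6, 4, 7, 5, 8, 6, 9, 7, 10, 8, 9, 10]  # assumed sample list

# np.unique() returns the sorted unique values as a NumPy array;
# tolist() converts it back to a plain Python list
a = np.unique(a).tolist()
print(a)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```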
Method 8: Using a Pandas DataFrame
In this method, the original list is first converted into a Pandas DataFrame, a two-dimensional labeled data structure with columns of different types. The drop_duplicates() method of the DataFrame is then used to remove duplicate rows.
Since the DataFrame is created with just one column ('Numbers') based on the original list, dropping duplicate rows effectively removes duplicate elements from the list. Finally, the unique elements in the DataFrame column are converted back to a list. This approach is particularly convenient when working with tabular data and provides extensive data manipulation capabilities, including handling duplicates.
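A sketch of the Pandas approach with an assumed sample list; the column name 'Numbers' follows the description above:

```python
import pandas as pd

a = [1, 2, 1, 3, 4, 2, 5, 3, 6, 4, 7, 5, 8, 6, 9, 7, 10, 8, 9, 10]  # assumed sample list

# build a single-column DataFrame, drop duplicate rows (keeping the
# first occurrence by default), and convert the column back to a list
df = pd.DataFrame({'Numbers': a})
a = df['Numbers'].drop_duplicates().tolist()
print(a)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```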
NOTE:
Out of all the methods we have discussed, note that the set-based approaches (the Set method and the reduce() method shown above) require the elements in the list to be hashable, i.e., they should be immutable (non-changeable).
If the elements of the list are mutable, such as lists or dictionaries, it is advisable to use an iterative method such as the naive method for duplicate removal.
Conclusion
- Duplicates present in a list can be removed using: Sets, Built-in functions, or Iterative methods.
- If the elements present in the list are non-hashable, always use an iterative approach, i.e., traverse the list and extract unique items. Iterative approaches include the naive method, list comprehensions, and the list.count() + list.remove() method.
- For preserving the order of elements, we can use the iterative approaches, the reduce() function, or the Pandas drop_duplicates() method.