SQL SELECT DISTINCT Statement
Overview
DISTINCT keyword in SQL eliminates all duplicate records from the result returned by the SQL query. The DISTINCT keyword is used in combination with the SELECT statement. Only unique records are returned when the DISTINCT keyword is used while fetching records from a table having multiple duplicate records.
Introduction to DISTINCT in SQL
Imagine encountering a scenario where you have to work with tables containing numerous duplicate entries, but you need to extract only the unique ones for your subsequent operations. In such a situation, the SQL DISTINCT keyword can be immensely helpful. With this keyword, you can eliminate any duplicate values and retrieve only the unique ones. The DISTINCT keyword operates on a single column, identifying and returning distinct values by removing all duplicate records from the table.
DISTINCT can also be used along with aggregate SQL functions like COUNT, MAX, SUM, AVG, etc. DISTINCT operates not only on a single column of a table but also has support for multiple columns of a table, where DISTINCT in SQL will eliminate those rows where all the selected columns are identical.
Note: In case NULL values are present in a particular table column, using the DISTINCT clause will also include NULL as a distinct record in the result.
Syntax of DISTINCT in SQL
In the above syntax, DISTINCT is used after SELECT statement followed by the name of the column of the table on which we want to apply the DISTINCT clause, which is then followed by an optional WHERE condition that includes any type of filtering needed to be done before printing out the result.
Parameters of DISTINCT in SQL
A DISTINCT clause in SQL can be applied to any valid SELECT query, where it will filter out all rows that are not unique in terms of all selected columns.
column_name: name of the column on which we want to apply the DISTINCT clause.
table_name: name of the table from which we want to retrieve the records.
WHERE condition: it is an optional statement used while writing a SQL query to satisfy the defined conditions while fetching records.
How to Use DISTINCT in SQL?
Let us consider the following Students table, containing the roll no, name, age, address, and course name in which a student is enrolled.
Roll No | Name | Age | Address | Course |
---|---|---|---|---|
101 | Nobita | 18 | Japan | Physics |
102 | Suneo | 16 | America | Aerospace |
103 | Shizuka | 18 | Japan | Chemistry |
104 | Gian | 23 | Korea | Maths |
105 | Kiteretsu | 22 | London | Geology |
106 | Kenichi | 19 | Singapore | English |
107 | Mioko | 22 | Australia | Biology |
First, write a SQL query to return all student's ages, including duplicate values.
Output
age |
---|
18 |
16 |
18 |
23 |
22 |
19 |
22 |
The above SQL query returns the age of every student, including duplicate values, i.e., 18 and 22, since 18 and 22 occur twice in the Age column of the student's table. To remove these duplicate age values, we can use DISTINCT in the SQL clause before the column name in combination with the SELECT query.
Output
age |
---|
18 |
16 |
23 |
22 |
19 |
The above SQL query returns only the unique ages from the student's table. Both duplicate values 18 and 22 are not returned because the DISTINCT clause eliminates all the duplicate values from the output.
Examples of DISTINCT in SQL
1. Example of Finding Unique Values in a Single Column
Consider the following Companies table, which contains the company name and location of their headquarters.
S No | Name | State | Country |
---|---|---|---|
1 | Microsoft | Washington | USA |
2 | BMW | Munich | Germany |
3 | Walmart | Arkansas | USA |
4 | Vodafone | London | UK |
5 | Accenture | Dublin | Ireland |
6 | Nissan | Yokohama | Japan |
7 | Ericsson | Stockholm | Sweden |
8 | Godrej | Mumbai | India |
9 | Barclays | London | UK |
10 | Nikon | Tokyo | Japan |
Writing SQL query to find unique headquarter countries from the Companies table.
Output
country |
---|
USA |
Germany |
UK |
Ireland |
Japan |
Sweden |
India |
The above SQL query uses a DISTINCT clause that filters out all duplicate country names from the output and only contains all the unique country names from the Companies table.
2. Example of Finding Unique Values in Multiple Columns
Consider the following Companies table, which contains the company names and the location of their headquarters.
S No | Name | State | Country |
---|---|---|---|
1 | Microsoft | Washington | USA |
2 | BMW | Munich | Germany |
3 | Walmart | Arkansas | USA |
4 | Vodafone | London | UK |
5 | Accenture | Dublin | Ireland |
6 | Hitachi | Tokyo | Japan |
7 | Ericsson | Stockholm | Sweden |
8 | Godrej | Mumbai | India |
9 | Barclays | London | UK |
10 | Nikon | Tokyo | Japan |
Now, write DISTINCT in SQL query to find unique state and country combinations from the Companies table.
Output
state | country |
---|---|
Washington | USA |
Munich | Germany |
Arkansas | USA |
London | UK |
Dublin | Ireland |
Tokyo | Japan |
Stockholm | Sweden |
Mumbai | India |
The above output contains all unique state and country combinations from the Companies table. Here we get only eight records as a result because (London, UK) and (Tokyo, Japan) are duplicate combinations of state and country.
3. Example of Handling NULL Using DISTINCT Clause
Consider the following Students table, which contains the roll no, names, age, and course in which a student is enrolled.
Roll No | Name | Age | Course |
---|---|---|---|
101 | Nobita | 18 | Chemistry |
102 | Suneo | 16 | NULL |
103 | Shizuka | 18 | Chemistry |
104 | Gian | 23 | Maths |
105 | Kiteretsu | 22 | Biology |
106 | Kenichi | 19 | English |
107 | Mioko | 22 | Biology |
Now, writing DISTINCT in SQL query to find unique courses.
Output
course |
---|
Chemistry |
NULL |
Maths |
Biology |
English |
In the above SQL query, the DISTINCT clause treats NULL as a value in the Course column of the table, which means that if there are two NULLs in the same column, they are interpreted as the same/duplicate value. Therefore, if the SELECT statement returns NULL multiple times, the DISTINCT will return only one NULL.
Since the DISTINCT in SQL clause doesn't ignore NULL values. Therefore, the output contains all unique course names in which students are enrolled, including NULL, as given in the Students table.
Difference between DISTINCT and GROUP BY
The DISTINCT clause in SQL filters out all duplicate records and returns unique ones. In comparison, GROUP BY is majorly used for aggregating and grouping rows. GROUP BY can also be used to filter unique records but in a little more complex manner than DISTINCT in SQL.
Consider the following Student table having duplicate records.
Roll No | Name | Age | Address | Course |
---|---|---|---|---|
101 | Nobita | 18 | Japan | Physics |
102 | Suneo | 16 | America | Aerospace |
103 | Shizuka | 18 | Japan | Chemistry |
104 | Gian | 23 | Korea | Maths |
105 | Kiteretsu | 22 | London | Geology |
106 | Kenichi | 19 | America | English |
107 | Mioko | 22 | Australia | Biology |
Using DISTINCT in SQL to get unique addresses from the table.
Output
Address |
---|
Japan |
America |
Korea |
London |
Australia |
The above SQL query uses the DISTINCT clause on the Address column of the table to return only the unique Address values by eliminating the duplicate ones from the Students table.
Now use GROUP BY to produce the same output along with the count of each address in the table.
Output
Address | address_count |
---|---|
Japan | 2 |
America | 2 |
Korea | 1 |
London | 1 |
Australia | 1 |
In the above SQL query, we have grouped our output by the Address column that only returns the unique Address values and the number of times it exists in the Students table.
Conclusion
- DISTINCT keyword in SQL is used in conjunction with the SELECT statement. Unique records are returned when the DISTINCT keyword is used while fetching records from a table having multiple duplicate records.
- DISTINCT in SQL operates only on a single column. It does not have support for multiple columns.
- GROUP BY is also used for fetching unique records, but the main difference between DISTINCT and GROUP BY is that the latter is used for aggregating and grouping rows that help summarize a particular column of a table.
- For example, the DISTINCT clause in SQL is widely used in a School Management Database to find out the unique city names of students in the school.