Correlated Subquery
Overview
A subquery is a query that is enclosed within another query in SQL. When a subquery returns data, this data is used by the outer query.
In other words, this data is a prerequisite for the outer query to work. In this case, instead of writing two separate queries, we write one inside of the other. However, a correlated subquery needs the values of the main query to work. How? Let's find out!
What is A Correlated Subquery?
A correlated subquery uses the values of the outer or main query. This query is dependent on the outer one and hence cannot be executed in an independent manner. The main query can be in the form of the following:
- SELECT
- UPDATE
- DELETE
A correlated subquery is used in row-by-row processing and gets evaluated once for each row processed by the outer query. A correlated subquery is a way to read all the rows in a table. It is used when a subquery is expected to return different values for each row the main query looks at, so it reads all of them. Since it gets executed repeatedly, it is also known as repeating subquery.
We shall cover examples involving correlated subqueries in the sections ahead.
Example
Let us consider a table vehicles from some database that lists the most preferred car models in a city. The table columns are:
Cars |
---|
car_model |
car_name |
brand |
category |
market_price |
We will write a query to find all the cars whose market price equals the lowest price in the same category.
Following is the MySQL Code :
MySQL:
Output:
car_name | market_price | category |
---|---|---|
A | $18000 | Hatchback |
X | $18000 | Hatchback |
ABC | $20000 | Sedan |
XYZ | $20000 | Sedan |
Z | $20000 | Sedan |
I | $24000 | SUV |
Explanation: In the example above, we have a SELECT statement within a SELECT statement. For every car evaluated by the outer or main query, the lowest price in its category is found with the help of the subquery.
In case the price of any car is equal to the least price of all cars in the same category, it is added to the result. Thus, the correlated subquery is executed for every row evaluated by the outer query.
Nested Subqueries Vs Correlated Subqueries
In a nested query, the inner query executes only once and runs first. The outer query uses the data returned by the inner query.
On the other hand, the correlated subqueries execute after the outer query and use values obtained from it, as discussed above. Let us look at some examples to understand better.
We have tables employees and teams from a sample database.
Employees:
employees |
---|
employee_id |
name |
age |
team_id |
salary |
Teams:
teams |
---|
team_id |
name |
strength |
For Nested Subqueries :
We shall write a query to find all the employees from the " Finance " team.
Output:
name | employee_id | team_id |
---|---|---|
James | 1234 | 2 |
Jack | 7483 | 2 |
Mary | 7980 | 2 |
Anna | 5476 | 2 |
Dan | 8392 | 2 |
So in the above query, the inner query returns the id for the team "Finance" and the outer query returns the "employees" who have the same "team_id".
As you can see, the inner query can also be executed as a standalone query.
team_id |
---|
2 |
"team_id" for the team "Finance" is returned.
For Correlated Subquery:
We shall write a query to find all the employees having salaries greater than the average salary in their team.
Output:
name | employee_id | salary | team_id |
---|---|---|---|
Emma | 3636 | $11700 | 1 |
George | 1284 | $11900 | 1 |
Jack | 7483 | $10000 | 2 |
Anna | 5476 | $10500 | 2 |
Chris | 3897 | $12000 | 3 |
Jennifer | 2736 | $9000 | 4 |
Eric | 9876 | $9850 | 4 |
Here, the inner subquery is dependent on the outer for values, hence can not be executed alone. Row-by-row processing takes place here.
Using EXISTS with a Correlated Subquery
The EXISTS keyword in SQL is used to check the existence of a particular record in a subquery. If the subquery returns a record, then the EXISTS keyword returns TRUE. Following is the syntax of a subquery while using EXISTS keyword:
Syntax:
Let us look at an example to understand better. Let us consider two tables, "BLOGS" and "AUTHORS" from some sample databases.
AUTHORS |
---|
Author_ID |
Author_Name |
Contact_Address |
Total_Blogs |
BLOGS |
---|
Blog_ID |
Blog_Name |
Author_ID |
Total_Views |
We shall write a query to find the authors who have written blogs with more than 20,000 views.
So as you see, the inner query is dependent on the outer query for its values. The inner query returns TRUE or FALSE according to the condition applied.
Using NOT EXISTS With A Correlated Subquery
NOT EXISTS is exactly the opposite of the EXISTS keyword. It checks for the existence of particular records in the subquery. If the record is not present, it returns TRUE; otherwise FALSE.
Following is the syntax of a subquery while using the NOT EXISTS keyword :
Syntax:
The following example shall help you understand better:
In a section before, we considered an example to find the authors who have written blogs with more than 20,000 views. Let us see how the query gets modified using NOT EXISTS.
AUTHORS |
---|
Author_ID |
Author_Name |
Contact_Address |
Total_Blogs |
BLOGS |
---|
Blog_ID |
Blog_Name |
Author_ID |
Total_Views |
So as you can see, the condition Total_Views < 20000 is exactly the opposite to get the same results as EXISTS using NOT EXISTS.
Correlated Vs Uncorrelated Subqueries In SQL
Let us see how correlated and uncorrelated subqueries differ :
Correlated Subqueries | Uncorrelated subqueries |
---|---|
A correlated subquery is dependent on the outer query for its values. | An uncorrelated subquery is not dependent on the outer query for its values |
It can not be executed as a standalone query. | It is possible to execute an uncorrelated subquery as a standalone query. |
It gets executed multiple times, also called repeating subquery. | It gets executed only once. |
Since it gets executed for every row, thus slower in performance | An uncorrelated subquery is faster than the correlated subquery |
Most commonly, these subqueries are used with keywords exits and not exists | Mainly used with IN and NOT INclauses. |
Conclusion
- A correlated subquery is dependent on the outer query for its values.
- It is mainly used in the row-by-row processing of tables.
- The most commonly used keywords with correlated subqueries are EXISTS and NOT EXISTS.