Sunday, April 14, 2024

6 Examples of Correlated Subqueries in SQL

The correlated subquery is one of the interesting but tricky concept to master in SQL. It's like recursion in computer science, which many developer struggle to understand and apply, but it's a such nice technique that I believe every programmer should learn and master. It took me long time to understand and master this technique and solving SQL queries which I am going to share with you today, played a huge role in learning correlated subqueries better. Sometime, you can solve a problem easily by using correlated subquery like the Nth highest salary problem, which is quite easy to solve using correlated subquery but difficult otherwise.


The key here is to try to solve the problem by yourself first so that you give your mind enough chance to find the solution, when you do that you will more likely to understand the solution better and also retain the knowledge .


6 Examples of Correlated Subqueries in SQL

Now, let's jump into examples and common SQL problems which can be solved using correlated subquery. Many of them are also popular SQL queries which are asked on interviews like the 2nd height salary or Nth highest salary

3 Examples of Correlated Subqueries in SQL


1. 2nd highest salary in SQL

This is probably the best example of correlated subquery and one of the frequently asked SQL interview questions as well. Here the challenge is to find the employee with the second highest salary not the highest one which is easier to find using MAX function. 

First but let's create the table and insert data for testing:

CREATE TABLE #Employee (Name VARCHAR(22), Department VARCHAR(22), Salary BIGINT);

INSERT INTO #Employee values ('John', 'Sales', 1000);

INSERT INTO #Employee values ('Root', 'Sales', 2000);

INSERT INTO #Employee values ('Cook', 'Sales', 3000);

INSERT INTO #Employee values ('Kevin', 'Finance', 2200);

INSERT INTO #Employee values ('Johnson', 'Finance', 2900);

INSERT INTO #Employee values ('Smith', 'Finance', 3000);

INSERT INTO #Employee values ('Marsh', 'IT', 1100);

INSERT INTO #Employee values ('Garry', 'IT', 2100);

INSERT INTO #Employee values ('Tuffnel', 'IT', 1900);

Looking at the data, you can see that highest salary is 3000 and second highest salary is 2900 and employee is Johnson, let's see if our query return that. 

And ,here is the correlated SQL query to find the second highest salary from the #Employee table, I have purposefully made it a temp table, so that it can be cleared once you close the connection. 

SELECT e.Salary FROM #Employee e 
WHERE 2 = (SELECT COUNT(DISTINCT Salary) FROM #Employee p 
           WHERE p.Salary >= e.Salary)


In this case the correlated part of subquery is executed for each row executed by outer query. For example, if first employee has salary 3000 then correlated salary will become like below:

SELECT COUNT(Salary) FROM #Employee p WHERE p.Salary >= 3000

and it will find how many salaries are greater than or equal to 3000, if there are only two salaries which are greater than 3000 then it becomes the second highest salary.

Note, the use of distinct keyword here since we have a duplicate salary, I mean two people have 3000 as their salary, hence I have used distinct keyword to return 2900. If you don't use distinct then it will return 3000 because there are two people with the same 3000 salary.




2. Nth highest salary in SQL

This is the extension of previous example of correlated subquery in SQL because you can replace "2" with any number to find the 3rd highest, 4th highest, 5th highest or Nth highest and lowest salary. 

The concept is that outer query will print the salary for which there are N salaries are higher or lower, which will be found by correlated subquery. 

Here is the SQL query, you just need to replace N with the number you want. 

SELECT e.Salary 

FROM #Employee e 

WHERE N = (

SELECT COUNT(DISTINCT Salary) 

FROM #Employee p 

WHERE p.Salary >= e.Salary

)

let's try to calculate the third highest salary using this query. If you look at the above data, you will notice that third highest salary is 2200 so, let's replace N=3 and run the query and see what does it return. 

I am running this query on SQLfiddle.com, an online tool which allows you to run SQL query online with all popular database like MySQL, Oracle, SQL Server, PostgreSQL etc. I have chosen SQL Server as database and here is the output


You can see the result is 2200, which is expected. If you need the 4th highest salary, then just replace the N with 4 and run  it again. If our query is correct then it should return 2100 which is the 4th highest salary in our table. 


You can see that its correctly output "2100" as fourth highest salary which is expected. Similarly you can calculate 5th and 6th highest salary. 


3. Print all employees whose salary is greater than average salary in their department

Now, we will see another example where correlated subquery can help. Here you need to find out all employees who is making more than the average salary of their department. 

SELECT e.Name, e.Salary, e.Department

FROM #Employee e

WHERE Salary > (

SELECT AVG(Salary) as Avg_Salary

FROM #Employee p

WHERE e.Department = p.Department);

 This is another interesting example of correlated subquery in SQL the query looks very much similar as we are again comparing salary from inner and outer query. This time, the inner query is calculating the average salary of employee and comparing with salary of outer query to print all employees who earn higher than average salary in their department. 

If you look at the data, we have three department Sales, Finance, and IT and their average salaries are 2000, 2700, and 1700 respectively. So finance have highest average salary and IT has lowest average salary which is pretty much same as real world data. 

As per out data, Garry and Tuffle earn higher than average in IT department, and Cook earn higher than average in Sales, let's see if our query return the correct result or not. 


You can see the result returned by our query is correct and match with the expectation. 



4. SQL correlated subquery in SELECT clause example

You are not just limited to use the corelated sub query on WHERE clause you can also use the correlated subquery in SELECT clause as shown in following example. 

SELECT

employee_number,

name,

(SELECT AVG(salary) 

FROM employees

WHERE department = Bob.department) AS department_average

FROM employees AS Bob;

This may not look a corelated sub query to many of you but it is a correlated sub query because the inner query uses table from outer query "Bob". It references Bob.department where Bob is the alias for outer query table. 

This means for each row processed by the outer query, the inner query is executed and the value of department from the outer query is used to filter the records for calculating average salary by inner query. 


5. SQL correlated subquery in WHERE clause example

Now, let's see another example of using correlated subquery in the WHERE clause :

SELECT employee_number, name

FROM employees AS Bob

WHERE salary > (

SELECT AVG(salary)

FROM employees

WHERE department = Bob.department);

This query is similar to previous query and is an example of correlated query because inner query again use the table from outer query and for each row processed by outer query, the inner query used the department value to filter departments for calculating average salary. 



6. SQL correlated subquery in HAVING clause example

Now that we have seen the use of correlated subquery in both SELECT and WHERE clause, let's see an example of correlated subquery in HAVING clause

SELECT 

t1.categoryID, categoryName

FROM

products t1

INNER JOIN

categories c ON c.categoryID = t1.categoryID

GROUP BY categoryID

HAVING MAX(unitprice) > ALL (

SELECT 2 * AVG(unitprice)

FROM

products t2

WHERE

t1.categoryID = t2.categoryID)

You can see that in the HAVING Clause we have another query, a subquery which also uses the value of table t1 from outer query which makes it a correlated subquery. 

The effect of correlated subqueries in some cases can be obtained using joins. For example, the queries above (which use inefficient correlated subqueries) may be rewritten as follows.

-- This subquery is not correlated with the outer query, and is therefore

-- executed only once, regardless of the number of employees.

SELECT employees.employee_number, employees.name

FROM employees INNER JOIN

(SELECT department, AVG(salary) AS department_average

FROM employees

GROUP BY department) AS temp ON employees.department = temp.department

WHERE employees.salary > temp.department_average;


That's all about interesting examples of correlated subqueries in SQL. These examples will help you to learn and understand how correlated subquery executes and works. It's a powerful technique to solve problems and every programmer should aware of this. Though, you have to be mindful of size of table because correlated subquery can be really slow for large table and can lock the table from long time, potentially blocking other jobs and queries from completing. So, avoid running on correlated subqueries on production during market hours.


Other SQL Tutorials and Articles you may like


Thank you for reading this article so far. If you have any questions or doubt then feel free to ask. 

No comments:

Post a Comment