Tuesday, May 7, 2024

How to Remove duplicate rows from table in SQL Server using temp table? Example

One of the easiest way to remove duplicate rows from a table in SQL is copy unique data into temp table, delete the original table and create a new table from the temp table without any duplicates. Sometime when you don't have a unique key or primary key in table, you end up with duplicate rows i.e. more than one rows where all columns are identical. You can easily find those duplicate rows by using GRPUP BY, as seen in our previous SQL interview questions. You can also print just unique rows by using distinct clause, but how do you go about removing duplicate rows from table altogether? That's what you will learn in this article. Actually there are multiple ways to remove duplicates and you will learn couple of them in this article. 

This is also a follow-up SQL interview question from, how to find duplicates in a given table in SQL which we have seen earlier. 

Even though, we will use Microsoft SQL Server database in our example, most of the solution will run just fine on other database like Oracle and MySQL. Some might require little bit of tweaking but until you know the steps, you can rewrite the SQL query for your database.

In this article, we will see 2 ways to remove duplicate rows from a given table in SQL
1. By using temp table and SELECT INTO command
2. By using ROW_NUMBER clause

How to Remove duplicate rows from table in SQL Server using temp table? Example


Removing duplicate rows in SQL Server using temp table

This is the easiest solution for small tables but also works fine for large tables. This solution has three steps :

1) copy unique rows into temp table

2) delete or truncate the table with duplicate rows

3) copy unique rows from temp to original table



Here is the SQL query to remove duplicates from a table in Microsoft SQL SERVER :

select distinct * into #Uniques from #Letters

delete #Letters

insert into #Letters select * from #Uniques

You can also use truncate in place of delete, because we are actually clearing the table. 

Remember, we are not changing the structure of data it just data, which we are copying back and forth. I have also tested this code in  Microsoft SQL Server using SQL Server Management Studio, my favorite SQL IDE, and it works fine, but given its not using any special SQL Sever commands or features, I think it will also work in Oracle, MySQL and PostgreSQL.

If you want, you can also test this in online SQL practice sites like sqlfiddle.com, in fact I will show you how to test that soon as we have complete SQL script with us which you can copy and paste in your database to setup the table with data and start testing yourself. 


Here is the complete example to demonstrate this problem and solution :
IF OBJECT_ID( 'tempdb..#Letters' ) IS NOT NULL

DROP TABLE #Letters;

CREATE TABLE #Letters (letter varchar(1), position int);


INSERT INTO #Letters VALUES ('A', 1);

INSERT INTO #Letters VALUES ('A', 1); -- duplicate

INSERT INTO #Letters VALUES ('A', 1); -- duplicate

INSERT INTO #Letters VALUES ('A', 2);

INSERT INTO #Letters VALUES ('B', 1);

INSERT INTO #Letters VALUES ('B', 2);

INSERT INTO #Letters VALUES ('B', 2); -- duplicate

INSERT INTO #Letters VALUES ('C', 1);

INSERT INTO #Letters VALUES ('C', 2);


SELECT * FROM #Letters ORDER BY letter, position;


-- remove duplicate rows if all fields are identical

select distinct * into #Uniques from #Letters

delete #Letters

insert into #Letters select * from #Uniques


SELECT * FROM #Letters ORDER BY letter, position;           


Now, let's just copy and paste this code in Sqlfiddle.com and find out whether it work or not in both SQL Server and other databases.  If you look closely the #Letters is a temp table I have created for testing and it contains three duplicates, three rows with letter A and two rows with letter B. 

We are copy and pasting unique rows into another temp table #Uniques and then deleting all records from #Letters and then copying the unique row back to #Letters

Which means if we run the SELECT command again on #Letters, we should not see any duplicates:

Here is the output of above script when I ran it on SQL server database on SQLFiddle.com

letter position
------ -----------
A 1
A 1
A 1
A 2
B 1
B 2
B 2
C 1
C 2
letter position
------ -----------
A 1
A 2
B 1
B 2
C 1
C 2

You can clearly see that first output has duplicate rows but second output doesn't have any duplicate rows which means we have cleaned them up.

Here is the screenshot which shows that we are using SQL Server database:

How to Remove duplicate rows from table in SQL Server using temp table? Example



Now, let's run the same SQL command into MySQL database to see if it can remove duplicates on a MySQL table or not:

It seems the code doesn't work in MySQL as we are getting below error:

ERROR 1064 (42000) at line 1: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'IF OBJECT_ID( 'tempdb..#Letters' ) IS NOT NULL DROP TABLE CREATE TABLE INS' at line 1 

It's because we have used T-SQL code which is not supported in MySQL, let's try without that as that code is only checking for duplicate table. 

Now we are getting another error:

ERROR 1064 (42000) at line 1: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'INSERT INTO INSERT INTO INSERT INTO INSERT INTO INSERT INTO INSERT INT' at line 3 

It seems MySQL doesn't support SELECT INTO command and its giving error on following code

select distinct * into #Uniques from #Letter
So, its not working, no problem, we will solve that into another post, let's see another way to remove duplicate rows into SQL Server database


Solution 2 - removing duplicates using ROW_NUMBER() and common table expression

You can also remove duplicates from a table by using following SQL query which uses common table expression :

;WITH cte

AS (SELECT ROW_NUMBER() OVER (PARTITION BY letter, position

ORDER BY ( SELECT 0)) RN

FROM #Letters)

DELETE FROM cte

WHERE RN > 1


Here we are using window function ROW_NUMBER(), to assign row number to each record. Since we are using PARTITION BY letter, position the row number will reset as soon as any of them will change. 

For example, in our table we have these three records which are exactly same, only one of them will be chosen and will be assigned RN = 1 , as shown below :

SELECT *, ROW_NUMBER() OVER (PARTITION BY letter, position

ORDER BY ( SELECT 0)) RN

FROM #Letters


letter position RN

A 1 1

A 2 1

B 1 1

B 2 1

C 1 1

C 2 1


It's mandatory to use ORDER BY clause with a ranking function like ROW_NUMBER() in SQL Server . 

Here I am using ORDER BY (SELECT 0) as you can preserver any row in the event of a tie. If you want to preserve latest one then you can also do something like ORDER BY position desc.

The beauty of this solution is that it will work just fine for large tables as well, but this will not work in MySQL, Oracle or PostgreSQL database. Sybase may be but I haven't tested on it.


That's all about how to remove duplicate rows from a table in SQL SERVER. As I told you, the SQL query is tested on SQL Server but most likely work in Oracle and MySQL as well. If it doesn't just let me know and I'll work with you to fix that. 

Just remember the steps, copy the unique data to temp table, truncate original table and then copy the unique data back to the original table. 

Don't forget to drop the temp table, or if you want to keep until next run, just leave it.

1 comment:

  1. If you want to practice the SQL, you can also use this link on SQLFiddle where my queries are saved https://sqlfiddle.com/mysql/online-compiler?id=d33af95f-4ae9-423d-8164-c8a8ceaad045

    ReplyDelete