How to remove duplicate rows from a table --- SQL SERVER

{ Posted on 10/25/2009 11:54:00 AM by Silvercode }
Sometimes there are situations, where we'll have to delete the duplicate records in a table.
CREATE TABLE Users
(
FirstName nvarchar(50),
LastName nvarchar(50)
)
GO

INSERT INTO users (Fname, Lname) VALUES (N'Tarana',N'Abbas')
INSERT INTO users (Fname, Lname) VALUES (N'Jane',N'Eyre')
INSERT INTO users (Fname, Lname) VALUES (N'Julia',N'Fonda')
INSERT INTO users (Fname, Lname) VALUES (N'Jane',N'Eyre')
Here "Jane Eyre" is a duplicate record.

Identify which are the rows duplicating:
SELECT Fname, Lname, count(*)
FROM users
GROUP BY Fname, Lname
HAVING count(*) > 1

Methods to delete the duplate records:
Method1:
SET ROWCOUNT

SET ROWCOUNT 2
DELETE FROM Users WHERE FirstName = N'Jane' AND LastName = N'Eyre'
SET ROWCOUNT 0
(1 row(s) affected)
Above is incase of manual deletion of the row. Say incase of automatic deletion, we'll have to open the cursor , fetch values from the table and delete one by one by iterating.
Note: while you are declaring the cursor for dublicate rows, you should select the count of identical rows minus one, since we want one of the duplicated records to be existing in the table after delete processes.
ie:

DECLARE @Count int
DECLARE @fname nvarchar(50)
DECLARE @lname nvarchar(50)

DECLARE duplicate_cursor CURSOR FAST_FORWARD FOR
SELECT fname, lname, Count(*) - 1
FROM Users
GROUP BY fname, lname
HAVING Count(*) > 1

OPEN duplicate_cursor

FETCH NEXT FROM duplicate_cursor INTO @fname, @lname, @Count

WHILE @@FETCH_STATUS = 0
BEGIN

SET ROWCOUNT @Count
DELETE FROM Users WHERE fname = @Fname AND lname = @Lname
SET ROWCOUNT 0

FETCH NEXT FROM duplicate_cursor INTO @Fname , @Lname, @Count
END

CLOSE duplicate_cursor
DEALLOCATE duplicate_cursor
Note:***** Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in the next release of SQL Server.
Method 2:
TOP

DELETE TOP(@Count) FROM Users WHERE FirstName = @fname AND LastName = @lname
This statement can be used inside the cursor, for deleting the values atonce.

Method 3:
Adding Identity column:
ALTER TABLE users ADD Id int IDENTITY(1,1)
and then use direct delete statement using id value
DELETE FROM users WHERE Id IN (2,4)

ALternatively we can use Common Table Expression:
WITH Duplicates_CTE(Fname, Lname, Id)
AS
(
SELECT Fname, Lname, Min(Id) Id
FROM users
GROUP BY Fname, LastName
HAVING Count(*) > 1
)
DELETE FROM users
WHERE Id IN (
SELECT users.Id
FROM Users
INNER JOIN Dublicates_CTE
ON users.Fname= Dublicates_CTE.Fname
AND users.Lname= Dublicates_CTE.Lname
AND users.Id <> Dublicates_CTE.Id
)
Method 4:
Say tht you want to keep according to the inserted date first, then use ROW_NUMBER() OVER (PARTITION BY columnname1, columnname2 ORDER BY columnname3 DESC) syntax
WITH [CTE DUPLICATE] AS
(
SELECT
RN = ROW_NUMBER() OVER (PARTITION BY name ORDER BY addedDate DESC),
Id, Fname, Lname, addedDate
FROM users
)
DELETE FROM [CTE DUPLICATE] WHERE RN > 1

Incase you dont know what is CTE ie Common Table Expression, please look forward for my next article regarding the same.

No Response to "How to remove duplicate rows from a table --- SQL SERVER"

Post a Comment