Thursday, February 25, 2010

Delete duplicate records from large data table.

Delete duplicate records from table:

For completing this task we need to do three steps:
Step 1: Identify what are all the duplicate records for our criteria
Step 2: Move all those duplicate records in to some temporary table.
Step 3: Let write some sql script for delete.

Step 1:

Identify what are all the duplicate records for our criteria:

Following query gives duplicate records list and it will move to
some temporary table

-- Script by suresh
select First_name, Last_name, city, state, count(*) as duplicate_count
into #dupes -- dupes is duplicate table name step 2
from dup_authors
group by First_name, Last_name, city, state
having count(*) > 1
order by count(*) desc, First_name, Last_name


In the first step we found duplicate records like

First_name Last_name city state duplicate_count
seep jubly Idaho IH 10

By observing this we found 10 resords for the above criteria. Then the question arises how can we delete 9 records by keeping single record in original table.

Step 3:

Write a simple cursor for deletion by using 'top' clause in deletion

SQL is:
CREATE Procedure [dbo].[CUR_DeleteDuplicateRecords_Final]

AS
BEGIN
DECLARE @Profile_name varchar(MAX)
DECLARE @Profile_Phone1 varchar(100)
DECLARE @Profile_StreetAddress1 varchar(MAX)
DECLARE @Profile_State varchar(300)
DECLARE @Duplicate_Rows_Count INT

DECLARE DuplicateRecords_Cursor CURSOR FAST_FORWARD FOR
SELECT Profile_Name, Profile_Phone1, Profile_Streetaddress1 ,Profile_State, Duplicate_Rows_Count
FROM
dups

OPEN DuplicateRecords_Cursor

FETCH NEXT FROM DuplicateRecords_Cursor INTO
Profile_name, @Profile_Phone1, @Profile_StreetAddress1,
@Profile_State,@Duplicate_Rows_Count

WHILE @@FETCH_STATUS = 0
BEGIN

--Start Delete Querry

DELETE TOP (@Duplicate_Rows_Count-1)
FROM dbo.T_Business_Profiles
WHERE Profile_name=@Profile_name and
Profile_phone1=@Profile_Phone1 and
Profile_StreetAddress1=@Profile_StreetAddress1 and
Profile_State= @Profile_State

--End Delete Querry

FETCH NEXT FROM DuplicateRecords_Cursor INTO @Profile_name, @Profile_Phone1, @Profile_StreetAddress1,
@Profile_State,@Duplicate_Rows_Count
END

CLOSE DuplicateRecords_Cursor
DEALLOCATE DuplicateRecords_Cursor

END

No comments:

Post a Comment