Sunday, August 4, 2013

Delete duplicate records from table

1.
SELECT DISTINCT * INTO tab2 FROM tab1 — distinct of all columns
DROP TABLE tab1
EXEC sp_rename 'tab2','tab1'

2.
WITH T1 AS (SELECT ROW_NUMBER ( ) OVER ( PARTITION BY ID, FNAME, LNAME ORDER BY ID ) AS RNUM FROM DUPLICATE )
DELETE FROM T1 WHERE RNUM > 1


SQL query to delete duplicate rows


Create a table EmpDtl1 with some duplicate rows as shown below to understand different methods of delete duplicate rows.
    create table EmpDup(empid int,name varchar(20))
    
    insert into EmpDup values(1,'Andy')
    insert into EmpDup values (1,'Andy')
    insert into EmpDup values(2,'Bill')
    insert into EmpDup values(2,'Bill')
    insert into EmpDup values (2,'Bill')
    insert into EmpDup values (3,'Chris')
    
User Name "Andy" repeated 2 times and User Name "Bill" repeated 3 times.
Following are the different methods for deleting duplicate rows.
Method 1:
Insert the distinct rows from the duplicate rows table to new temporary table. Delete data from table which has duplicate rows then insert the distinct rows from the temporary table as shown below.
    select distinct * into #tmp From EmpDup
    delete from EmpDup
    insert into EmpDup                
    select * from #tmp drop table #tmp
    
Method 2:
If you want to consider only few columns in a table for duplication criteria to delete rows then Method 1 will not work(in our example, if EMDup table has more than 2 columns and delete rows if empid and name repeats more than one time).
In this case, Add an identity column as a serial number that acts as a row unique identifier(auto incremental ascending order).Then get the Rank against each empid,name. If Rank is greater than 1 means it is a duplicate row and delete the same. After deleting the duplicated rows, remove the identity column which is used for rank. See the below example.
    alter table EmpDup add  sno int identity(1,1)

    delete E
    from  EmpDup E
    inner join
    (select *,
    RANK() OVER ( PARTITION BY empid,name ORDER BY sno DESC )rank
    From EmpDup )T on E.sno=t.sno
    where T.Rank>1

    alter table EmpDup 
    drop  column sno
    
If you feel this query is little difficult to understand then use following same query but in different way!
    alter table EmpDup add  sno int identity(1,1)
    delete from EmpDup where sno in
    (
    select sno from (
    select *,
    RANK() OVER ( PARTITION BY empid,name ORDER BY sno DESC )rank
    From EmpDup
    )T
    where rank>1
    )

    alter table EmpDup 
    drop  column sno
    
If your sql server version is below SQL2005 then above queries (above 2 queries in Method2) won't work since Rank() command is not available in SQL 2000 and below versions.
If your sql server version is below SQL2005 then use below query. Same logic to get the rank but without using the Rank() command!
    alter table EmpDup add  sno int identity(1,1)
    delete from EmpDup where sno in
    (
    select sno  from EmpDup D where 
    1<(select count(*) from EmpDup A where A.empid=D.empid and A.name=D.name and D.sno>=A.sno)
    )

    alter table EmpDup
    drop  column sno
    
Method 3:
Using "Delete Top( )" clause:
If you want to delete duplicate rows for a particular empid then use "Top()" command in delete query as shown below.
    delete top(2) From EmpDup where empid=2
OR
    delete top(select count(*)-1 From EmpDup x where x.empid=2) From EmpDup where empid=2
    
Method 4:
If you want to delete all the rows if the selected columns repeated more than 1 time then use below query.
Query to delete 3 duplicated rows (in our example table) or repeated more than 1 time.
    delete from EmpDup where EmpID in(select EmpID from EmpDup group by EmpId having
    count(*) >1)
    
http://www.besttechtools.com/articles/article/sql-query-to-delete-duplicate-rows 


SQL Server: How To Find Duplicate Records

When checking the integrity of your data, it may be necessary to check your tables for duplicate records. By grouping these records, you can eliminate the unique records from your result set so that you can view just the records that contain duplicated values, making it easier to remove them.
In our example, we will work with a table named Students and we will look for duplicate values in theEmail_Address column.
By using the following SELECT statement, you can isolate the records with duplicate values:
SELECT Email_Address, COUNT(*) FROM Students Group BY Email_Address HAVING COUNT(*) > 1
You can use the opposite of this to find all rows with unique values in the Email_Address column by using the following SELECT statement:
SELECT Email_Address, COUNT(*) From Students GROUP BY Email_Address HAVING COUNT(*) = 1
There are various times when we need to find duplicate records in SQL Server. It is possible to find duplicates using DISTINCT, ROW NUMBER as well as the GROUP BY approach.
Duplicate records can create problems sometimes when displaying reports or performing a Multiple Insert update. Finding duplicate records in a database needs further investigation. In some cases, duplicate records are positive, but it all depends on the data and the database design as well.
For example, if a customer has ordered the same product twice on the same date with the the same shipping and billing address, then this may result in a duplicate record.
Let us create a table Customer with First Name, Last Name, and Mobile Number fields.
CREATE TABLE CUSTOMER
(
FirstName VARCHAR(50),
LastName  VARCHAR(50),
MobileNo  VARCHAR(15)
);

INSERT INTO CUSTOMER VALUES ('Niraj','Yadav',989898);

INSERT INTO CUSTOMER VALUES ('Chetan','Gadodia',959595);
INSERT INTO CUSTOMER VALUES ('Chetan','Gadodia',959595);

INSERT INTO CUSTOMER VALUES ('Atul','Kokam',42424242);
INSERT INTO CUSTOMER VALUES ('Atul','Kokam',42424242);

INSERT INTO CUSTOMER VALUES ('Vishal','Parte',9394453);
INSERT INTO CUSTOMER VALUES ('Vishal','Parte',9394453);
INSERT INTO CUSTOMER VALUES ('Vishal','Parte',9394453);

INSERT INTO CUSTOMER VALUES ('Jinendra','Jain',121212);
INSERT INTO CUSTOMER VALUES ('Jinendra','Jain',121212);

SELECT * FROM CUSTOMER;
Finding Duplicates Records Using Group by in SQL Server (2)
Using the DISTINCT approach, we can quickly get unique rows in a table.
SELECT DISTINCT FirstName, LastName, MobileNo FROM CUSTOMER;
Finding Duplicates Records Using Group by in SQL Server (3)
However, this does not show how many times a row has been duplicated. Using the GROUP BY approach, we can find this.
Finding Duplicates Using GROUP BY
Adding grouping and a counting field to our display of FirstName, LastName and MobileNo combination shows how many times each customer’s name appears.
SELECT  FirstName, LastName, MobileNo, COUNT(1) as CNT
FROM  CUSTOMER
GROUP BY FirstName, LastName, MobileNo;
Finding Duplicates Records Using Group by in SQL Server (4)
GROUP BY will show just one record for each combination of FirstName, LastName and MobileNo.
The count CNT shows how many times the row has been duplicated.
CNT = 1 indicates that row appears only once.
Let us filter out using the Having clause to exclude rows that appear only once.
SELECT  FirstName, LastName, MobileNo, COUNT(1) as CNT
FROM  CUSTOMER
GROUP BY FirstName, LastName, MobileNo
HAVING  COUNT(1) > 1;
Finding Duplicates Records Using Group by in SQL Server (1)

How to Remove duplicate entry from a Table

It's a quite simple SQL manipulation for dulplicate entry delitions. Some how I am noted it down for a quick references.
-- Step-1 [ Create a Table ]
CREATE TABLE AA
            (DECIMAL,
             y VARCHAR)
-- Step-2 [ Insert Some values, it must contains duplicate values ]
INSERT INTO AA
VALUES(1,'A'),(2,'B'),(3,'C'),(1,'A'),(1,'A')
-- Step-3
SELECT x, y, col3=count(*)
INTO midtable
FROM AA
GROUP BY x, y
HAVING count(*) > 1
--Step-4
SELECT DISTINCT AA.*
INTO midtable1
FROM AA, midtable
WHERE AA.= midtable.x
AND AA.= midtable.y
--Step-5
SELECT x, y, count(*)
FROM midtable1
GROUP BY x, y
--Step-6
DELETE AA
FROM AA, midtable
WHERE AA.= midtable.x
AND AA.= midtable.y
--Step-7
INSERT AA SELECT * FROM midtable1
--Step-8 [ Final Output ]
SELECT * FROM AA