Removing duplicate rows from table in Oracle

Question

I'm testing something in Oracle and populated a table with some sample data, but in the process I accidentally loaded duplicate records, so now I can't create a primary key using some of the columns.

How can I delete all duplicate rows and leave only one of them?

Bill the Lizard · Accepted Answer · 2014-06-12 14:13:28Z

421

Use the rowid pseudocolumn.

DELETE FROM your_table
WHERE rowid not in
(SELECT MIN(rowid)
FROM your_table
GROUP BY column1, column2, column3);

Where column1, column2, and column3 make up the identifying key for each record. You might list all your columns.
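
Before running the delete, it can help to preview what counts as a duplicate. This is only a sketch, assuming the same placeholder names (your_table, column1 to column3) as above; the alias copies is just illustrative.

```sql
-- List each key that occurs more than once, with the number of copies.
SELECT column1, column2, column3, COUNT(*) AS copies
  FROM your_table
 GROUP BY column1, column2, column3
HAVING COUNT(*) > 1;

-- Preview the exact rows the DELETE above would remove.
SELECT *
  FROM your_table
 WHERE rowid NOT IN (SELECT MIN(rowid)
                       FROM your_table
                      GROUP BY column1, column2, column3);
```

Once the first query returns no rows, the key columns are unique and the primary key can be created.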

edited Jun 12, 2014 at 14:13

answered Feb 9, 2009 at 17:41

Bill the Lizard


9

+1 I had to find two duplicate phone numbers buried in 12,000+ records. Changed the DELETE to SELECT and this found them in seconds. Saved me a ton of time, thank you.
– shimonyk
Commented Sep 23, 2010 at 15:30
3

This approach did not work for me. I don't know why. When I replaced "DELETE" with "SELECT *", it returned the rows I wanted to delete, but when I executed with "DELETE" it was just hanging indefinitely.
– aro_biz
Commented Jun 25, 2012 at 12:05
1

Mine is also either hanging or just executing extremely long. It has been running for about 22 hours and is still going. The table has 21M records.
– Cameron Castillo
Commented Aug 22, 2013 at 5:57
1

I suggest adding further filtering to the WHERE clause if you have a very large data set and it is feasible; this might help folks with long-running queries.
– Ricardo Sanchez
Commented Apr 8, 2014 at 16:58
3

If the select works, but the delete does not, that might be due to the size of the resulting subquery. It might be interesting to first do a create table with the subquery result, build an index on the min(rowid) column, and then run the delete statement.
– Wouter
Commented May 15, 2014 at 13:51
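
The staged approach suggested in the comment above could look roughly like this. It is a sketch under assumptions: keep_rowids, keep_rowids_ix and rid are made-up names, your_table and column1 to column3 are the placeholders from the answer, and nobody else is inserting into the table while it runs (rows added after step 1 would not be in keep_rowids and would be deleted). I used NOT EXISTS instead of NOT IN for the final delete; with a non-null rid column the two should select the same rows.

```sql
-- 1. Materialise the rowids to keep (one per key).
CREATE TABLE keep_rowids AS
SELECT MIN(rowid) AS rid
  FROM your_table
 GROUP BY column1, column2, column3;

-- 2. Index the saved rowids so the delete can probe them quickly.
CREATE UNIQUE INDEX keep_rowids_ix ON keep_rowids (rid);

-- 3. Delete every row whose rowid was not saved.
DELETE FROM your_table t
 WHERE NOT EXISTS (SELECT 1 FROM keep_rowids k WHERE k.rid = t.rowid);

-- 4. Check the result, then clean up.
--    DROP TABLE is DDL and commits the delete implicitly.
DROP TABLE keep_rowids;
```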


David Balažic · Accepted Answer · 2015-11-05 14:13:02Z

19

From Ask Tom

delete from t
 where rowid IN ( select rid
                    from (select rowid rid, 
                                 row_number() over (partition by 
                         companyid, agentid, class , status, terminationdate
                                   order by rowid) rn
                            from t)
                   where rn <> 1);

(fixed the missing parenthesis)

edited Nov 5, 2015 at 14:13

David Balažic


answered Mar 18, 2011 at 6:11

Dead Programmer


Mark · Accepted Answer · 2009-02-09 17:43:52Z

17

From DevX.com:

DELETE FROM our_table
WHERE rowid not in
(SELECT MIN(rowid)
FROM our_table
GROUP BY column1, column2, column3...) ;

Where column1, column2, etc. is the key you want to use.

answered Feb 9, 2009 at 17:43

Mark


user187624 · Accepted Answer · 2009-11-09 06:18:36Z

15

DELETE FROM tablename a
      WHERE a.ROWID > ANY (SELECT b.ROWID
                             FROM tablename b
                            WHERE a.fieldname = b.fieldname
                              AND a.fieldname2 = b.fieldname2)

answered Nov 9, 2009 at 6:18

user187624

1

Re my comment above on the top-voted answer, it was this request which actually solved my problem.
– aro_biz
Commented Jun 25, 2012 at 12:06
4

This will be -a lot- slower on huge tables than Bill's solution.
– Wouter
Commented May 15, 2014 at 14:01


huzeyfe · Accepted Answer · 2015-12-31 11:29:40Z

11

Solution 1)

delete from emp
where rowid not in
(select max(rowid) from emp group by empno);

Solution 2)

delete from emp where rowid in
               (
                 select rid from
                  (
                    select rowid rid,
                      row_number() over(partition by empno order by empno) rn
                      from emp
                  )
                where rn > 1
               );

Solution 3)

delete from emp e1
         where rowid not in
          (select max(rowid) from emp e2
           where e1.empno = e2.empno );

edited Dec 31, 2015 at 11:29

huzeyfe


answered Dec 31, 2015 at 10:32

DoOrDie


1

Could you tell us the pros and cons of each of the approaches?
– Arun Gowda
Commented Aug 26, 2020 at 17:56


Mohammed khaled · Accepted Answer · 2013-01-11 17:01:33Z

7

create table t2 as select distinct * from t1;

answered Jan 11, 2013 at 17:01

Mohammed khaled


not an answer - distinct * will take every record which differs in at least 1 symbol in 1 column. All you need is to select distinct values only from columns you want to make primary keys - Bill's answer is great example of this approach.
– Nogard
Commented Jan 11, 2013 at 17:28
1

That was what I needed (remove entirely identical lines). Thanks !
– Emmanuel
Commented Feb 20, 2013 at 11:43
Another disadvantage of this method is that you have to create a copy of your table. For huge tables, this implies providing additional tablespace, and deleting or shrinking the tablespace after the copy. Bill's method has more benefits, and no additional disadvantages.
– Wouter
Commented May 15, 2014 at 13:59
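
For the copy approach, the swap could be sketched as follows. Assumptions: t1 and t2 are the names from the answer, t1_old is a made-up backup name, and the table has no LOB columns (as far as I know SELECT DISTINCT does not work on them). Note that CREATE TABLE ... AS SELECT does not carry over indexes, grants, triggers, or constraints other than NOT NULL, so those have to be recreated on the new table.

```sql
-- Build a de-duplicated copy (only fully identical rows collapse).
CREATE TABLE t2 AS SELECT DISTINCT * FROM t1;

-- Compare row counts before swapping.
SELECT (SELECT COUNT(*) FROM t1) AS before_rows,
       (SELECT COUNT(*) FROM t2) AS after_rows
  FROM dual;

-- Keep the original under another name until the copy is verified.
RENAME t1 TO t1_old;
RENAME t2 TO t1;
```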


Nick · Accepted Answer · 2009-02-09 17:44:14Z

4

You should do a small pl/sql block using a cursor for loop and delete the rows you don't want to keep. For instance:

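declare
  prev_var my_table.var1%TYPE;  -- value seen on the previous row
begin
  for t in (select rowid rid, var1 from my_table order by var1) loop
    -- if previous var equals current var, delete the row, else keep on going.
    -- rows where var1 is NULL never compare equal, so they are left alone.
    if t.var1 = prev_var then
      delete from my_table where rowid = t.rid;
    end if;
    prev_var := t.var1;
  end loop;
end;
/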

answered Feb 9, 2009 at 17:44

Nick


I believe the downvote is because you are using PL/SQL when you can do it in SQL, in case you are wondering.
– WW.
Commented Feb 10, 2009 at 1:39
9

Just because you can do it in SQL doesn't mean it's the only solution. I posted this solution after I had seen the SQL-only solution. I thought downvotes were for incorrect answers.
– Nick
Commented Feb 10, 2009 at 2:43


Ala Abid · Accepted Answer · 2021-06-07 13:28:29Z

This blog post was really helpful for general cases:

If the rows are fully duplicated (all values in all columns can have copies) there are no columns to use! But to keep one you still need a unique identifier for each row in each group. Fortunately, Oracle already has something you can use. The rowid. All rows in Oracle have a rowid. This is a physical locator. That is, it states where on disk Oracle stores the row. This is unique to each row. So you can use this value to identify and remove copies. To do this, replace min() with min(rowid) in the uncorrelated delete:

delete films
where  rowid not in (
  select min(rowid)
  from   films
  group  by title, uk_release_date
)

rationalboss · Accepted Answer · 2016-07-22 06:21:03Z

To select only the duplicates, the query format can be:

SELECT GroupFunction(column1), GroupFunction(column2),..., 
COUNT(column1), column1, column2...
FROM our_table
GROUP BY column1, column2, column3...
HAVING COUNT(column1) > 1

So the correct query as per other suggestion is:

DELETE FROM tablename a
      WHERE a.ROWID > ANY (SELECT b.ROWID
                             FROM tablename b
                            WHERE a.fieldname = b.fieldname
                              AND a.fieldname2 = b.fieldname2
                              -- AND ... and so on, to identify the duplicate rows
                           );

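This query keeps, for each group matched by the criteria in the WHERE clause, the row with the lowest ROWID. That is often the oldest record, but ROWID reflects physical location rather than insertion order, so this is not guaranteed.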
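
One thing to watch with the correlated form: an equality test such as a.fieldname = b.fieldname is never true when both sides are NULL, whereas GROUP BY puts NULLs into the same group. So rows that are duplicates apart from NULLs in the compared columns will survive this delete. A possible NULL-tolerant variant, as a sketch using the same placeholder names (tablename, fieldname, fieldname2):

```sql
DELETE FROM tablename a
 WHERE a.ROWID > ANY (SELECT b.ROWID
                        FROM tablename b
                       -- treat two NULLs as equal for each compared column
                       WHERE (a.fieldname = b.fieldname
                              OR (a.fieldname IS NULL AND b.fieldname IS NULL))
                         AND (a.fieldname2 = b.fieldname2
                              OR (a.fieldname2 IS NULL AND b.fieldname2 IS NULL)));
```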

Oracle Certified Associate (2008)

Krunal Patel · Accepted Answer · 2017-07-03 10:55:31Z

create table abcd(id number(10), name varchar2(20));

insert into abcd values(1,'abc');
insert into abcd values(2,'pqr');
insert into abcd values(3,'xyz');
insert into abcd values(1,'abc');
insert into abcd values(2,'pqr');
insert into abcd values(3,'xyz');

select * from abcd;
id  Name
1   abc
2   pqr
3   xyz
1   abc
2   pqr
3   xyz

Delete the duplicate records but keep one distinct record in the table:

DELETE 
FROM abcd a
WHERE ROWID > (SELECT MIN(ROWID) FROM abcd b
WHERE b.id=a.id
);

Running the query above deletes 3 rows.

select * from abcd;

id  Name 
1   abc
2   pqr
3   xyz

EstevaoLuis · Accepted Answer · 2019-06-28 14:19:15Z

3

solution :

delete from emp where rowid in
(
    select rid from
    (
        select rowid rid,
        row_number() over(partition by empno order by empno) rn
        from emp
    )
    where rn > 1
);

edited Jun 28, 2019 at 14:19

EstevaoLuis


answered Jun 28, 2019 at 11:58

sandeep gupta


Stephen Ostermiller · Accepted Answer · 2014-05-30 01:36:57Z

The Fastest way for really big tables

Create an exceptions table, exceptions_table, with the structure below:

ROW_ID ROWID
OWNER VARCHAR2(30)
TABLE_NAME VARCHAR2(30)
CONSTRAINT VARCHAR2(30)

Try to create a unique constraint or primary key that the duplicates will violate. You will get an error message because you have duplicates. The exceptions table will then contain the rowids of the duplicate rows.
```
alter table original_dups add constraint dup_check_uk  -- dup_check_uk is a placeholder constraint name
unique --or primary key
(dupfield1,dupfield2) exceptions into exceptions_table;
```

Join your table with exceptions_table by rowid and delete dups

delete original_dups where rowid in (select ROW_ID from exceptions_table);
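
A caution on the delete above: as far as I know, EXCEPTIONS INTO records every row that takes part in a violation, that is, all copies in a duplicate group and not only the extra ones. If that holds, deleting everything listed in exceptions_table would also remove the copy you want to keep. A sketch that keeps one row per key, assuming the names used in this answer (original_dups, exceptions_table, dupfield1, dupfield2):

```sql
DELETE FROM original_dups
 WHERE rowid IN (SELECT row_id FROM exceptions_table)
   -- spare the lowest rowid in each duplicate group
   AND rowid NOT IN (SELECT MIN(rowid)
                       FROM original_dups
                      WHERE rowid IN (SELECT row_id FROM exceptions_table)
                      GROUP BY dupfield1, dupfield2);
```

Restricting the inner query to the rowids in exceptions_table means the GROUP BY only has to look at rows that actually have duplicates.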

If the number of rows to delete is large, create a new table (with all grants and indexes) by anti-joining with exceptions_table on rowid, then rename the original table to original_dups and rename new_table_with_no_dups to the original table name.
```
create table new_table_with_no_dups AS (
    select field1, field2 ........ 
    from original_dups t1
    where not exists ( select null from exceptions_table T2 where t1.rowid = t2.row_id )
)
```

NSNoob · Accepted Answer · 2015-12-28 15:25:23Z

2

Using rowid-

delete from emp
 where rowid not in
 (select max(rowid) from emp group by empno);

Using self join-

delete from emp e1
 where rowid not in
 (select max(rowid) from emp e2
 where e1.empno = e2.empno );

edited Dec 28, 2015 at 15:25

NSNoob


answered Dec 28, 2015 at 14:12

Dnyaneshwar Tandale


Hi Tandale, Please use code formatting tool while submitting answers as it increases readability.
– NSNoob
Commented Dec 28, 2015 at 14:17


Community · Accepted Answer · 2020-06-20 09:12:55Z

2

Solution 4)

 delete from emp where rowid in
            (
             select rid from
                (
                  select rowid rid,
                  dense_rank() over(partition by empno order by rowid
                ) rn
             from emp
            )
 where rn > 1
);

edited Jun 20, 2020 at 9:12

CommunityBot


answered Dec 31, 2015 at 12:28

DoOrDie


Can you explain a bit?
– Dieter Meemken
Commented Dec 31, 2015 at 13:04
dense_rank with partition by gives duplicate rows the same rank when they tie on the ordering, for example three rows having rank 1, 1, 1. A rowid is created for every row and is unique, and we are trying to delete the rowids that do not match.
– DoOrDie
Commented Dec 31, 2015 at 13:18
We can use both the rank and dense_rank functions, but I think rank works perfectly in this scenario.
– DoOrDie
Commented Dec 31, 2015 at 13:40
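
To see how the three ranking functions behave here, a query like the one below can be run first. It is a sketch assuming the emp/empno names from the answer; rid, rn, rk and drk are illustrative aliases. Because rowid is unique within a table there are no ties in the ordering, so all three columns should come out identical (1, 2, 3, ... within each empno).

```sql
SELECT empno,
       rowid AS rid,
       ROW_NUMBER() OVER (PARTITION BY empno ORDER BY rowid) AS rn,
       RANK()       OVER (PARTITION BY empno ORDER BY rowid) AS rk,
       DENSE_RANK() OVER (PARTITION BY empno ORDER BY rowid) AS drk
  FROM emp
 ORDER BY empno, rid;
```

The functions only differ when the ordering has ties. If the window were ordered by empno instead of rowid, every row in a partition would tie, rank and dense_rank would both return 1 for all of them, and a filter of rn > 1 would delete nothing; row_number would still number the rows 1..n.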


score 2 · Accepted Answer · 2016-02-10 09:34:42Z

1. solution

delete from emp
    where rowid not in
    (select max(rowid) from emp group by empno);

2. solution

delete from emp where rowid in
               (
                 select rid from
                  (
                    select rowid rid,
                      row_number() over(partition by empno order by empno) rn
                      from emp
                  )
                where rn > 1
               );

3. solution

delete from emp e1
         where rowid not in
          (select max(rowid) from emp e2
           where e1.empno = e2.empno );

4. solution

 delete from emp where rowid in
            (
             select rid from
                (
                  select rowid rid,
                  dense_rank() over(partition by empno order by rowid
                ) rn
             from emp
            )
 where rn > 1
);

DoOrDie · Accepted Answer · 2016-02-10 10:03:19Z

2

5. solution

delete from emp where rowid in 
    (
      select  rid from
       (
         select rowid rid,rank() over (partition by emp_id order by rowid)rn from emp     
       )
     where rn > 1
    );

answered Feb 10, 2016 at 10:03

DoOrDie


Md Wasi · Accepted Answer · 2017-01-07 06:34:04Z

2

DELETE from table_name where rowid not in (select min(rowid) FROM table_name group by column_name);

and you can also delete duplicate records in another way

DELETE from table_name a where rowid > (select min(rowid) FROM table_name b where a.column_name=b.column_name);

answered Jan 7, 2017 at 6:34

Md Wasi


JgSudhakar · Accepted Answer · 2014-01-12 05:32:10Z

1

DELETE FROM tableName  WHERE ROWID NOT IN (SELECT   MIN (ROWID) FROM tableName GROUP BY columnname);

answered Jan 12, 2014 at 5:32

JgSudhakar


1

Same answer as the more elaborate answer of Bill the Lizard.
– Wouter
Commented May 15, 2014 at 13:55


Nic Wortel · Accepted Answer · 2014-05-20 09:14:43Z

1

delete from dept
where rowid in (
     select rowid
     from dept
     minus
     select max(rowid)
     from dept
     group by DEPTNO, DNAME, LOC
);

edited May 20, 2014 at 9:14

Nic Wortel


answered May 20, 2014 at 8:49

user3655760


Can you add more information about your way? Thanks.
– Reporter
Commented May 20, 2014 at 9:16


AlexB · Accepted Answer · 2015-03-11 10:48:39Z

1

For best performance, here is what I wrote :
(see execution plan)

DELETE FROM your_table
WHERE rowid IN 
  (select t1.rowid from your_table  t1
      LEFT OUTER JOIN (
      SELECT MIN(rowid) as min_rid, column1,column2, column3
      FROM your_table 
      GROUP BY column1, column2, column3
  )  co1 ON (t1.rowid = co1.min_rid)
  WHERE co1.min_rid IS NULL
);

edited Mar 11, 2015 at 10:48

AlexB


answered Mar 11, 2015 at 10:04

Enguerrand JORE


Nikhil Manapure · Accepted Answer · 2017-09-14 10:05:27Z

Check below scripts -

1.

Create table test(id int,sal int);

2.

    insert into test values(1,100);    
    insert into test values(1,100);    
    insert into test values(2,200);    
    insert into test values(2,200);    
    insert into test values(3,300);    
    insert into test values(3,300);    
    commit;

3.

 select * from test;

You will see 6 records here.

4. Run the query below -

delete from 
   test
where rowid in
 (select rid from 
   (select 
     rowid rid,
     row_number()
    over 
     (partition by id order by sal) dup
    from test)
  where dup > 1);

select * from test;

You will see that duplicate records have been deleted.
Hope this solves your query. Thanks :)

Darrel Lee · Accepted Answer · 2018-07-11 17:16:00Z

I didn't see any answers that use common table expressions and window functions. This is what I find easiest to work with.

DELETE FROM
 YourTable
WHERE
 ROWID IN
    (WITH Duplicates
          AS (SELECT
               ROWID RID, 
               ROW_NUMBER() 
               OVER(
               PARTITION BY First_Name, Last_Name, Birth_Date
               ORDER BY ROWID)
                  AS RN,
               SUM(1)
               OVER(
               PARTITION BY First_Name, Last_Name, Birth_Date
               ORDER BY ROWID ROWS BETWEEN UNBOUNDED PRECEDING 
                                       AND UNBOUNDED FOLLOWING)
                   AS CNT
              FROM
               YourTable
              WHERE
               Load_Date IS NULL)
     SELECT
      RID
     FROM
      duplicates
     WHERE
      RN > 1);

Some things to note:

1) We are only checking for duplication on the fields in the partition clause.

2) If you have some reason to pick one duplicate over the others, you can use an ORDER BY clause to make sure that row gets row_number() = 1.

3) You can change the number of duplicates preserved by changing the final WHERE clause to "WHERE RN > N" with N >= 1 (I was thinking N = 0 would delete all rows that have duplicates, but it would just delete all rows).

4) I added the SUM partition field to the CTE query, which tags each row with the number of rows in its group. So to select rows with duplicates, including the first item, use "WHERE cnt > 1".
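
To illustrate note 2, here is a sketch that keeps the most recently updated row in each group. Updated_At is a hypothetical column that is not in the original table definition; YourTable and the partition columns are the ones used above. ROWID is added as a tie-breaker so the choice is deterministic.

```sql
DELETE FROM YourTable
 WHERE ROWID IN
       (SELECT RID
          FROM (SELECT ROWID RID,
                       ROW_NUMBER() OVER (
                         PARTITION BY First_Name, Last_Name, Birth_Date
                         -- newest first; rows without a timestamp sort last
                         ORDER BY Updated_At DESC NULLS LAST, ROWID) AS RN
                  FROM YourTable)
         WHERE RN > 1);
```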

Howd · Accepted Answer · 2021-11-24 03:54:29Z

1

This is similar to the top answer but gives me a much better explain plan:

delete from your_table
 where rowid in (
        select max(rowid)
          from your_table
         group by column1, column2, column3
        having count(*) > 1
       );
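
Note that this form removes only one row (the one with the highest rowid) from each duplicate group per run, so a key with three or more copies still has duplicates afterwards. One way around that, sketched here with the same placeholder names, is to repeat the delete in a small PL/SQL loop until it no longer removes anything:

```sql
BEGIN
  LOOP
    DELETE FROM your_table
     WHERE rowid IN (SELECT MAX(rowid)
                       FROM your_table
                      GROUP BY column1, column2, column3
                     HAVING COUNT(*) > 1);
    -- stop once a pass deletes nothing, i.e. no duplicate groups remain
    EXIT WHEN SQL%ROWCOUNT = 0;
  END LOOP;
END;
/
```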

answered Nov 24, 2021 at 3:54

Howd


Radim Köhler · Accepted Answer · 2013-11-26 09:27:25Z

0

create or replace procedure delete_duplicate_enq as
    cursor c1 is
    select *
    from enquiry;
begin
    for z in c1 loop
        delete enquiry
        where enquiry.enquiryno = z.enquiryno
        and rowid > any
        (select rowid
        from enquiry
        where enquiry.enquiryno = z.enquiryno);
    end loop;
 end delete_duplicate_enq;

edited Nov 26, 2013 at 9:27

Radim Köhler


answered Nov 26, 2013 at 9:04

Ashish sinha


A major disadvantage of this method is the inner join. For big tables this will be a lot slower than Bill's method. Also, using PL/SQL to do this is overkill, you could also use this by simply using sql.
– Wouter
Commented May 15, 2014 at 13:57
