I recently found and fixed a bug in a site I was working on that resulted in millions of duplicate rows of data in a table that will be quite large even without them (still in the millions). I can easily find these duplicate rows and can run a single delete query to kill them all. The problem is that trying to delete this many rows in one shot locks up the table for a long time, which I would like to avoid if possible. The only ways I can see to get rid of these rows, without taking down the site (by locking up the table) are:

  1. Write a script that will execute thousands of smaller delete queries in a loop. This will theoretically get around the locked table issue because other queries will be able to make it into the queue and run in between the deletes. But it will still spike the load on the database quite a bit and will take a long time to run.
  2. Rename the table and recreate the existing table (it'll now be empty). Then do my cleanup on the renamed table. Rename the new table, name the old one back and merge the new rows into the renamed table. This way takes considerably more steps, but should get the job done with minimal interruption. The only tricky part here is that the table in question is a reporting table, so once it's renamed out of the way and the empty one put in its place, all historic reports go away until I put it back in place. Plus the merging process could be a bit of a pain because of the type of data being stored. Overall this is my likely choice right now.

I was just wondering if anyone else has had this problem before and, if so, how you dealt with it without taking down the site and, hopefully, with minimal if any interruption to the users? If I go with number 2, or a different, similar, approach, I can schedule the stuff to run late at night and do the merge early the next morning and just let the users know ahead of time, so that's not a huge deal. I'm just looking to see if anyone has any ideas for a better, or easier, way to do the cleanup.


DELETE FROM table_name
WHERE (whatever criteria)
LIMIT 1000

Wash, rinse, repeat until zero rows affected. Maybe in a script that sleeps for a second or three between iterations.
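
A minimal Python sketch of that wash-rinse-repeat loop, assuming an in-memory SQLite database as a stand-in for MySQL so that it runs anywhere. The table `events`, the `is_duplicate` column and the function name `delete_in_batches` are hypothetical. Stock SQLite builds do not accept `DELETE ... LIMIT`, so the batch is picked through an ordered subquery; against MySQL the same idea would be a plain `DELETE ... WHERE ... ORDER BY id LIMIT n`.

```python
import sqlite3
import time


def delete_in_batches(conn, batch_size=1000, pause_seconds=0.0):
    """Delete flagged rows in small batches; return the total number deleted."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM events WHERE id IN ("
            "  SELECT id FROM events WHERE is_duplicate = 1"
            "  ORDER BY id LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # end the transaction so locks are released between batches
        if cur.rowcount == 0:
            break  # zero rows affected: nothing left to delete
        total += cur.rowcount
        time.sleep(pause_seconds)  # give other queries a chance to run
    return total


# Hypothetical demo data: 5,000 rows, every odd id flagged as a duplicate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, is_duplicate INTEGER)")
conn.executemany(
    "INSERT INTO events (id, is_duplicate) VALUES (?, ?)",
    [(i, i % 2) for i in range(1, 5001)],
)
conn.commit()
deleted = delete_in_batches(conn, batch_size=1000)
remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

In a real run `pause_seconds` would be set to a second or so, as suggested above; it is zero here only to keep the demo fast.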

    If you use DELETE with LIMIT, you should really use ORDER BY to make the query deterministic; not doing so would have strange effects (including breaking replication in some cases)
    Note that one can't combine DELETE ... JOIN with ORDER BY or LIMIT.
    I still have my doubts if a pivot table isn't the best way, but, I made a procedure, just to keep the sanity anyway: hastebin.com/nabejehure.pas Commented Jun 5, 2017 at 17:09
    Here is a simple Python script which implements this approach: gist.github.com/tsauerwein/ffb159d1ab95d7fd91ef43b9609c471d
  • Why do we have to sleep between iterations?
I had a use case of deleting 1M+ rows from a table with 25M+ rows in MySQL. I tried different approaches, such as the batch deletes described above.
I found that the fastest way was to copy the required records to a new table:

  1. Create a temporary table that holds just the ids.

CREATE TABLE id_temp_table ( temp_id int);

  2. Insert the ids that should be removed:

insert into id_temp_table (temp_id) select.....

  3. Create the new table table_new.

  4. Insert all records from table into table_new, without the unnecessary rows that are in id_temp_table:

insert into table_new .... where table_id NOT IN (select distinct(temp_id) from id_temp_table);

  5. Rename the tables.

The whole process took ~1 hour. In my use case, a simple batch delete of 100 records took 10 minutes.

  • for step 4 you can left join to use the index: insert into table_new ... select ... from table left join id_temp_table t on t.temp_id = table.id where t.temp_id is NULL;
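
The copy-and-rename steps, with the LEFT JOIN variant from the comment, can be sketched in Python as follows. This is only an illustration under assumptions: SQLite stands in for MySQL, and the table `big_table` with its `payload` column is hypothetical (only `id_temp_table`, `temp_id` and `table_new` come from the answer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE big_table (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE id_temp_table (temp_id INTEGER PRIMARY KEY);
    """
)
# Hypothetical data: ten rows, of which ids 3, 6 and 9 should be removed.
conn.executemany(
    "INSERT INTO big_table VALUES (?, ?)", [(i, f"row {i}") for i in range(1, 11)]
)
conn.executemany("INSERT INTO id_temp_table VALUES (?)", [(3,), (6,), (9,)])

conn.executescript(
    """
    -- Build the replacement table with the same columns.
    CREATE TABLE table_new (id INTEGER PRIMARY KEY, payload TEXT);
    -- Anti-join: keep only rows that have no match in id_temp_table.
    INSERT INTO table_new (id, payload)
    SELECT b.id, b.payload
    FROM big_table AS b
    LEFT JOIN id_temp_table AS t ON t.temp_id = b.id
    WHERE t.temp_id IS NULL;
    -- Swap the tables. SQLite needs two statements for this.
    ALTER TABLE big_table RENAME TO table_old;
    ALTER TABLE table_new RENAME TO big_table;
    """
)
kept_ids = [row[0] for row in conn.execute("SELECT id FROM big_table ORDER BY id")]
```

In MySQL the swap can be written as one statement, `RENAME TABLE big_table TO table_old, table_new TO big_table`, so readers never see a moment without the table.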
This is a very simple way to speed up MySQL deletes that uses "mechanical sympathy", i.e. the approach is in tune with how MySQL works 'under the hood'. So let's 'pop the hood' briefly:

I think the slowness is due to MySQL's "clustered index", where the actual records are stored within the primary key index, in the order of the primary key index. This means access to a record via the primary key is extremely fast because it only requires one disk fetch: the record on the disk is right there where it found the correct primary key in the index.

In other databases without clustered indexes the index itself does not hold the record but just an "offset" or "location" indicating where the record is located in the table file and then a second fetch must be made in that file to retrieve the actual data.

You can imagine that when deleting a record in a clustered index (like MySQL uses) all records above that record in the index (=table) must be moved downwards to avoid massive holes being created in the index (well that is what I recall from a few years ago at least - version 8.x may have improved this issue).

Armed with knowledge of the above 'under the hood' operations, what we discovered really sped up deletes in MySQL 5.x was to perform the deletes in reverse order. This produces the least amount of record movement because you are deleting records from the end first, meaning that subsequent deletes have fewer records to relocate. Logical, right?!

    I really like this thinking! I love that it makes sense visually, like a toy a child could understand.
    This really made the difference for me. Deleting 10K rows in a table that had 5M rows took 5 minutes initially. Then I added ORDER BY id DESC LIMIT 10000 to the delete statement and it took only 1 second. Later I increased the size to 1M at a time. The whole process took 10 minutes. Commented Jul 12, 2021 at 19:34
    @GaniSimsek I'm always happy to hear of cases where others have benefited from some of my "that's just so crazy it might just work" ideas :)
I'd also recommend adding some constraints to your table to make sure that this doesn't happen to you again. A million rows, at 1000 per shot, will take 1000 repetitions of a script to complete. If the script runs once every 3.6 seconds you'll be done in an hour. No worries. Your clients are unlikely to notice.
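
The arithmetic above can be checked with a few lines of Python. The function name `estimate_runtime` is made up for this illustration, and the estimate counts only the interval between script runs, not the time each DELETE itself takes.

```python
import math


def estimate_runtime(total_rows, batch_size, seconds_per_batch):
    """Return (number of batches, total seconds) for a batched delete."""
    batches = math.ceil(total_rows / batch_size)
    return batches, batches * seconds_per_batch


# The figures from the answer: a million rows, 1000 per shot, one run every 3.6 seconds.
batches, seconds = estimate_runtime(1_000_000, 1000, 3.6)
hours = seconds / 3600
```

With those inputs the result is 1000 batches and about one hour, matching the estimate in the answer.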


The following deletes 1,000,000 records, one at a time.

for i in `seq 1 1000`; do
    mysql -e "select id from table_name where (condition) order by id desc limit 1000" | sed 's;/|;;g' | awk '{if(NR>1)print "delete from table_name where id = ",$1,";" }' | mysql;
done

You could also group them together and run delete from table_name where id in (id1,id2,..idN) without much difficulty, I'm sure.

    This is the only solution that worked for me with a 100GB table. Select with limit 1000 was just a few milliseconds but the delete with the same query took an hour for just 1000 records, although a SSD is in place. Deleting this way is still slow but at least a thousand rows per second and not hour. Commented Feb 17, 2017 at 19:15
  • deleting 1 M record in one go will kill your server Commented Jan 27, 2018 at 11:36
  • I was able to delete 100,000 records at a time (DELETE FROM table WHERE id <= 100000, then 200000, etc). Each batch took between 30 seconds and 1 minute. But when I previously tried to delete 1,300,000 at once, the query ran for at least 30 minutes before failing with ERROR 2013 (HY000): Lost connection to MySQL server during query. I ran these queries in the MySQL client on the same virtual machine as the server, but maybe the connection timed out. Commented Dec 8, 2018 at 4:24

Here's the recommended practice:

rows_affected = 0
do {
  rows_affected = do_query(
    "DELETE FROM table_name WHERE (condition)
    LIMIT 10000")
} while rows_affected > 0

Deleting 10,000 rows at a time is typically a large enough task to make each query efficient, and a short enough task to minimize the impact on the server (transactional storage engines might benefit from smaller transactions). It might also be a good idea to add some sleep time between the DELETE statements to spread the load over time and reduce the amount of time locks are held.

Reference: High Performance MySQL


I faced a similar problem. We had a really big table, about 500 GB in size, with no partitioning and only one index, on the primary_key column. Our master was a hulk of a machine, with 128 cores and 512 GB of RAM, and we had multiple slaves too. We tried a few techniques to tackle the large-scale deletion of rows. I will list them all here, from worst to best:

  1. Fetching and Deleting one row at a time. This is the absolute worst that you could do. So, we did not even try this.
  2. Fetching the first 'X' rows from the database using a limit query on the primary_key column, then checking in the application which row ids to delete, and firing a single delete query with a list of primary_key ids. So, 2 queries per 'X' rows. This approach was fine, but running it as a batch job deleted about 5 million rows in 10 minutes or so, which caused the slaves of our MySQL DB to lag by 105 seconds. That is a 105-second lag from a 10-minute activity, so we had to stop.
  3. In this technique, we introduced a 50 ms lag between our subsequent batch fetch and deletions of size 'X' each. This solved the lag problem but we were now deleting 1.2-1.3 million rows per 10 minutes as compared to 5 million in technique #2.
  4. Partitioning the database table and then deleting the entire partitions when not needed. This is the best solution we have but it requires a pre-partitioned table. We followed step 3 because we had a non-partitioned very old table with only indexing on the primary_key column. Creating a partition would have taken too much time and we were in a crisis mode. Here are some links related to partitioning that I found helpful- Official MySQL Reference, Oracle DB daily partitioning.

So, IMO, if you can afford to have the luxury of creating a partition in your table, go for the option #4, otherwise, you are stuck with option #3.
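
Techniques 2 and 3 (fetch a window of rows, decide in the application, delete by a list of primary keys, then pause) could look roughly like this in Python. It is a sketch under assumptions: SQLite stands in for MySQL, and the `readings` table, its `created_day` column and the `should_delete` callback are hypothetical.

```python
import sqlite3
import time


def purge_by_id_list(conn, should_delete, batch_size=500, pause_seconds=0.05):
    """Walk the primary key in windows and delete the rows the application picks."""
    last_id = 0
    deleted = 0
    while True:
        # Query 1: fetch the next window of rows in primary-key order.
        rows = conn.execute(
            "SELECT id, created_day FROM readings WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return deleted
        last_id = rows[-1][0]
        doomed = [row_id for row_id, day in rows if should_delete(day)]
        if doomed:
            # Query 2: a single DELETE carrying the whole list of primary keys.
            placeholders = ",".join("?" * len(doomed))
            conn.execute(f"DELETE FROM readings WHERE id IN ({placeholders})", doomed)
            conn.commit()
            deleted += len(doomed)
        time.sleep(pause_seconds)  # the gap between batches used in technique 3


# Hypothetical demo: 2,000 rows, delete those whose created_day is below 3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, created_day INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)", [(i, i % 10) for i in range(1, 2001)]
)
conn.commit()
deleted = purge_by_id_list(conn, lambda day: day < 3, pause_seconds=0.0)
remaining = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

The pause is what trades throughput for replica health: a longer gap deletes fewer rows per minute but gives the replicas time to keep up.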


I have had the same case earlier. More than 45 million duplicate rows were stored during a database migration. Yeah, it happened. :)

What I did was:

  • Created a temporary table containing only the unique rows
  • Truncated the original table
  • Inserted back to the original table from the temporary table.
  • After making sure the data is correct, I deleted the temporary table.

Overall, it took around 2.5 minutes I guess.


CREATE TABLE mytable_temp AS SELECT * FROM my_original_table WHERE my_condition;
TRUNCATE TABLE my_original_table;
INSERT INTO my_original_table  SELECT * FROM mytable_temp;
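
The same three steps can be tried end to end with a small Python sketch. It assumes SQLite as a stand-in for MySQL, hypothetical `user_id` and `action` columns, and `SELECT DISTINCT` as the filter that keeps only unique rows. SQLite has no TRUNCATE TABLE, so an unfiltered DELETE takes its place here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_original_table (user_id INTEGER, action TEXT)")
# Hypothetical data containing exact duplicates.
conn.executemany(
    "INSERT INTO my_original_table VALUES (?, ?)",
    [(1, "login"), (1, "login"), (2, "login"), (2, "logout"), (2, "logout")],
)
conn.commit()

with conn:  # commits on success, rolls back the DELETE and INSERT on error
    # Step 1: copy only the unique rows into a temporary table.
    conn.execute(
        "CREATE TEMP TABLE mytable_temp AS SELECT DISTINCT * FROM my_original_table"
    )
    # Step 2: empty the original table.
    conn.execute("DELETE FROM my_original_table")
    # Step 3: put the unique rows back.
    conn.execute("INSERT INTO my_original_table SELECT * FROM mytable_temp")

rows = conn.execute(
    "SELECT user_id, action FROM my_original_table ORDER BY user_id, action"
).fetchall()
```

One caveat when carrying this over: in MySQL, TRUNCATE TABLE causes an implicit commit, so the three steps cannot be rolled back as a unit, and the table is empty between steps 2 and 3.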

Based on @rich's answer, I wrote this single-line command:

for i in {1..1000}; do mysql -vv --user=THE_USER --password=THE_PWD --host=YOUR_DB_HOST THE_DB_NAME -e "DELETE FROM THE_DB_NAME.THE_TABLE WHERE \`date\` < NOW() - INTERVAL 4 MONTH LIMIT 10000;"; sleep 1; done;
  • -vv : displays the DELETE result, so I can check the deleted rows count
  • --host : I'm running the request from another server, so I have to define the mysql host address
  • \`date\` : the column name is quoted with backticks, escaped with backslashes so that the shell does not treat them as command substitution. Single quotes would not work here: 'date' is a string literal in MySQL, not a column reference
  • NOW() - INTERVAL 4 MONTH : delete only old entries (more than 4 months)
  • sleep 1 : wait one second to avoid crashing the server

I'd use mk-archiver from the excellent Maatkit utilities package (a bunch of Perl scripts for MySQL management). Maatkit is from Baron Schwartz, the author of the O'Reilly "High Performance MySQL" book.

The goal is a low-impact, forward-only job to nibble old data out of the table without impacting OLTP queries much. You can insert the data into another table, which need not be on the same server. You can also write it to a file in a format suitable for LOAD DATA INFILE. Or you can do neither, in which case it's just an incremental DELETE.

It's already built for archiving your unwanted rows in small batches and as a bonus, it can save the deleted rows to a file in case you screw up the query that selects the rows to remove.

No installation required, just grab http://www.maatkit.org/get/mk-archiver and run perldoc on it (or read the web site) for documentation.

    For reference, Maatkit was discontinued in 2011 and merged into Percona Toolkit.
For us, the DELETE WHERE %s ORDER BY %s LIMIT %d answer was not an option, because the WHERE criteria was slow (a non-indexed column), and would hit master.

SELECT from a read replica the list of primary keys that you wish to delete. Export it in this kind of format, one key per line:

006CC671-655A-432E-9164-D3C64191EDCE
006CD163-794A-4C3E-8206-D05D1A5EE01E
...

Use the following bash script to grab this input and chunk it into DELETE statements [requires bash ≥ 4 because of mapfile built-in]:

sql-chunker.sh (remember to chmod +x me, and change the shebang to point to your bash 4 executable):


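#!/usr/bin/env bash

# Expected input format (one id per line):
: <<!
006CC671-655A-432E-9164-D3C64191EDCE
006CD163-794A-4C3E-8206-D05D1A5EE01E
!

if [ -z "$1" ]; then
    echo "No chunk size supplied. Invoke: ./sql-chunker.sh 1000 ids.txt"
    exit 1
fi

if [ -z "$2" ]; then
    echo "No file supplied. Invoke: ./sql-chunker.sh 1000 ids.txt"
    exit 1
fi

# Join all remaining arguments, using the first argument as the delimiter.
function join_by {
    local d=$1
    shift
    echo -n "$1"
    shift
    printf "%s" "${@/#/$d}"
}

# Read the file in chunks of $1 lines and emit one DELETE statement per chunk.
while mapfile -t -n "$1" ary && ((${#ary[@]})); do
    printf "DELETE FROM my_cool_table WHERE id IN ('%s');\n" "$(join_by "','" "${ary[@]}")"
done < "$2"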

Invoke like so:

./sql-chunker.sh 1000 ids.txt > batch_1000.sql

This will give you a file with output formatted like so (I've used a batch size of 2):

DELETE FROM my_cool_table WHERE id IN ('006CC671-655A-432E-9164-D3C64191EDCE','006CD163-794A-4C3E-8206-D05D1A5EE01E');
DELETE FROM my_cool_table WHERE id IN ('006CD837-F1AD-4CCA-82A4-74356580CEBC','006CDA35-F132-4F2C-8054-0F1D6709388A');

Then execute the statements like so:

mysql --login-path=master billing < batch_1000.sql

For those unfamiliar with login-path, it's just a shortcut to login without typing password in the command line.
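
For readers without bash 4, the chunking step can also be sketched in Python. This is an illustrative equivalent rather than the author's script: the function name `chunked_delete_statements` is made up, while the table name `my_cool_table` and the sample IDs are taken from the output shown above.

```python
def chunked_delete_statements(ids, chunk_size, table="my_cool_table"):
    """Yield one DELETE statement per chunk of at most chunk_size IDs."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        # Double any single quote so each ID stays inside its string literal.
        quoted = ",".join("'" + value.replace("'", "''") + "'" for value in chunk)
        yield f"DELETE FROM {table} WHERE id IN ({quoted});"


ids = [
    "006CC671-655A-432E-9164-D3C64191EDCE",
    "006CD163-794A-4C3E-8206-D05D1A5EE01E",
    "006CD837-F1AD-4CCA-82A4-74356580CEBC",
    "006CDA35-F132-4F2C-8054-0F1D6709388A",
]
statements = list(chunked_delete_statements(ids, 2))
```

Writing `"\n".join(statements)` to a file gives the same kind of batch file that is then fed to the mysql client.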

Do it in batches of, let's say, 2000 rows at a time. Commit in between. A million rows isn't that much and this will be fast, unless you have many indexes on the table.


I had a heavily loaded database that needed some older entries deleted all the time. Some of the delete queries started to hang, so I needed to kill them, and if there were too many deletes the whole database became unresponsive, so I needed to restrict the number of parallel runs. So I created a cron job that runs every minute and starts this script:





# Configuration: the variable definitions are not shown here. The script
# expects i_size, max_delete_queries, sleep_interval, min_operations,
# max_query_time, USER, PASS, log_max_size and log_file to be set.

touch "$log_file"
log_file_size=`stat -c%s "$log_file"`
if (( $log_file_size > $log_max_size ))
then
    rm -f "$log_file"
fi

delete_queries=`mysql -u $USER -p$PASS -e "SELECT * FROM information_schema.processlist WHERE Command = 'Query' AND INFO LIKE 'DELETE FROM big.table WHERE result_timestamp %';" | grep Query | wc -l`

## -- here the hanging DELETE queries will be stopped
mysql -u $USER -p$PASS -e "SELECT ID FROM information_schema.processlist WHERE Command = 'Query' AND INFO LIKE 'DELETE FROM big.table WHERE result_timestamp %' AND TIME>$max_query_time;" | grep -v ID | while read -r id ; do
    echo "delete query stopped on `date`" >> "$log_file"
    mysql -u $USER -p$PASS -e "KILL $id;"
done

if (( $delete_queries > $max_delete_queries ))
then
  sleep $sleep_interval

  delete_queries=`mysql -u $USER -p$PASS -e "SELECT * FROM information_schema.processlist WHERE Command = 'Query' AND INFO LIKE 'DELETE FROM big.table WHERE result_timestamp %';" | grep Query | wc -l`

  if (( $delete_queries > $max_delete_queries ))
  then
      sleep $sleep_interval

      delete_queries=`mysql -u $USER -p$PASS -e "SELECT * FROM information_schema.processlist WHERE Command = 'Query' AND INFO LIKE 'DELETE FROM big.table WHERE result_timestamp %';" | grep Query | wc -l`

      # -- if there are too many delete queries after the second wait
      #  the table will be cleaned up by the next cron job
      if (( $delete_queries > $max_delete_queries ))
      then
            echo "clean-up skipped on `date`" >> "$log_file"
            exit 1
      fi
  fi
fi

running_operations=`mysql -u $USER -p$PASS -e "SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST WHERE COMMAND != 'Sleep';" | wc -l`

if (( $running_operations < $min_operations ))
then
    # -- if the database is not too busy this bigger batch can be processed
    batch_size=$(($i_size * 5))
else
    # -- otherwise use the base batch size (this branch is not shown in the post and is an assumption)
    batch_size=$i_size
fi

echo "starting clean-up on `date`" >> "$log_file"

mysql -u $USER -p$PASS -e 'DELETE FROM big.table WHERE result_timestamp < UNIX_TIMESTAMP(DATE_SUB(NOW(), INTERVAL 31 DAY))*1000 limit '"$batch_size"';'

if [ $? -eq 0 ]; then
    # -- if the sql command exited normally the exit code will be 0
    echo "delete finished successfully on `date`" >> "$log_file"
else
    echo "delete failed on `date`" >> "$log_file"
fi

With this I've achieved around 2 million deletes per day which was ok for my usecase.


I have faced a similar issue while deleting multiple records from a transaction table after moving them to an archival table.

I used a temporary table to identify the records to be deleted.

The temporary table I used, 'archive_temp', stored the ids and was created in memory without any indexes.

As a result, when deleting records from the original transaction table with, e.g., DELETE from tat where id in (select id from archive_temp); the query used to return the error "LOST Connection to server".

I created an index on that temporary table right after creating it, as follows: ALTER TABLE archive_temp ADD INDEX( id);

After this, my delete query executed within seconds, irrespective of the number of records to be deleted from the transaction table.

So it is worth checking your indexes. Hope this helps.


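If someone is having "System lock" issues, this article will give you better performance:

    -- These statements belong inside a stored procedure body (BEGIN ... END).
    DECLARE incrementValue INT DEFAULT 10000;
    DECLARE maxIdx BIGINT DEFAULT 530000000;
    -- curMaxId is not declared in the snippet as posted; starting at one increment is an assumption.
    DECLARE curMaxId BIGINT DEFAULT 10000;
    WHILE curMaxId <= maxIdx DO
        DELETE FROM table WHERE id < curMaxId;
        SET curMaxId = curMaxId + incrementValue;
    END WHILE;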
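
The same id-window idea, written as a runnable Python sketch. The assumptions: SQLite stands in for MySQL, the `samples` table and the function name `delete_below_in_windows` are hypothetical, and the final window is clamped to the maximum id, which differs slightly from the loop condition in the snippet above.

```python
import sqlite3


def delete_below_in_windows(conn, max_id, increment):
    """Delete rows with id < max_id, raising the upper bound one window at a time."""
    upper = increment
    deleted = 0
    while True:
        bound = min(upper, max_id)  # never go past the requested maximum
        cur = conn.execute("DELETE FROM samples WHERE id < ?", (bound,))
        conn.commit()  # each window is its own short transaction
        deleted += cur.rowcount
        if bound >= max_id:
            return deleted
        upper += increment


# Hypothetical demo: ids 1..100, delete everything below id 51 in windows of 10.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO samples VALUES (?)", [(i,) for i in range(1, 101)])
conn.commit()
deleted = delete_below_in_windows(conn, max_id=51, increment=10)
remaining = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
lowest_left = conn.execute("SELECT MIN(id) FROM samples").fetchone()[0]
```

Because each statement only touches ids below the current bound, every pass removes at most one window of rows, which keeps individual transactions short.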

These queries empty a big table in seconds (note that all of its rows are removed):

CREATE TABLE <my_table_temp> LIKE <my_table>;

RENAME TABLE <my_table> TO <my_table_delete>;

RENAME TABLE <my_table_temp> TO <my_table>;

DROP TABLE <my_table_delete>;


I have not scripted anything to do this, and doing it properly would absolutely require a script, but another option is to create a new, duplicate table and select all the rows you want to keep into it. Use a trigger to keep it up to date while this process completes. When it is in sync (minus the rows you want to drop), swap the two tables with a single RENAME TABLE statement, which MySQL performs atomically, so that the new one takes the place of the old. Drop the old table, and voila!

This (obviously) requires a lot of extra disk space, and may tax your I/O resources, but otherwise, can be much faster.

Depending on the nature of the data, or in an emergency, you could rename the old table and create a new, empty table in its place, and select the "keep" rows into the new table at your leisure...


According to the MySQL documentation, TRUNCATE TABLE is a fast alternative to DELETE FROM. Try this:

TRUNCATE TABLE table_name;

I tried this on 50M rows and it was done within two mins.

Note: Truncate operations are not transaction-safe; an error occurs when attempting one in the course of an active transaction or active table lock

    This will definitely delete rows. I'm pretty sure the OP wants to be selective though.
