
I want to delete a large number of records (~200K) from a large table (~500K records) in my MySQL DB.

I want to make this call as efficient as possible, because I don't want the DB to become "unresponsive" while executing it. I need to delete records that are "older" than 10 days (according to the created_at column). Currently I use:

DELETE FROM table_name WHERE created_at < DATE_SUB(CURDATE(), INTERVAL 10 DAY)

The table also has a primary key id, if that helps.

Any thoughts?

  • You may need to use a cursor / delete subsets of the data, so that you don't end up locking your database table (not sure how MySQL handles this)
    – forsvarir
    Commented Mar 25, 2011 at 8:53
  • @forsvarir, cursors solve some problems but performance is never one of them. Commented Mar 25, 2011 at 9:12
  • How often are you planning on running these large deletes? Commented Mar 25, 2011 at 9:13
  • @Lieven: but unless I'm mistaken, they do help to address the issue of 'I don't want the DB to become unresponsive while executing the call'
    – forsvarir
    Commented Mar 25, 2011 at 9:14
  • @forsvarir: It could be made to work, but you would be trading one large blocking call for many small blocking calls. A 500K-row table is not all that big these days. There should be no need to resort to such workarounds. Commented Mar 25, 2011 at 10:39

3 Answers


I was faced with the same challenge. With no primary key, but an index on the created_at field, it was not possible to delete the oldest 10 million entries from a 100-million-row table (InnoDB). The DELETE constantly failed with lock overflows, because the table was online and still receiving INSERTs.

To solve the problem I created a stored procedure that tackled the job incrementally:

DELIMITER $$

CREATE PROCEDURE delete_accesslog_incrementally()
MODIFIES SQL DATA
BEGIN
  REPEAT
    -- Delete rows older than 10 days in chunks of 20000,
    -- so each statement only holds its locks briefly.
    DELETE FROM tbl_name
    WHERE created_at < DATE_SUB(CURDATE(), INTERVAL 10 DAY)
    LIMIT 20000;
  UNTIL ROW_COUNT() = 0 END REPEAT;
END $$

DELIMITER ;

This still runs for quite a while, but it doesn't block the concurrent INSERTs that keep hitting the table.
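
If it helps, a minimal usage sketch: call the procedure by hand, or (assuming the event scheduler is enabled) let MySQL run it nightly. The event name purge_old_accesslog_rows is invented for this example.

-- Run the cleanup once, right now:
CALL delete_accesslog_incrementally();

-- Or schedule it to run every day (event name is just an example):
CREATE EVENT purge_old_accesslog_rows
  ON SCHEDULE EVERY 1 DAY
  DO CALL delete_accesslog_incrementally();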


You could try this:

1) First, find the boundary id for the cutoff date. This assumes id increases together with created_at (e.g. an AUTO_INCREMENT key):

SELECT id FROM table_name WHERE created_at >= DATE_SUB(CURDATE(), INTERVAL 10 DAY) ORDER BY created_at LIMIT 1;

2) Next, delete in batches:

DELETE FROM table_name WHERE id < id_found_on_step_1 LIMIT 1000;

Replace id_found_on_step_1 with the id value you found in step 1. You can also try a batch of 10000, or something lower than 1000; it depends on how long each DELETE statement takes.

This DELETE should be faster, since its WHERE clause uses the primary key.
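
Putting both steps together, here's a minimal sketch; the @cutoff_id user variable is made up for the example, and it again assumes id grows together with created_at:

-- Step 1: find the id of the first row that is NOT older than 10 days.
-- If no such row exists, @cutoff_id is left NULL (in a fresh session)
-- and the DELETE below removes nothing.
SELECT id INTO @cutoff_id
FROM table_name
WHERE created_at >= DATE_SUB(CURDATE(), INTERVAL 10 DAY)
ORDER BY created_at
LIMIT 1;

-- Step 2: repeat this (by hand or from a script) until it reports 0 rows affected.
DELETE FROM table_name WHERE id < @cutoff_id LIMIT 1000;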


You have a WHERE condition on created_at, so add an index on that column.
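
For example, a sketch using the names from the question (the index name idx_created_at is arbitrary):

ALTER TABLE table_name ADD INDEX idx_created_at (created_at);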

  • You could also use DELETE LOW_PRIORITY, which would help a little by waiting for the table to be unused (although, by the look of things, it doesn't interrupt the operation midway if someone starts reading from or writing to the table); see the sketch after these comments.
    – Hammerite
    Commented Mar 25, 2011 at 10:18
LOW_PRIORITY affects only storage engines that use table-level locking (such as MyISAM, MEMORY, and MERGE).
    – revoke
    Commented Jun 6, 2018 at 8:47
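
For reference, a minimal sketch of the LOW_PRIORITY variant mentioned in the first comment, reusing the query from the question (per the second comment, it only matters on table-locking engines such as MyISAM):

DELETE LOW_PRIORITY FROM table_name
WHERE created_at < DATE_SUB(CURDATE(), INTERVAL 10 DAY);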
