18

Okay, so I know that when dealing with very large data, we can use find_in_batches which, as I understand it, does the work of Model.all.each in a really fast way which is much more efficient

Now, I have a really large data I'm trying to delete, and I'm thinking of using this same find_in_batches to delete them in batches.

below is what I have (from within the rake task database.rake):

old_messages = TextMessage.where("created_at < ?", number.days.ago )

old_messages.find_in_batches do |batch|
  batch.delete_all
end

When I run this however, I get the following error:

ArgumentError: wrong number of arguments (0 for 1..3)
/Users/Sunday/.rvm/gems/ruby-2.2.0/gems/httparty-0.13.5/lib/httparty.rb:66:in `logger'
/Users/Sunday/.rvm/gems/ruby-2.2.0/gems/activerecord-4.2.2/lib/active_record/relation/delegation.rb:94:in `public_send'
/Users/Sunday/.rvm/gems/ruby-2.2.0/gems/activerecord-4.2.2/lib/active_record/relation/delegation.rb:94:in `block in method_missing'
/Users/Sunday/.rvm/gems/ruby-2.2.0/gems/activerecord-4.2.2/lib/active_record/relation.rb:302:in `scoping'
/Users/Sunday/.rvm/gems/ruby-2.2.0/gems/activerecord-4.2.2/lib/active_record/relation/delegation.rb:94:in `method_missing'
/Users/Sunday/.rvm/gems/ruby-2.2.0/gems/activerecord-4.2.2/lib/active_record/relation/batches.rb:112:in `find_in_batches'
/Users/Sunday/workspace/resilience/lib/tasks/database.rake:18:in `block (2 levels) in <top (required)>'

where database.rake on line 18 is: old_messages.find_in_batches do |batch|.

So, is it possible to use find_in_batches for deleting like I'm trying to do? and what am I doing wrong, if the answer to that is yes. Thanks for all response.

0

4 Answers 4

30

I don't think anyone has answered your question.

To answer 'what you are doing wrong' and can you use 'find_in_batches' in that way:

The reason why 'delete_all' does not work, is because 'delete_all' only works on activerecord relations. When you use 'find_in_batches' the variable 'batch' is now just a normal array, which may have it's own 'delete_all' method that is different,

You may need 'find_in_batches' incase if you have thousands of records to be deleted. So the previous answer is incorrect. (It may lead to memory exceeded exceptions and timeouts)

Note that is not related to the original error you displayed, but you cannot use 'batch' with 'delete_all' because 'batch' is an array and 'delete_all' is for activerecords

How to delete using find_in_batches

I was having a similar problem

user.posts.destroy_all

was overloading the server because of thousands of posts (this is an example my actual model was not 'posts')

You can use

user.posts.select(:id).find_in_batches(batch_size: 100) do |ids|
  Post.where(id: ids).delete_all
end

If it was one sql call, it will try to store all the delete items in memory at once that can break the server, This will have a manageable size of sql calls.

4
  • Okay, you answered the question... to a point. Thank you... a follow up will be: when you use find_in_batch, how'll you go through deleting records in that batch?
    – x6iae
    Commented Nov 30, 2016 at 1:25
  • Well this is how I found your post. I'll update my answer Commented Nov 30, 2016 at 15:32
  • 12
    pluck returns an array - you'll want to use select instead: User.posts.select(:id).find_in_batches Commented Mar 7, 2017 at 17:17
  • when you have "a lot" I'm talking about millions, even 'select' is returning way too much info. Commented Mar 7, 2017 at 19:26
26

New functionality: #in_batches (https://api.rubyonrails.org/classes/ActiveRecord/Batches.html#method-i-in_batches)

Person.where("age > 21").in_batches do |relation|
  relation.delete_all
end
3
  • why do you use this sleep in the process? Does it help somehow the database? Commented Aug 2, 2019 at 12:11
  • I guess that was my thinking - but for the life of me I can't remember. I'm going to delete that line because... why...
    – kwerle
    Commented Aug 3, 2019 at 18:03
  • 4
    The rails api docs on in_batches includes the sleep: sleep(10) # Throttle the delete queries
    – pduey
    Commented Nov 19, 2020 at 13:33
3

Use Model.in_batches.destroy_all.

Be mindful of dependent associations which might still be loaded without batching. I created a gem to fix this: batch_dependent_associations

See also dhh's issue in Rails: Relation#destroy_all should perform its work in batches

2

Using in_batches or find_in_batches with delete_all, like other answers suggest, will result in two queries per batch; one SELECT query to fetch the records and one DELETE query to delete the records.

A way to do batched deletes without fetching records could be:

def delete_batched(records, batch_size: 1000)
  loop do
    amount_deleted = records.reorder(:id).limit(batch_size).delete_all
    break if amount_deleted < batch_size
  end
end

delete_batched(TextMessage.where("created_at < ?", number.days.ago))

(It is my understanding that a deterministic processing order is desirable when doing deletes like this, hence the ordering by :id. Using reorder will replace any already set order)

Not the answer you're looking for? Browse other questions tagged or ask your own question.