Our system has quite a high level of concurrency with multiple java threads picking up one record at a time from a given Oracle 11g table which normally holds about two millions records. There are always many records ready to be picked up for processing. The records ready to be processed are selected based on a relatively complex SQL statement but once selected the processing order is based on a FIFO algorithm (ID order). It is crucial that the same record is not picked up by two distinct threads. Because of this we have a locking mechanism in place.
From a high level view the way in which it works at the moment is that java thread invokes a stored procedure which in turn will open a RECORD_READY_KEYS cursor and then it iterates trough that cursor and try to acquire a lock on a record on a locking table with that key. The locking attempt is done with SELECT FOR UPDATE SKIP LOCKED. If the lock succeeds the record to process is returned to the java thread for processing.
Everything works fine as long as the records ready to process are not too many. However when this number grows over a limit (from our observations when going over 15K) the SQL statement used to get the RECORD_READY_KEYS cursor starts decreasing in performance. Despite the fact it is fully optimised it starts taking close to 0.2 seconds to run which means you can only process maximum five records per second per java thread. In reality considering the time taken to acquire the lock, to travel over the network, to actually do the processing, commit the transaction, etc. will result in even slower processing.
Increasing the number of java threads is an option, however we cannot go over a certain limit as they will start putting pressure on the database/application server/system resources, etc.
The real problem is that we run an SQL statement to get the RECORD_READY_KEYS containing fifteen thousand keys out of a total of two millions and we then pick up the first available record from the top and the we discard the rest by closing the cursor.
My idea would be to have a KEYS_CACHE nested table defined at package level and store the result of RECORD_READY_KEYS selection in that nested table. Once a key is locked it will delete it from the KEYS_CACHE and will return it to the java thread. The process can go that way until the whole KEYS_CACHE gets consumed and when this happens it will populate it again.
Now my questions will be:
Q1. Can you see any weak point with this approach. I can see multiple threads trying to lock the same record at the same time and such wasting a bit of time. On the java side we can make the stored procedure invocation synchronized to a given extend only as the invocation will happen from multiple JVMs. However I cannot see this a major issue. Another issue would be when an unlikely rollback happens as there will be no easy way to put back the deleted key. The next RECORD_READY_KEYS selection will pick it back again and a delay of a few minutes will not really matter.
Q2. As the nested table gets less and less records it will become very sparse. Can you see this becoming a problem? If so should I limit the initial size to say 5000 keys or it does not really matter.
Q3. Can you see a problem with that package level KEYS_CACHE nested table being accessed concurrently by so many threads (we have between 25 to 100 of them)
Q4. Can you see an alternative approach that would not require a whole system redesign.
Thank you in advance
I think I was not very when explaining my situation. We do not lock the records to process in the two millions records table but we do the lock the key instead that are also saved on a different locking table.
Say I have this 2 million records table called messages:
And there are only messages with Key-A, Key-B, and Key-C that are ready to be processed a possible content of the key locking table may be:
Note the Key-X is in there even if no messages ready to be processed for that key because messages with such a key were just finished processing and the clean-up thread did not kicked off yet. That is OK and even desirable in case more new messages with Key-X will enter the system in a short while it will save a new insert.
So our select (fully optimised) will obtain a list with the Key-A, Key-C, and Key-B in this order (Key-C comes before Key-B because has a message with an Id = 2 which is smaller than the first Key-B message with the Id=6 Very simplified what we do here in fact is
SELECT key FROM messages WHERE ready = ‘Y’ GROUP BY key ORDER BY min(id)
Once we get that select in a cursor we fetch the key one by one and try to lock it in the key_locckings table. Once a lock succeeds the key get assigned to a thread (there is threads table for this) and will stay with that thread processing all messages that are ready for that key. As I mentioned in my first post it is crucial that messages with the same key be processed by the same thread as the key is how we link related messages which must be processed in sequence.
The SELECT above is instantly when the number of keys selected is up to a few thousands. It is still performing OK when it gets around 10000 keys. Once the number of retrieved keys gets over 15000 then the performance starts degrading. The time to run the SELECT is still OK (about 0.2 seconds) and we do have indexes on all fields involved in this selection. It is just that getting the WHERE, GROUP, ORDER BY applied to select 15000 keys out of two million records that take the time.
So the problem for us is that every single thread will run the same SELECT and will get 15000 records just to pick up one of them. The think I was considering was that rather than closing the cursor and throwing the hard work away as we do at the moment to try storing those keys in a package level nested table and delete the keys from there as we allocate them to the threads. My first three questions just wanted to capture some others opinions about this approach while the last one was about finding some alternative ideas (e.g. someone would say use advanced queues, etc)