batch size
A client's perceived database performance is characterized by
- latency (seconds to COMMIT), and
- throughput (rows per second)
Often we can trade one against the other.
You were a bit vague on the details of your "RDBMS" and
on which resource is the limiting bottleneck,
but we can safely assume that clients see
both measures suffer during busy times.
Our goal is to do better on at least one measure.
We will focus on a single-threaded message processor,
with no parallelism.
Assume it has a local .CSV file with ten thousand rows
that it wants to shovel into a table.
In most relational databases, throughput will suffer
if we go for minimum app-level latency.
So for a processor whose while
loop repeatedly does
- read one CSV line
- INSERT one row
- COMMIT
we will typically see low latency and poor throughput,
perhaps on the order of a hundred rows per second,
fewer if WAN round trips are slowing things down.
Even with a local RDBMS, we're waiting on latency
of the I/O subsystem to report it has persisted a row,
plus some per-row processing overhead.
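Here's a minimal sketch of that loop, in Python with the standard-library sqlite3 module;
the filename, table, and column names are made up, and any other RDBMS driver would look much the same:

```
import csv
import sqlite3

# Batch size of 1: every row pays the full COMMIT / fsync cost.
conn = sqlite3.connect("example.db")   # hypothetical local database
with open("messages.csv", newline="") as f:
    for row in csv.reader(f):
        conn.execute("INSERT INTO messages (id, body) VALUES (?, ?)", row)
        conn.commit()                  # one round trip / fsync per row
conn.close()
```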
That was a batch size of 1.
If we choose a larger batch size K,
we can send K rows per COMMIT,
amortizing some of the per-row overhead,
taking advantage of TCP WAN bandwidth to a distant DB server
so that round-trip latencies matter less,
and most importantly letting the I/O subsystem
do more useful work per request.
Benchmark your local database and see!
Choosing K near 10 or 1000 will often put you near the sweet spot,
maximizing throughput: at least a thousand rows inserted per second,
and typically more than an order of magnitude better than that.
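Here's the same loop reworked to send K rows per COMMIT, again a sketch with made-up table and column names:

```
import csv
import sqlite3
from itertools import islice

K = 1000                                  # batch size; benchmark to find your sweet spot

conn = sqlite3.connect("example.db")
with open("messages.csv", newline="") as f:
    reader = csv.reader(f)
    while True:
        batch = list(islice(reader, K))   # read up to K rows at a time
        if not batch:
            break
        conn.executemany("INSERT INTO messages (id, body) VALUES (?, ?)", batch)
        conn.commit()                     # one COMMIT amortized over K rows
conn.close()
```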
tl;dr: INSERT multiple rows per COMMIT for greater efficiency,
leading to higher throughput and shorter task completion time.
The tradeoff, of course, is that inserting K rows takes
longer than for one row, so latency to persist that first row
will be inflated.
fairness
On a busy database system you have various competing users and needs.
To avoid having INSERT traffic unduly impact interactive queries,
consider telling the message processor to unconditionally
delay D seconds -- sleep(D);
-- after COMMIT'ing
each batch of records.
This guarantees that some amount of DB resource is available for other uses.
During times of high message ingest rate
it will naturally lead to larger per-COMMIT batches,
arguably a good thing.
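As a sketch, reusing reader, K, and conn from the batched loop above,
with D being whatever delay leaves enough headroom for interactive users:

```
import time

D = 0.5   # seconds to sleep after each COMMIT; tune for your workload

while True:
    batch = list(islice(reader, K))
    if not batch:
        break
    conn.executemany("INSERT INTO messages (id, body) VALUES (?, ?)", batch)
    conn.commit()
    time.sleep(D)   # unconditionally yield some DB capacity to other clients
```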
benchmark
I urge you to take out a stopwatch, generate synthetic load
with different values of K, and verify my timing claims
against the RDBMS you're using.
Do it on a lightly loaded system, perhaps around midnight,
for clean timing figures.
Then verify at noon that peak-time performance follows a similar trend.
Consider publishing such timing observations as a separate Answer post.
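Here's a rough harness for that, a sketch against a throwaway sqlite3 database with synthetic rows;
swap in your own driver and connection details:

```
import sqlite3
import time

def load(conn, rows, k):
    """Insert synthetic rows, COMMITting every k rows; return rows per second."""
    start = time.perf_counter()
    for i in range(0, len(rows), k):
        conn.executemany("INSERT INTO messages (id, body) VALUES (?, ?)", rows[i:i + k])
        conn.commit()
    return len(rows) / (time.perf_counter() - start)

rows = [(i, f"synthetic message {i}") for i in range(10_000)]
for k in (1, 10, 100, 1000, 10_000):
    conn = sqlite3.connect("bench.db")
    conn.execute("CREATE TABLE IF NOT EXISTS messages (id INTEGER, body TEXT)")
    conn.execute("DELETE FROM messages")      # start each run from an empty table
    print(k, round(load(conn, rows, k)), "rows/s")
    conn.close()
```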
temp table
The OP was rather vague on what the bottleneck resource was.
Sometimes we have lots of hardware resource,
in the form of {CPU, I/O bandwidth, network bandwidth, memory},
and are bottlenecked on lock contention.
That is, clients all need to acquire a mutex to
access some common table, perhaps because they are INSERTing
subject to some UNIQUE index constraint.
If that matches your situation, the message processor may want to
CREATE a new table for its exclusive use, and INSERT lots of
rows into it at some high rate, committing them at the end.
With that accomplished, we're still faced with
a "database to database" transfer of rows SELECTed from
the temp table and INSERTed at their final destination.
When lock contention is limiting forward progress,
this two-step decoupling can sometimes improve overall throughput,
in part because the query planner can "see into the future" by
inspecting statistics on the temp table.
Definitely get out your stopwatch and verify,
if you go this route.
For technical reasons related to what can happen during ROLLBACK,
most relational databases can TRUNCATE (or DROP) such a temp table
much more quickly than when using a blind DELETE to remove all rows.
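A sketch of that two-step load, with a made-up staging table name, assuming an open DB-API
connection conn and that batch_rows holds the parsed CSV rows; exact DDL and TRUNCATE syntax varies by vendor:

```
cur = conn.cursor()

# Step 1: bulk-load into a table nobody else touches, so there is no lock
# contention on the destination table or its UNIQUE indexes.
cur.execute("CREATE TABLE message_stage (id INTEGER, body TEXT)")
cur.executemany("INSERT INTO message_stage (id, body) VALUES (?, ?)", batch_rows)
conn.commit()

# Step 2: one set-based transfer into the final destination.
cur.execute("INSERT INTO messages (id, body) SELECT id, body FROM message_stage")
cur.execute("DROP TABLE message_stage")   # or TRUNCATE, where supported
conn.commit()
```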
csv import
Some database vendors support directly reading a local .CSV file.
If that option is available to you,
get out your stopwatch and see if that is preferable
to ten thousand INSERT statements.
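For example, PostgreSQL can do this with COPY; here's a sketch using psycopg2 with a placeholder
connection string. MySQL's LOAD DATA INFILE, SQL Server's BULK INSERT, and the sqlite3 shell's
.import command are the analogous facilities elsewhere.

```
import psycopg2

# Assumes a PostgreSQL server and the same hypothetical messages table.
conn = psycopg2.connect("dbname=example")        # placeholder connection string
with conn, conn.cursor() as cur, open("messages.csv") as f:
    # Add HEADER to the options if the file has a header line.
    cur.copy_expert("COPY messages (id, body) FROM STDIN WITH (FORMAT csv)", f)
conn.close()
```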