94

In my app I need to do a lot of INSERTs. It's a Java app and I am using plain JDBC to execute the queries. The DB is Oracle. I have enabled batching, which saves me network latency when executing queries. But the queries execute serially as separate INSERTs:

insert into some_table (col1, col2) values (val1, val2)
insert into some_table (col1, col2) values (val3, val4)
insert into some_table (col1, col2) values (val5, val6)

I was wondering if the following form of INSERT might be more efficient:

insert into some_table (col1, col2) values (val1, val2), (val3, val4), (val5, val6)

i.e. collapsing multiple INSERTs into one.

Any other tips for making batch INSERTs faster?

2
  • 4
    WOW! I tested your "collapse multiple inserts into one" while inserting to SQL Server, and I went from 107 rows/second to 3333 rows per second!
    – Wouter
    Commented Nov 7, 2016 at 16:03
  • 2
    That's a stunning 31x increase.
    – Gaurav
    Commented Aug 13, 2018 at 11:06

12 Answers

183

This is a mix of the two previous answers:

  PreparedStatement ps = c.prepareStatement("INSERT INTO employees VALUES (?, ?)");

  ps.setString(1, "John");
  ps.setString(2,"Doe");
  ps.addBatch();

  ps.clearParameters();
  ps.setString(1, "Dave");
  ps.setString(2,"Smith");
  ps.addBatch();

  ps.clearParameters();
  int[] results = ps.executeBatch(); // submit the whole batch of INSERTs for execution
8
  • 3
    This is the perfect solution, as the statement is prepared (parsed) only once. Commented Sep 24, 2010 at 12:06
  • 56
    The ps.clearParameters(); is unnecessary in this particular case.
    – BalusC
    Commented Sep 24, 2010 at 12:21
  • 2
    Be sure to measure it. Depending on the JDBC driver's implementation this might be the expected one roundtrip per batch but can also end up being one roundtrip per statement. Commented Jun 20, 2014 at 0:00
  • 3
    for mysql also add the following to the url: "&useServerPrepStmts=false&rewriteBatchedStatements=true" Commented Aug 31, 2019 at 22:24
  • 1
    Why is the clearParameters() unnecessary, and when should I use it?
    – Zied Orabi
    Commented Jan 19, 2022 at 23:14
66

Though the question asks about inserting efficiently into Oracle using JDBC, I'm currently playing with DB2 (on an IBM mainframe). Conceptually, inserting would be similar, so I thought it might be helpful to show my metrics for:

  • inserting one record at a time

  • inserting a batch of records (very efficient)

Here are the metrics.

1) Inserting one record at a time

public void writeWithCompileQuery(int records) {
    PreparedStatement statement;

    try {
        Connection connection = getDatabaseConnection();
        connection.setAutoCommit(true);

        String compiledQuery = "INSERT INTO TESTDB.EMPLOYEE(EMPNO, EMPNM, DEPT, RANK, USERNAME)" +
                " VALUES" + "(?, ?, ?, ?, ?)";
        statement = connection.prepareStatement(compiledQuery);

        long start = System.currentTimeMillis();

        for(int index = 1; index <= records; index++) {
            statement.setInt(1, index);
            statement.setString(2, "emp number-"+index);
            statement.setInt(3, index);
            statement.setInt(4, index);
            statement.setString(5, "username");

            long startInternal = System.currentTimeMillis();
            statement.executeUpdate();
            System.out.println("each transaction time taken = " + (System.currentTimeMillis() - startInternal) + " ms");
        }

        long end = System.currentTimeMillis();
        System.out.println("total time taken = " + (end - start) + " ms");
        System.out.println("avg total time taken = " + (end - start)/ records + " ms");

        statement.close();
        connection.close();

    } catch (SQLException ex) {
        System.err.println("SQLException information");
        while (ex != null) {
            System.err.println("Error msg: " + ex.getMessage());
            ex = ex.getNextException();
        }
    }
}

The metrics for 100 transactions:

each transaction time taken = 123 ms
each transaction time taken = 53 ms
each transaction time taken = 48 ms
each transaction time taken = 48 ms
each transaction time taken = 49 ms
each transaction time taken = 49 ms
...
..
.
each transaction time taken = 49 ms
each transaction time taken = 49 ms
total time taken = 4935 ms
avg total time taken = 49 ms

The first transaction takes around 120-150 ms, which covers the query parse and then the execution; the subsequent transactions take only around 50 ms each. (That is still high, but my database is on a different server; I need to troubleshoot the network.)

2) Inserting in a batch (the efficient one), achieved with preparedStatement.executeBatch()

public int[] writeInABatchWithCompiledQuery(int records) {
    PreparedStatement preparedStatement;

    try {
        Connection connection = getDatabaseConnection();
        connection.setAutoCommit(true);

        String compiledQuery = "INSERT INTO TESTDB.EMPLOYEE(EMPNO, EMPNM, DEPT, RANK, USERNAME)" +
                " VALUES" + "(?, ?, ?, ?, ?)";
        preparedStatement = connection.prepareStatement(compiledQuery);

        for(int index = 1; index <= records; index++) {
            preparedStatement.setInt(1, index);
            preparedStatement.setString(2, "empo number-"+index);
            preparedStatement.setInt(3, index+100);
            preparedStatement.setInt(4, index+200);
            preparedStatement.setString(5, "usernames");
            preparedStatement.addBatch();
        }

        long start = System.currentTimeMillis();
        int[] inserted = preparedStatement.executeBatch();
        long end = System.currentTimeMillis();

        System.out.println("total time taken to insert the batch = " + (end - start) + " ms");
        System.out.println("total time taken = " + (end - start)/records + " s");

        preparedStatement.close();
        connection.close();

        return inserted;

    } catch (SQLException ex) {
        System.err.println("SQLException information");
        while (ex != null) {
            System.err.println("Error msg: " + ex.getMessage());
            ex = ex.getNextException();
        }
        throw new RuntimeException("Error");
    }
}

The metrics for a batch of 100 records:

total time taken to insert the batch = 127 ms

and for a batch of 1000 records:

total time taken to insert the batch = 341 ms

So, 100 inserts that take ~5000 ms when executed one transaction at a time drop to ~150 ms when sent as a batch of 100 records.

NOTE - Ignore the absolute values, since my network is super slow; the metrics are meaningful relative to each other.

4
  • 1
    Hi. Does the length of the record play a role in the time to insert? I have 3 VARCHAR columns with URIs as their values, and inserting 8555 rows as a batch still takes ~3.5 min! Commented Oct 30, 2017 at 16:47
  • As per my understanding, record size might matter for data transfer from your application server to the database server, but it does not affect insertion time much. I tried a local Oracle database with 3 columns of 125 bytes each, and a batch of 10,000 records takes around 145 to 300 ms. Code here. Multiple transactions for 10,000 records take 20 seconds. Commented Dec 16, 2018 at 20:41
  • Is there a limit to the number of inserts that the batching can handle? I have some analytics tracking code which keeps track of analytic entries in memory, and then a single thread runs at a regular interval and inserts the records. There could be thousands of records in the 60 second interval.
    – MattWeiler
    Commented Jul 22, 2021 at 22:16
  • 1
    For anyone who's curious, I tested this batching method against an Oracle DB with 1,000, 10,000, 100,000 and 1,000,000 records and the times were insanely low. The average insertion times were ~0.2 ms per insert regardless of the total number of inserts in a batch. I used System.nanoTime() to get more accurate times.
    – MattWeiler
    Commented Jul 23, 2021 at 1:23
10

The Statement gives you the following option:

Statement stmt = con.createStatement();

stmt.addBatch("INSERT INTO employees VALUES (1000, 'Joe Jones')");
stmt.addBatch("INSERT INTO departments VALUES (260, 'Shoe')");
stmt.addBatch("INSERT INTO emp_dept VALUES (1000, 260)");

// submit a batch of update commands for execution
int[] updateCounts = stmt.executeBatch();
7
  • 8
    While the end result is the same, with this method multiple statements are parsed, which is much slower for bulk inserts; in fact it is not much more efficient than executing each statement individually. Also, please use PreparedStatement whenever possible for repeated queries, as it performs much better. Commented Sep 24, 2010 at 12:09
  • @AshishPatil: do you have any benchmarks for testing with and without PreparedStatement?
    – Gaurav
    Commented Aug 13, 2018 at 11:10
  • Whoa! After 8 years. Nevertheless, @prayagupd has given detailed stats in his answer which is much more recent. stackoverflow.com/a/42756134/372055 Commented Aug 13, 2018 at 14:34
  • Thank you so much for this. This is really helpful when inserting data dynamically and you don't have the time to be checking the type of data a parameter is.
    – Morfinismo
    Commented Feb 27, 2020 at 16:13
  • 1
    @ChadLehman Not parsing 'every damned column name and value in a loop' is precisely the point of PreparedStatements.
    – user207421
    Commented Jun 10, 2022 at 4:50
5

You'll have to benchmark, obviously, but over JDBC issuing multiple inserts will be much faster if you use a PreparedStatement rather than a Statement.

4

You can use the rewriteBatchedStatements parameter to make batch inserts even faster.

You can read about the parameter here: MySQL and JDBC with rewriteBatchedStatements=true

1
  • According to documentation, it was removed in MariaDB driver version 3+ :(
    – Gaël J
    Commented Aug 22, 2023 at 15:59
2

SQLite: The above answers are all correct. For SQLite, it is a little different: nothing really helps; even putting the statements in a batch (sometimes) does not improve performance. In that case, try disabling auto-commit and committing by hand after you are done. (Warning: when multiple connections write at the same time, these operations can clash.)

// connect(), yourList and compiledQuery you have to implement/define beforehand
try (Connection conn = connect();
     PreparedStatement pstmt = conn.prepareStatement(compiledQuery)) {
    conn.setAutoCommit(false);
    for (Object o : yourList) {
        pstmt.setString(1, o.toString());
        pstmt.executeUpdate();
        pstmt.getGeneratedKeys(); // if you need the generated keys
    }
    conn.commit();
}
1
  • As you are using try-with-resources, you should also put the pstmt in the try-with-resources, so that you won't forget to close it (for example when an exception is thrown, e.g. on concurrent modification). I also like to wrap the connection's auto-commit/commit handling, but that is because I am a little paranoid. SQLite has long behaved (since time immemorial) such that, unless you provide a transaction, each update in the db creates its own transaction, but it is always a good reminder. Commented Jul 5, 2021 at 4:55
0

How about using the INSERT ALL statement?

INSERT ALL
  INTO table_name (...) VALUES (...)
  INTO table_name (...) VALUES (...)
  ...
SELECT statement;

I remember that the last SELECT statement is mandatory in order to make this request succeed; I don't remember why, though. You might also consider using PreparedStatement instead. Lots of advantages!

Farid

0

You can use addBatch and executeBatch for batch inserts in Java. See the example: Batch Insert In Java

0

In my code I have no direct access to the 'preparedStatement', so I cannot use batching; I just pass it the query and a list of parameters. The trick, however, is to create a variable-length INSERT statement and a LinkedList of parameters. The effect is the same as the top example, with a variable-length parameter input. See below (error checking omitted), assuming 'myTable' has 3 updatable fields: f1, f2 and f3.

String[] args = {"A","B","C", "X","Y","Z"}; // etc., input list of triplets
final String QUERY = "INSERT INTO [myTable] (f1,f2,f3) values ";
LinkedList<String> params = new LinkedList<>();
String comma = "";
StringBuilder q = new StringBuilder(QUERY);
for (int nl = 0; nl < args.length; nl += 3) { // args is a list of triplet values
    params.add(args[nl]);
    params.add(args[nl+1]);
    params.add(args[nl+2]);
    q.append(comma).append("(?,?,?)");
    comma = ",";
}
int nr = insertIntoDB(q.toString(), params);

In my DBInterface class I have:

int insertIntoDB(String query, LinkedList<String> params) throws SQLException {
    preparedUPDStmt = connectionSQL.prepareStatement(query);
    int n = 1;
    for (String x : params) {
        preparedUPDStmt.setString(n++, x);
    }
    int updates = preparedUPDStmt.executeUpdate();
    return updates;
}
0

If you use JdbcTemplate, then:

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.BatchPreparedStatementSetter;

    public int[] batchInsert(List<Book> books) {

        return this.jdbcTemplate.batchUpdate(
            "insert into books (name, price) values(?,?)",
            new BatchPreparedStatementSetter() {

                public void setValues(PreparedStatement ps, int i) throws SQLException {
                    ps.setString(1, books.get(i).getName());
                    ps.setBigDecimal(2, books.get(i).getPrice());
                }

                public int getBatchSize() {
                    return books.size();
                }

            });
    }

or with more advanced configuration

  import org.springframework.jdbc.core.JdbcTemplate;
  import org.springframework.jdbc.core.ParameterizedPreparedStatementSetter;

    public int[][] batchInsert(List<Book> books, int batchSize) {

        int[][] updateCounts = jdbcTemplate.batchUpdate(
                "insert into books (name, price) values(?,?)",
                books,
                batchSize,
                new ParameterizedPreparedStatementSetter<Book>() {
                    public void setValues(PreparedStatement ps, Book argument) 
                        throws SQLException {
                        ps.setString(1, argument.getName());
                        ps.setBigDecimal(2, argument.getPrice());
                    }
                });
        return updateCounts;

    }

link to source

0

The following is a "Old school" variation of the INSERT ALL method.

Instead of MANY SQL statements:

insert into MY_TABLE (Field1, Field2, ....) values (row1.v1, row1.v2, ...);
insert into MY_TABLE (Field1, Field2, ....) values (row2.v1, row2.v2, ...);
...
(many)
...

consider using ONE statement...

insert into MY_TABLE (Field1, Field2, ....)
Select row1.v1,row1.v2..  from dual
union all
Select row2.v1,row2.v2..  from dual
union all
...

Typically I batch every 50 INSERT statements into one.

Also, don't forget to turn off auto-commit and commit manually after every 500 statements. If I am batching 50 statements into one, then I commit every 100 statements.

Where do the numbers 50, 100 and 500 come from... I pulled them out of the air. I found you can optimize the numbers for your particular DB and data, but in the long run optimizing them is normally not worth the effort, as using the above numbers gives a good enough performance increase.

Sorry, I don't have performance increase value examples at hand.

Obviously there are variations on the above method:

  1. putting the unions into a WITH clause (Oracle only)
  2. creating a custom data array type in the DB and binding all the rows as one array into a SQL statement, using "select * from TABLE(...)" to translate the array into a data source for the SQL select (again, Oracle only, and only worth the effort if you bulk load into the same table regularly, as you also have to register the Oracle type with the JDBC driver).

The downside is that you need to "generate" the "UNION" portion depending on how many rows you are importing, e.g. if your batch size is 50 and you have 53 records, the first iteration needs a SQL statement with 50 rows union'ed, and the second iteration one with 3 rows union'ed, as in the sketch below.

Note: If using the newer INSERT ALL, people report it fails when trying to do more than 5000 rows at a time. There is also a limit on the number of columns (1000?)... but why have a table that big?

Always try to keep the number of uncommitted rows to a reasonable amount (again, DB configuration and hardware determine this). As a rule of thumb, keeping it below 5000 tends to be reasonable. Keeping tens of millions of uncommitted rows is possible (I have done this) ... but it dramatically consumes database resources and hence affects performance.

-7

Using PreparedStatement will be MUCH slower than Statement if you have few iterations. To gain a performance benefit from using a PreparedStatement over a Statement, you need to be using it in a loop with at least 50 iterations.

2
  • 8
    No, it won't, ever. A normal Statement (not PreparedStatement) object has to do ALL the same things a PreparedStatement does and is in fact a wrapper around PreparedStatement that actually does the prepared part as well. The difference between the two is that a Statement object silently prepares and validates the statement every time you execute it, whereas a prepared statement only does that once and can then be executed multiple times to process each item in the batch.
    – David
    Commented Oct 12, 2015 at 19:29
  • is this answer valid at all?? Commented Mar 13, 2017 at 2:42
