
I am currently writing a Java program which loops through a folder of around 4000 XML files.

Using a for loop, it reads the XML from each file into a String 'xmlContent' and uses the PreparedStatement method setString(2, xmlContent) to insert the String into a table stored in my SQL Server.

Parameter 2 corresponds to a column called 'Data', of type XML.

The process works, but it is slow. It inserts about 50 rows into the table every 7 seconds.

Does anyone have any ideas as to how I could speed up this process?

Code:

{ ...declaration, connection etc etc
        PreparedStatement ps = con.prepareStatement("INSERT INTO Table(ID,Data) VALUES(?,?)");

        for (File current : folder.listFiles()){
           if (current.isFile()){
              xmlContent = fileRead(current.getAbsoluteFile());
              ps.setString(1, current.getAbsolutePath());   // getAbsolutePath() returns the path as a String
              ps.setString(2, xmlContent);
              ps.addBatch();

              if (++count % batchSize == 0){
                    ps.executeBatch();
              }

           }
        }
        ps.executeBatch();   // performs insertion of leftover rows
        ps.close();
}

private static String fileRead(File file) throws IOException {

         StringBuilder xmlContent = new StringBuilder();

         BufferedReader br = new BufferedReader(new FileReader(file));
         String strLine;
         br.readLine();      // skips the XML declaration line; don't need it and it causes problems
         while ( (strLine = br.readLine() ) != null){
             xmlContent.append(strLine);
         }
         br.close();         // closes the underlying FileReader as well

         return xmlContent.toString();
     }

2 Answers


From a little reading and a quick test, it looks like you can get a decent speedup by turning off autoCommit on your connection. All of the batch query tutorials I have seen recommend it as well, such as http://www.tutorialspoint.com/jdbc/jdbc-batch-processing.htm

Turn it off, and then drop an explicit commit wherever you want one (at the end of each batch, at the end of the whole function, etc.).

 conn.setAutoCommit(false);
 PreparedStatement ps = // ... rest of your code

 // inside your for loop

     if (++count % batchSize == 0) 
     {
           try {
             ps.executeBatch();
             conn.commit();
           }
           catch (SQLException e)
           {
              // .. whatever you want to do
              conn.rollback();   // note: rollback() can itself throw SQLException
           }
     }
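
With autoCommit off, remember to flush and commit the final partial batch after the loop finishes, and to restore autoCommit if the connection is reused later. A minimal sketch of the tail end:

     // after the loop: flush and commit the leftover rows
     ps.executeBatch();
     conn.commit();
     conn.setAutoCommit(true);   // restore the default if the connection is reused
     ps.close();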

It is best to make the reads and writes parallel.

Use one thread to read the files and store them in a buffer. Use another thread to read from the buffer and execute the inserts against the database.

You can use more than one thread to write to the database in parallel. That should give you even better performance.

I would suggest you follow this MemoryStreamMultiplexer approach, where you read the XML files in one thread and store them in a buffer, and then use one or more threads to read from the buffer and execute against the database.

http://www.codeproject.com/Articles/345105/Memory-Stream-Multiplexer-write-and-read-from-many

It is a C# implementation, but you get the idea. A rough Java sketch of the same pattern follows.
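
For illustration, here is a minimal Java sketch of that producer-consumer idea using a BlockingQueue. The class name, the JDBC URL, the queue capacity, the writer-thread count, and the batch size are all placeholders to tune; each writer thread gets its own Connection and PreparedStatement, since those should not be shared across threads.

    import java.io.File;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class ParallelXmlLoader {

        // sentinel object: tells each writer thread there is no more work
        private static final String[] POISON = new String[0];

        public static void main(String[] args) throws Exception {
            File folder = new File(args[0]);              // folder of XML files
            int writerCount = 4;                          // assumption: tune to your hardware
            BlockingQueue<String[]> queue = new ArrayBlockingQueue<>(100);

            // consumers: drain the queue and batch-insert into the database,
            // one connection per thread
            Thread[] writers = new Thread[writerCount];
            for (int i = 0; i < writerCount; i++) {
                writers[i] = new Thread(() -> {
                    try (Connection con = DriverManager.getConnection("jdbc:sqlserver://...");
                         PreparedStatement ps = con.prepareStatement(
                                 "INSERT INTO Table(ID,Data) VALUES(?,?)")) {
                        con.setAutoCommit(false);
                        int count = 0;
                        String[] item;
                        while ((item = queue.take()) != POISON) {
                            ps.setString(1, item[0]);     // file path
                            ps.setString(2, item[1]);     // XML content
                            ps.addBatch();
                            if (++count % 50 == 0) {
                                ps.executeBatch();
                                con.commit();
                            }
                        }
                        ps.executeBatch();                // flush the leftover rows
                        con.commit();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                });
                writers[i].start();
            }

            // producer: read each file and hand its contents to the queue
            for (File current : folder.listFiles()) {
                if (current.isFile()) {
                    queue.put(new String[] { current.getAbsolutePath(), fileRead(current) });
                }
            }
            for (int i = 0; i < writerCount; i++) {
                queue.put(POISON);                        // one sentinel per writer
            }
            for (Thread t : writers) {
                t.join();
            }
        }

        // simplified stand-in for the question's fileRead
        private static String fileRead(File file) throws IOException {
            return new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
        }
    }

The POISON sentinel (one per writer) lets each consumer shut down cleanly once the producer has queued every file. If the database rather than the disk turns out to be the bottleneck, fewer writer threads may perform just as well.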