1

I have a .csv file with 9273 rows.

I have one column of whole numbers and one column of decimal numbers with values like 0.0001752.

I totalled these columns using Excel's sum function and got totals of 8904671 and 4750.04556.

I then imported this .csv file into a MySQL database using "LOAD DATA INFILE". The database also contained 9273 rows of data, however MySQL SUM function returns column totals of 8603588 and 4432.1793865700.

Confused, I opened the original .csv with LibreOffice and totalled the columns again. That provided column totals of 8603597.8662 (for the whole numbers) and 4432.17938657.

Finally, I thought I'd see what Google Sheets made of the data from the .csv. It totalled the columns as the LibreOffice value if the data was copied from there, and Excel if copied from Excel. Sheets also gave the Excel totals if I imported the plain .csv. However, Sheets cannot add up the columns at all unless I choose 'yes' to the "Convert text to numbers, dates and formulas" option.

As a summary, for the same .csv the column totals for the two columns are:

MySQL:                    8603588,      4432.1793865700

Excel:                    8904671,      4750.04556

LibreOffice:              8603597.8662, 4432.17938657

Google Sheets from Excel: 8904671,      4750.045587

Google Sheets from Libre: 8603597.866,  4432.179387

Google Sheets from .csv:  8904671,      4750.04556

(Google sheets required "convert text to numbers" option)

Which is the correct set of column totals, and why does a sum of the same values result in different totals?

3
  • It's hard to say why you get these particular totals without seeing what you are adding. Can you provide an MCVE? Commented Feb 25, 2020 at 16:16
  • if you export the data from LibreOffice and open the new csv in Excel, what values do you get?
    – sippybear
    Commented Feb 25, 2020 at 16:31
  • Thanks so much @cybernetic.nomad! In trying to produce a MCVE for you, I eventually found an inconsistency at column totals from row 4000 onwards. There are other columns before the integers. The first error looks something like this: "SomeText- "SomeText""," SomeString","35091","37.82739618" Libre reads: "SomeText- "SomeText""," SomeString" ...as being a single column but Excel reads it as two columns. So the correct figure is the higher amounts. Interestingly, 'LOAD DATA INFILE' must read the csv the same way Libre does, and doesn't split the column on the apostrophe.
    – Moggz
    Commented Feb 25, 2020 at 17:23

1 Answer 1

1

I found the issue. There was one thing missing from my description that actually ended up being important - The .csv contained some columns with string values before the two columns with integers.

Most of the .csv rows looked something like this...

  • "Some String","Another String", "84711","1.41128526"

However, I eventually found a row that looked like this...

  • "Some- "String"","Another String","35091","37.82739618"

Excel reads both of the rows above, correctly, as being 4 columns.

However, LibreOffice only reads it as 3 columns, and considers: "Some- "String"","Another String" to be one single column instead of two.

What surprises me, is that neither LibreOffice or MySQLs 'LOAD DATA INFILE' considered the comma as a delimiter for a new column in instances where the first string contained double-quotes, whereas Excel did. This also explains why the column that should have only contained whole numbers ended up being a decimal number in Libre and MySQL, as the value columns had both been moved one column to the left.

1

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .