0

Here's a CSV (UTF-8 BOM encoding), with some values delimited with quote marks:

AAAAA,BBBBB,CCCCC,DDDDD
"a2",b2,c2,d2
a3,"b3",c3,d3
a4,b4,"c4","d4"
a5,"this, works",c5,d5
"oh, no",b6,c6,d6

CSV data in Notepad++

When I open this in Excel with the Windows language set to Danish, Excel expects CSVs to be semicolon-separated, so it opens the file as text. But why is it stripping out the quote-marks from the first column values (around "a2" and "oh, no")?

enter image description here

This prevents "Text to columns" working as the comma in "oh, no" is no longer escaped and the remaining data is pushed over one column.

Giving the file a .txt extension and opening it that way produces the same result.

Can anyone explain what's happening here and ideally how to prevent it?


edit: a simpler example is this text:

"hello" world
hello "world"
"hello world"

Excel will display this as:

hello world
hello "world"
hello world
2
  • 1
    Import rather than Open the file use either the legacy import wizard or Power Query (Get&Transform=>from Text/CSV). And if you have users for whom this is difficult, you can always use a button to trigger a macro that brings up the select file dialog, and the rest is automated. Commented Apr 4 at 19:47
  • Yep, we tried that workaround, but unfortunately this is beyond our users... we're looking to get the data fixed upstream so the first column will not contain potentially-comma-bearing text data. But it looks like either no-one's encountered the "Excel inexplicably removes quotemarks from text" issue before or there's just no solution within Excel. I say we take off and nuke the whole app from orbit. It's the only way to be sure.
    – Jack Deeth
    Commented Apr 5 at 0:22

3 Answers 3

0

Can anyone explain what's happening here and ideally how to prevent it?

That's two questions :-)

What's happening: like you said, your regional settings tell Excel to use semicolons as delimiter for CSV. As there are no semicolons, every line in the file is assumed to be a single value. Excel uses double quotes for literal text, so any line that starts with " is assumed to be quoted, and it will remove the quotes.

But why is it stripping out the quote-marks from the first column values

Actually, it's not the first, it's the only column, since there are no semicolons.

How to resolve it: open a new Excel sheet, go to Data/From text/SCV, and select your file. Here you can define the comma as delimiter, and import the file the way it is meant to.

3
  • Thanks, I knew the "From text/CSV" process (which overcomplicates things for the user as it's not opening the .csv, it's treating it as a dynamic data source) - I think the simpler workaround is "install LibreOffice Calc, open the CSV there, it will explicitly ask for separator and delimiter characters instead of making numerous opaque assumptions, then save as .XLSX"
    – Jack Deeth
    Commented Apr 4 at 15:47
  • Sure, that will work. Or perhaps easier: open it in Wordpad or similar, and replace all commas by semicolons...
    – Paul
    Commented Apr 4 at 16:03
  • ha, yes, except that would replace the escaped commas within fields that need to be retained e.g. "oh, no"! In the absence of a solution (preventing Excel from silently, automatically and inconsistently mutilating the data it loads), I think the better workaround is to open the CSV in LibreOffice and then save as .XLSX
    – Jack Deeth
    Commented Apr 4 at 18:19
0

Here's a neat little trick that is very handy when working with CSV files created elsewhere and you want to seamlessly import them into excel.

Open the .csv file with notepad (or notepad++), like you already did so you can edit its content.

Now, At the very first row, press enter to insert a blank row.

Now type in the following:

sep=,

sep as in separator, and the , is the separator you use. Ignore the use or lack of "", excell will understand. The , can of course also be ; if the csv uses that.

Save the file back either to the same file, or as a new one.

If you now open the new file, excel opens it correctly.

Your example file would look like this:

sep=,
AAAAA,BBBBB,CCCCC,DDDDD
"a2",b2,c2,d2
a3,"b3",c3,d3
a4,b4,"c4","d4"
a5,"this, works",c5,d5
"oh, no",b6,c6,d6

Also, if you now distribute the new .csv file to others, their excel will also automatically open it correctly. You can also choose to open it yourself, and then save as .xlsx file. That way, it is also compatible with other editors, such as Open Office Calc, libre office, etc...

It is technically also possible to create a script that automatically inserts sep=, to the start, so you can send that script to people who have trouble opening .csv files.

2
  • Thanks - I'd tried this method (and offered it as a solution in fact) but unfortunately, Excel then ignores the BOM and will misread any non-ASCII UTF-8 characters in the file - which a Danish user is certain to have!
    – Jack Deeth
    Commented Apr 4 at 17:39
  • That's weird, but can be solved by converting it to UTF-8 non BOM in Notepad++
    – LPChip
    Commented Apr 4 at 19:46
0

Not a solution but another workaround: open the file with LibreOffice Calc, which doesn't make assumptions about CSVs but will prompt the user to designate the separator and delimiter characters. The data can then be saved as .xlsx and opened in Excel (if the downstream work can't be done in Calc).

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .