Writing a pandas DataFrame to CSV file

Question

I have a dataframe in pandas which I would like to write to a CSV file.

I am doing this using:

df.to_csv('out.csv')

And getting the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u03b1' in position 20: ordinal not in range(128)

Is there any way to get around this easily (i.e. I have unicode characters in my data frame)?
And is there a way to write to a tab delimited file instead of a CSV using e.g. a 'to-tab' method (that I don't think exists)?

Stephen · Accepted Answer · 2024-07-09 20:40:43Z

1495

To delimit by a tab you can use the sep argument of to_csv:

df.to_csv(file_name, sep='\t')

To use a specific encoding (e.g. 'utf-8') use the encoding argument:

df.to_csv(file_name, sep='\t', encoding='utf-8')

In many cases you will want to remove the index and add a header:

df.to_csv(file_name, sep='\t', encoding='utf-8', index=False, header=True)
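
Putting the options above together, here is a minimal sketch. The DataFrame contents and the file name out.tsv are made up for illustration. On Python 3, to_csv already defaults to 'utf-8', so passing encoding explicitly mostly documents intent.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data containing a non-ASCII character (Greek alpha, U+03B1).
df = pd.DataFrame({"symbol": ["\u03b1", "b"], "value": [1, 2]})

# Write into a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    file_name = os.path.join(tmp_dir, "out.tsv")

    # Tab-delimited, UTF-8 encoded, header row included, index column dropped.
    df.to_csv(file_name, sep="\t", encoding="utf-8", index=False, header=True)

    # Read it back with the same separator and encoding to check the round trip.
    round_trip = pd.read_csv(file_name, sep="\t", encoding="utf-8")

print(round_trip)
```

Reading the file back with the same sep and encoding is a quick way to confirm that nothing was lost on the way to disk.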

edited Jul 9 at 20:40

Stephen

answered Jun 4, 2013 at 16:52

Andy Hayden

167

I would add index=False to drop the index.
– Medhat
Commented Jul 3, 2019 at 20:48
@Medhat, edited to add that, plus the header option (although it is also mentioned in other answers)
– Stephen
Commented Jul 9 at 20:41

cs95 · Accepted Answer · 2019-04-07 22:10:48Z

397

When you write a DataFrame to a CSV file with the to_csv method, you probably won't need to store the index label that precedes each row.

You can avoid that by passing False to the index parameter.

Something like:

df.to_csv(file_name, encoding='utf-8', index=False)

So if your DataFrame object is something like:

  Color  Number
0   red     22
1  blue     10

The csv file will store:

Color,Number
red,22
blue,10

instead of the following, which is what you get with the default index=True:

,Color,Number
0,red,22
1,blue,10
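
The two outputs above can be reproduced without touching the disk, since to_csv returns the CSV text as a string when no path is given. A small sketch using the same example data:

```python
import pandas as pd

df = pd.DataFrame({"Color": ["red", "blue"], "Number": [22, 10]})

# With no path argument, to_csv returns the CSV content as a string.
with_index = df.to_csv()                # default index=True
without_index = df.to_csv(index=False)  # index column dropped

print(with_index)
print(without_index)
```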

edited Apr 7, 2019 at 22:10

cs95

answered Jul 17, 2017 at 10:27

Sayan Sil

Community · Accepted Answer · 2020-06-20 09:12:55Z

To write a pandas DataFrame to a CSV file, you will need DataFrame.to_csv. This function offers many arguments with reasonable defaults that you will more often than not need to override to suit your specific use case. For example, you might want to use a different separator, change the datetime format, or drop the index when writing. to_csv has arguments you can pass to address these requirements.

Here's a table listing some common scenarios of writing to CSV files and the corresponding arguments you can use for them.

[Table image not recovered: common to_csv scenarios and the corresponding arguments]

Footnotes

The default separator is assumed to be a comma (','). Don't change this unless you know you need to.

By default, the index of df is written as the first column. If your DataFrame does not have an index (IOW, the df.index is the default RangeIndex), then you will want to set index=False when writing. To explain this in a different way, if your data DOES have an index, you can (and should) use index=True or just leave it out completely (as the default is True).

It would be wise to set the encoding parameter if you are writing string data so that other applications know how to read your data. This will also avoid any potential UnicodeEncodeErrors you might encounter while saving.

Compression is recommended if you are writing large DataFrames (>100K rows) to disk as it will result in much smaller output files. OTOH, it will mean the write time will increase (and consequently, the read time since the file will need to be decompressed).
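
The footnotes above appear to concern the sep, index, encoding and compression arguments. As a rough sketch of how each one is passed (the data and file names here are hypothetical):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["caf\u00e9", "na\u00efve"], "score": [1.5, 2.5]})

# 1. A different separator (semicolon here); the default is ','.
semicolon_text = df.to_csv(sep=";", index=False)

# 2. Drop the default RangeIndex, which carries no information.
no_index_text = df.to_csv(index=False)

with tempfile.TemporaryDirectory() as tmp_dir:
    # 3. State the encoding explicitly so readers know how to decode the file.
    utf8_path = os.path.join(tmp_dir, "scores.csv")
    df.to_csv(utf8_path, index=False, encoding="utf-8")

    # 4. Compress the output; read_csv infers gzip from the '.gz' suffix.
    gz_path = os.path.join(tmp_dir, "scores.csv.gz")
    df.to_csv(gz_path, index=False, compression="gzip")
    restored = pd.read_csv(gz_path)

print(semicolon_text)
print(restored)
```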

Singh · Accepted Answer · 2020-10-15 04:26:38Z

40

Example of exporting to a file with a full path on Windows, including the header row:

df.to_csv(r'C:\Users\John\Desktop\export_dataframe.csv', index=False, header=True)

For example, to store the file in an export subdirectory of the current working directory, with utf-8 encoding and tab as the separator:

df.to_csv(r'./export/dftocsv.csv', sep='\t', encoding='utf-8', header=True)
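
One thing to watch with paths like the ones above: to_csv does not create missing directories, so the target folder has to exist first. A minimal sketch using pathlib, where the base directory and file name are placeholders:

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

with tempfile.TemporaryDirectory() as base_dir:
    # Hypothetical output location; the 'export' folder does not exist yet.
    out_path = Path(base_dir) / "export" / "dftocsv.tsv"

    # Create the parent directory first, because to_csv will not create it.
    out_path.parent.mkdir(parents=True, exist_ok=True)

    df.to_csv(out_path, sep="\t", encoding="utf-8", index=False, header=True)

    # Read the text back to inspect what was written.
    written = out_path.read_text(encoding="utf-8")

print(written)
```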

edited Oct 15, 2020 at 4:26

Singh

answered Aug 8, 2019 at 12:23

Hrvoje

Glen Thompson · Accepted Answer · 2017-12-01 17:38:37Z

If you are having issues encoding to 'utf-8' and want to go cell by cell, you could try the following.

Python 2

(Where "df" is your DataFrame object.)

for column in df.columns:
    for idx in df[column].index:
        x = df.get_value(idx,column)
        try:
            x = unicode(x.encode('utf-8','ignore'),errors ='ignore') if type(x) == unicode else unicode(str(x),errors='ignore')
            df.set_value(idx,column,x)
        except Exception:
            print 'encoding error: {0} {1}'.format(idx,column)
            df.set_value(idx,column,'')
            continue

Then try:

df.to_csv(file_name)

You can check the encoding of the columns by:

for column in df.columns:
    print '{0} {1}'.format(str(type(df[column][0])),str(column))

Warning: errors='ignore' will just omit the character e.g.

IN: unicode('Regenexx\xae',errors='ignore')
OUT: u'Regenexx'

Python 3

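# DataFrame.get_value/set_value were removed in pandas 1.0; .at is the replacement.
for column in df.columns:
    for idx in df[column].index:
        x = df.at[idx, column]
        try:
            # Keep str values as-is; stringify anything else and drop characters UTF-8 cannot encode.
            x = x if isinstance(x, str) else str(x).encode('utf-8', 'ignore').decode('utf-8', 'ignore')
            df.at[idx, column] = x
        except Exception:
            print('encoding error: {0} {1}'.format(idx, column))
            df.at[idx, column] = ''
            continue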

Marc Compte · Accepted Answer · 2022-02-10 10:36:02Z

21

If the above solution is not working for you, or the CSV is getting messed up, just remove sep='\t' from the line, like this:

df.to_csv(file_name, encoding='utf-8')

edited Feb 10, 2022 at 10:36

Marc Compte

answered Apr 27, 2021 at 9:37

Shahriar Kabir Khan

Tadhg McDonald-Jensen · Accepted Answer · 2016-05-19 13:15:09Z

17

Sometimes you face these problems even if you specify UTF-8 encoding. I recommend specifying the encoding when reading the file and using the same encoding when writing it. This might solve your problem.
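
A minimal sketch of that idea, assuming you know the encoding of the source file (Latin-1 here, chosen only for illustration; the file names are hypothetical):

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, "input.csv")
    dst = os.path.join(tmp_dir, "output.csv")

    # Create a hypothetical Latin-1 encoded input file.
    with open(src, "w", encoding="latin-1", newline="") as fh:
        fh.write("name\ncaf\u00e9\n")

    file_encoding = "latin-1"  # assumption: the source encoding is known

    # Use the same encoding for reading and for writing.
    df = pd.read_csv(src, encoding=file_encoding)
    df.to_csv(dst, index=False, encoding=file_encoding)

    # Inspect the raw bytes to confirm the encoding was preserved.
    with open(dst, "rb") as fh:
        raw = fh.read()

print(raw)
```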

edited May 19, 2016 at 13:15

Tadhg McDonald-Jensen

answered May 19, 2016 at 13:02

Harsha Komarraju

nucsit026 · Accepted Answer · 2020-01-19 19:42:26Z

11

This may not be the answer for this case, but I had the same error message with .to_csv. I tried .toCSV('name.csv') and the error message was different ("'SparseDataFrame' object has no attribute 'toCSV'"). So the problem was solved by converting the dataframe to a dense dataframe:

df.to_dense().to_csv("submission.csv", index = False, sep=',', encoding='utf-8')

edited Jan 19, 2020 at 19:42

nucsit026

answered Jan 26, 2018 at 15:35

Yury Wallet

1

You got the error in the second one as it looks like you used .toCSV and not .to_csv. You forgot the underscore
– Kyle C
Commented Feb 4, 2019 at 17:51

Joe Ferndz · Accepted Answer · 2023-05-22 05:48:45Z

4

I would avoid using '\t' as the separator, as it can create issues when reading the dataset again.

df.to_csv(file_name, encoding='utf-8')

edited May 22, 2023 at 5:48

Joe Ferndz

answered Feb 9, 2022 at 17:34

Ruwindhu Chandraratne

cottontail · Accepted Answer · 2023-05-10 01:47:59Z

1. `errors=` is sometimes useful

If a file has to have a certain encoding but the existing dataframe has characters that cannot be represented, errors= can be used to "coerce" the data to be saved anyway at the cost of losing information. All possible values that can be passed as the errors= argument to the open() function in Python can be passed here.

For example, the below code saves a csv with ascii encoding where the Japanese characters are replaced with a ?.

import pandas as pd

df = pd.DataFrame({'A': ['Shohei Ohtani は一生に一度の選手だ。']})
df.to_csv('data1.csv', encoding='ascii', errors='replace', index=False)

print(pd.read_csv('data1.csv'))

                           A
0  Shohei Ohtani ???????????
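
Other error handlers behave differently. As a sketch, assuming a pandas version whose to_csv accepts errors=, 'xmlcharrefreplace' keeps the information as numeric character references instead of discarding it (the sample data is made up):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"A": ["caf\u00e9"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.csv")

    # Characters outside ASCII are written as &#NNN; references.
    df.to_csv(path, encoding="ascii", errors="xmlcharrefreplace", index=False)

    # Look at the raw bytes that ended up in the file.
    with open(path, "rb") as fh:
        raw = fh.read()

print(raw)
```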

2. `float_format=` is sometimes useful

You can format float dtypes using float_format=, and doing so sometimes saves a lot of disk space at the cost of losing precision. For example,

df = pd.DataFrame({'A': [*range(1,9,3)]*1000})/3
df.to_csv('data1.csv', index=False)                       # 61,440 bytes on disk
df.to_csv('data2.csv', index=False, float_format='%.2f')  # 20,480 bytes on disk
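
The same comparison can be sketched in memory by measuring the length of the CSV text that to_csv returns, which avoids depending on how the filesystem reports file sizes:

```python
import pandas as pd

# Same data as above: 1/3, 4/3 and 7/3 repeated 1000 times.
df = pd.DataFrame({"A": [1, 4, 7] * 1000}) / 3

# With no path argument, to_csv returns the CSV content as a string.
full_text = df.to_csv(index=False)
short_text = df.to_csv(index=False, float_format="%.2f")

# Character counts of the two outputs.
print(len(full_text), len(short_text))

# Header plus the first three formatted values.
print(short_text.splitlines()[:4])
```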

3. Save a compressed csv

Since pandas 1.0.0, you can pass a dict to compression that specifies compression method and file name inside the archive. The below code creates a zip file named compressed_data.zip which has a single file in it named data.csv.

df.to_csv('compressed_data.zip', index=False, compression={'method': 'zip', 'archive_name': 'data.csv'})
# read the archived file as a csv
pd.read_csv('compressed_data.zip')
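
To check what ended up inside the archive, the standard-library zipfile module can list its members. A small sketch with made-up data, written to a temporary directory:

```python
import os
import tempfile
import zipfile

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

with tempfile.TemporaryDirectory() as tmp_dir:
    zip_path = os.path.join(tmp_dir, "compressed_data.zip")

    # Write the CSV into a zip archive under the member name 'data.csv'.
    df.to_csv(
        zip_path,
        index=False,
        compression={"method": "zip", "archive_name": "data.csv"},
    )

    # List the files stored in the archive.
    with zipfile.ZipFile(zip_path) as archive:
        members = archive.namelist()

    # read_csv infers zip compression from the '.zip' suffix.
    restored = pd.read_csv(zip_path)

print(members)
print(restored)
```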

You can even add to an existing archive; simply pass mode='a'.

df.to_csv('compressed_data.zip', compression={'method': 'zip', 'archive_name': 'data_new.csv'}, mode='a')
