Ignore the first space in CSV

Question

I have a CSV file like this:

Time              Latitude Longitude
2021-09-12 23:13    44.63     -63.56
2021-09-14 23:13    43.78     -62
2021-09-16 23:14    44.83     -54.6

2021-09-12 23:13 is under Time column.

I would like to open it using pandas. But there is a problem with the first column. It contains a space. If I open it using:

import pandas as pd
points = pd.read_csv("test.csv", delim_whitespace=True)

I get

	Time	Latitude	Longitude
2021-09-12	23:13	44.630	-63.560
2021-09-14	23:13	43.780	-62.000
2021-09-16	23:14	44.830	-54.600

But I would like to skip the space in the first column in CSV (2021-09-12 23:13 should be under Time column) like:

	Time	Latitude	Longitude
0	2021-09-12 23:13	44.630	-63.560
1	2021-09-14 23:13	43.780	-62.000
2	2021-09-16 23:14	44.830	-54.600

How can I ignore the first space when using pd.read_csv?

Please do not stick to this csv file. This is a general question to skip (not to consider as a delimiter) the first space(s) in the first column. Because everyone knows that the first space is part of the time value, not a delimiter.

Are the columns separated by spaces or tabs? If the separator is a tab, then instead of delim_whitespace=True, you could use sep=\t. Spaces in column values then wouldn't matter. — jjramsey, Commented Jan 21, 2022 at 13:37
For such a character-sensitive problem, please include the raw text of your delimited file. The Markdown table is pretty but hides really relevant information for solving this problem. — Zach Young, Commented Jan 21, 2022 at 18:51

Serge Ballesta · Accepted Answer · 2022-01-21 14:53:11Z

What you have shown is not a CSV file. Full stop. Pandas' read_csv is versatile enough that a workaround may exist, but this is actually a fixed-width file and should be read with pd.read_fwf:

pd.read_fwf(file_name, colspecs=[(0, 16), (16, 26), (26, 40)])

directly gives:

               Time  Latitude  Longitude
0  2021-09-12 23:13     44.63     -63.56
1  2021-09-14 23:13     43.78     -62.00
2  2021-09-16 23:14     44.83     -54.60
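As a minimal, self-contained sketch of the same fixed-width idea, the snippet below reads the sample from the question through io.StringIO instead of a real file. It passes widths rather than colspecs; the widths 16, 10 and 14 are simply the lengths of the slices above and are an assumption about this particular layout, not something pandas works out for you.

```python
import io

import pandas as pd

# Sample data copied from the question; io.StringIO stands in for the file.
raw = (
    "Time              Latitude Longitude\n"
    "2021-09-12 23:13    44.63     -63.56\n"
    "2021-09-14 23:13    43.78     -62\n"
    "2021-09-16 23:14    44.83     -54.6\n"
)

# widths=[16, 10, 14] describes the same slices as
# colspecs=[(0, 16), (16, 26), (26, 40)].
points = pd.read_fwf(io.StringIO(raw), widths=[16, 10, 14])
print(points)
```

This only works while every row keeps the same column positions; if the spacing varies from line to line, a fixed-width reader is the wrong tool.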

From your edit, you only want to tell read_csv to treat the first whitespace as a non-delimiter character. I know of no simple way to do that. The hard way is to read the file and replace the first space in each data line with a different character, then pass the changed file to read_csv with a custom converter for the first column that changes the special character back to a space:

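import pandas as pd

# Copy the file, replacing only the first space of each data line with '_'
with open('test.csv') as fdin, open('test2.csv', 'w') as fdout:
    fdout.write(next(fdin))    # do not process the header line
    for line in fdin:
        fdout.write(line.replace(' ', '_', 1))

# sep=r'\s+' splits on runs of whitespace, like delim_whitespace=True
df = pd.read_csv('test2.csv', sep=r'\s+',
                 converters={'Time': lambda x: x.replace('_', ' ')})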

This gives the same result:

               Time  Latitude  Longitude
0  2021-09-12 23:13     44.63     -63.56
1  2021-09-14 23:13     43.78     -62.00
2  2021-09-16 23:14     44.83     -54.60
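If writing a second file is undesirable, the same placeholder trick can probably be done in memory. The sketch below is one way to do it, assuming the placeholder character '_' never occurs in the data and that the text fits in memory; io.StringIO and the variable names are illustrative choices, not part of the approach above.

```python
import io

import pandas as pd

# Sample data from the question; in practice this would come from open(...).read().
raw = (
    "Time              Latitude Longitude\n"
    "2021-09-12 23:13    44.63     -63.56\n"
    "2021-09-14 23:13    43.78     -62\n"
    "2021-09-16 23:14    44.83     -54.6\n"
)

lines = raw.splitlines(keepends=True)
# Keep the header untouched; in each data line replace only the first space.
patched = lines[0] + "".join(line.replace(" ", "_", 1) for line in lines[1:])

df = pd.read_csv(
    io.StringIO(patched),
    sep=r"\s+",  # split on runs of whitespace
    converters={"Time": lambda value: value.replace("_", " ")},
)
print(df)
```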

Please do not stick to the csv in the question. What if the spaces between them were different. For example; Time Latitude Longitude and 2021-09-12 23:13 44.63 -63.56, I just want pandas not to consider the first space as a delimiter. — Kadir Şahbaz, Commented Jan 21, 2022 at 13:51
@KadirŞahbaz: I know no easy way for that. For the hard way, please see my edit... — Serge Ballesta, Commented Jan 21, 2022 at 14:53

Corralien · Accepted Answer · 2022-01-21 13:29:02Z

Try fixing the column and the index after loading the file:

import pandas as pd

points = pd.read_csv('test.csv', delim_whitespace=True)

points = points.assign(Time=pd.to_datetime(points.index + ' ' + points['Time'])) \
               .reset_index(drop=True)

Output:

>>> points
                 Time  Latitude  Longitude
0 2021-09-12 23:13:00     44.63     -63.56
1 2021-09-14 23:13:00     43.78     -62.00
2 2021-09-16 23:14:00     44.83     -54.60

edited Jan 21, 2022 at 13:29

answered Jan 21, 2022 at 13:18

not working, same result. – Kadir Şahbaz, Commented Jan 21, 2022 at 13:19
Maybe you closed too fast if it is not working. – Corralien, Commented Jan 21, 2022 at 13:22
@KadirŞahbaz. I tried another solution, can you check it please? – Corralien, Commented Jan 21, 2022 at 13:27
I would like to ignore it in pd.read_csv. I know how to edit after loading the csv. – Kadir Şahbaz, Commented Jan 21, 2022 at 13:28
I don't think it's possible if it looks like a delimiter. – Corralien, Commented Jan 21, 2022 at 13:29


Patrick Artner · Accepted Answer · 2022-01-21 13:31:27Z

Your data uses two different formats:

the header row has a single space between 'Latitude' and 'Longitude'.
the data rows are separated by multiple spaces.

You can either edit your data and add a second space between lat & long or trick it by supplying the column headers separately:

Create file:

with open("test.csv","w") as f:
    f.write("""Time              Latitude Longitude
2021-09-12 23:13    44.63     -63.56
2021-09-14 23:13    43.78     -62
2021-09-16 23:14    44.83     -54.6""")

Parse file:

import pandas as pd

# ignore the file's header, supply our own, use multiple spaces as separator
df = pd.read_csv("test.csv", delimiter = "   ", 
                 header=0, names = ["Time","Latitude","Longitude"])

print (df)

Output:

               Time  Latitude  Longitude
0  2021-09-12 23:13     44.63     -63.56
1  2021-09-14 23:13     43.78     -62.00
2  2021-09-16 23:14     44.83     -54.60
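A variation on the same idea, sketched under the assumption that real column gaps are always two or more spaces and that no value contains two consecutive spaces: use a regular expression as the separator. A regex separator needs the Python parsing engine, and the header line is skipped because its single space between Latitude and Longitude would not match. io.StringIO stands in for the file here.

```python
import io

import pandas as pd

# Sample data from the question.
raw = (
    "Time              Latitude Longitude\n"
    "2021-09-12 23:13    44.63     -63.56\n"
    "2021-09-14 23:13    43.78     -62\n"
    "2021-09-16 23:14    44.83     -54.6\n"
)

points = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s{2,}",      # two or more whitespace characters act as one delimiter
    engine="python",    # regex separators are handled by the Python engine
    skiprows=1,         # drop the file's header, which has a single-space gap
    names=["Time", "Latitude", "Longitude"],
)
print(points)
```

A single space, such as the one inside the timestamp, is then never treated as a delimiter, which is close to what the question asks for as long as the two-space assumption holds.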

Martin Evans · Accepted Answer · 2022-01-24 13:06:04Z

Ideally you should be parsing the first two parts as a datetime. By using a space as a delimiter, it would imply the header has three columns. The space after the date though is being seen as an extra column.

A workaround is to skip the header entirely and supply your own column names. The parse_dates parameter can be used to tell Pandas to parse the first two columns as a single combined datetime object.

For example:

import pandas as pd

points = pd.read_csv("test.csv", delimiter=" ", 
    skipinitialspace=True, skiprows=1, index_col=None, 
    parse_dates=[[0, 1]], names=["Date", "Time", "Latitude", "Longitude"])

print(points)

Should give you the following dataframe:

            Date_Time  Latitude  Longitude
0 2021-09-12 23:13:00     44.63     -63.56
1 2021-09-14 23:13:00     43.78     -62.00
2 2021-09-16 23:14:00     44.83     -54.60
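Recent pandas releases deprecate combining columns through a nested parse_dates list, so here is a hedged alternative that reads the date and time as two plain string columns and joins them afterwards with pd.to_datetime. The column names are the same illustrative ones as above, and io.StringIO stands in for test.csv.

```python
import io

import pandas as pd

# Sample data from the question.
raw = (
    "Time              Latitude Longitude\n"
    "2021-09-12 23:13    44.63     -63.56\n"
    "2021-09-14 23:13    43.78     -62\n"
    "2021-09-16 23:14    44.83     -54.6\n"
)

# Read four whitespace-separated columns, ignoring the three-name header.
points = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",
    skiprows=1,
    names=["Date", "Time", "Latitude", "Longitude"],
)

# Join date and time text, parse it, and drop the now redundant Date column.
points["Time"] = pd.to_datetime(points["Date"] + " " + points["Time"])
points = points.drop(columns="Date")
print(points)
```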

Zach Young · Accepted Answer · 2022-01-21 18:50:05Z

How about pd.read_csv(..., skipinitialspace=True)?

skipinitialspace : bool, default False

Skip spaces after delimiter.

The Python csv module also has a skip-initial-space option (not sure if Pandas made up its own, or wraps this):

Dialect.skipinitialspace

When True, whitespace immediately following the delimiter is ignored. The default is False.

Even though it explicitly states "following the delimiter", it does the sensible thing and ignores leading whitespace in any column/cell.

Given the following input.csv, which has a leading space in all of columns 1 and 2:

When I run this:

import csv

with open('input.csv', newline='') as f:
    reader = csv.reader(f, delimiter=' ', skipinitialspace=True)
    for row in reader:
        print(row)

I get:

['H1', 'H2']
['A', '1']
['B', '2']
['C', '3']

Even if Pandas doesn't support this, at least you could use this as a preliminary transform and feed into Pandas.
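One possible shape for such a preliminary transform is sketched below. Note that it swaps the csv module for plain str.rsplit: each line is split from the right into as many fields as the header has, so any extra spaces end up inside the first field. It assumes only the first column may contain spaces; the helper name split_from_right and the sample with uneven spacing are made up for illustration.

```python
import pandas as pd

# Illustrative sample with uneven spacing between columns.
raw = (
    "Time Latitude Longitude\n"
    "2021-09-12 23:13 44.63 -63.56\n"
    "2021-09-14 23:13   43.78 -62\n"
    "2021-09-16 23:14 44.83    -54.6\n"
)


def split_from_right(line, n_columns):
    """Split on whitespace from the right, leaving extra spaces in field one."""
    return line.strip().rsplit(maxsplit=n_columns - 1)


lines = [line for line in raw.splitlines() if line.strip()]
header = lines[0].split()
rows = [split_from_right(line, len(header)) for line in lines[1:]]

points = pd.DataFrame(rows, columns=header)
# Everything after the first column is numeric in this sample.
value_columns = header[1:]
points[value_columns] = points[value_columns].apply(pd.to_numeric)
print(points)
```

Because the split counts columns from the right, it does not care how many spaces separate the fields or how many spaces the first value contains.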
