7

I have a CSV file like this:

Time              Latitude Longitude
2021-09-12 23:13    44.63     -63.56
2021-09-14 23:13    43.78     -62
2021-09-16 23:14    44.83     -54.6

2021-09-12 23:13 is under Time column.

I would like to open it using pandas. But there is a problem with the first column. It contains a space. If I open it using:

import pandas as pd
points = pd.read_csv("test.csv", delim_whitespace=True) 

I get

Time Latitude Longitude
2021-09-12 23:13 44.630 -63.560
2021-09-14 23:13 43.780 -62.000
2021-09-16 23:14 44.830 -54.600

But I would like to skip the space in the first column in CSV (2021-09-12 23:13 should be under Time column) like:

Time Latitude Longitude
0 2021-09-12 23:13 44.630 -63.560
1 2021-09-14 23:13 43.780 -62.000
2 2021-09-16 23:14 44.830 -54.600

How can I ignore the first space when using pd.read_csv?

Please do not stick to this csv file. This is a general question to skip (not to consider as a delimiter) the first space(s) in the first column. Because everyone knows that the first space is part of the time value, not a delimiter.

3
  • Are the columns separated by spaces or tabs? If the separator is a tab, then instead of delim_whitespace=True, you could use sep=\t. Spaces in column values then wouldn't matter.
    – jjramsey
    Commented Jan 21, 2022 at 13:37
  • All are spaces. Commented Jan 21, 2022 at 13:37
  • For such a character-sensitive problem, please include the raw text of your delimited file. The Markdown table is pretty but hides really relevant information for solving this problem.
    – Zach Young
    Commented Jan 21, 2022 at 18:51

5 Answers 5

3

What you have shown is not a csv file. Full stop. Pandas read_csv is indeed versatile enough to possibly find workaround allowing to process it. But it is actually a fixed width fields file and should be read with pd.read_fwf:

pd.read_fwf(file_name, [(0,16), (16,26), (26, 40)])

directly gives:

               Time  Latitude  Longitude
0  2021-09-12 23:13     44.63     -63.56
1  2021-09-14 23:13     43.78     -62.00
2  2021-09-16 23:14     44.83     -54.60

From your edit, you only want to tell read_csv to consider the first white space as a non delimiter character. I know no simple way to do that. The hard way is to read the file, replace the first space in each and every line with a different character. Then you submit that changed file to read_csv with a custom converter for the first column to change the special character back to a space:

with open('test.csv') as fdin, open('test2.csv', 'w') as fdout):
    fdout.write(next(fdin)    # do not process the header line
    for line in fdin:
        fdout.write(line.replace(' ', '_', 1)

df = pd.read_csv('test2.csv', delim_whitespace=True,
                 converters = {'Time': lambda x: x.replace('_', ' ')})

It gives too:

               Time  Latitude  Longitude
0  2021-09-12 23:13     44.63     -63.56
1  2021-09-14 23:13     43.78     -62.00
2  2021-09-16 23:14     44.83     -54.60
2
  • Please do not stick to the csv in the question. What if the spaces between them were different. For example; Time Latitude Longitude and 2021-09-12 23:13 44.63 -63.56, I just want pandas not to consider the first space as a delimiter. Commented Jan 21, 2022 at 13:51
  • @KadirŞahbaz: I know no easy way for that. For the hard way, please see my edit... Commented Jan 21, 2022 at 14:53
2

Try to fix the column and the index after load the file:

import pandas as pd

points = pd.read_csv('test.csv', delim_whitespace=True)

points = points.assign(Time=pd.to_datetime(df.index + ' ' + df['Time'])) \
               .reset_index(drop=True)

Output:

>>> points
                 Time  Latitude  Longitude
0 2021-09-12 23:13:00     44.63     -63.56
1 2021-09-14 23:13:00     43.78     -62.00
2 2021-09-16 23:14:00     44.83     -54.60
7
  • not working, same result. Commented Jan 21, 2022 at 13:19
  • Maybe you closed too fast if it does not working
    – Corralien
    Commented Jan 21, 2022 at 13:22
  • @KadirŞahbaz. I tried another solution, can you check it please?
    – Corralien
    Commented Jan 21, 2022 at 13:27
  • I would like to ignore in pd.read_csv. I know how to edit after loading csv. Commented Jan 21, 2022 at 13:28
  • I don't think it's possible if it looks like a delimiter.
    – Corralien
    Commented Jan 21, 2022 at 13:29
2

Your data is in 2 different kind of formats:

  • your headerrow has a single space between 'Latitude' and 'Longitude'.
  • the "data" rows look to be separated by multiple spaces.

You can either edit your data and add a second space between lat & long or trick it by supplying the column headers separately:

Create file:

with open("test.csv","w") as f:
    f.write("""Time              Latitude Longitude
2021-09-12 23:13    44.63     -63.56
2021-09-14 23:13    43.78     -62
2021-09-16 23:14    44.83     -54.6""")

Parse file:

import pandas as pd

# ignore files headers, supply own, use multiple spaces as seperator
df = pd.read_csv("test.csv", delimiter = "   ", 
                 header=0, names = ["Time","Latitude","Longitude"])

print (df)

Output:

               Time  Latitude  Longitude
0  2021-09-12 23:13     44.63     -63.56
1  2021-09-14 23:13     43.78     -62.00
2  2021-09-16 23:14     44.83     -54.60
1

Ideally you should be parsing the first two parts as a datetime. By using a space as a delimiter, it would imply the header has three columns. The space after the date though is being seen as an extra column.

A workaround is to skip the header entirely and supply your own column names. The parse_dates parameter can be used to tell Pandas to parse the first two columns as a single combined datetime object.

For example:

import pandas as pd

points = pd.read_csv("test.csv", delimiter=" ", 
    skipinitialspace=True, skiprows=1, index_col=None, 
    parse_dates=[[0, 1]], names=["Date", "Time", "Latitude", "Longitude"])

print(points)

Should give you the following dataframe:

            Date_Time  Latitude  Longitude
0 2021-09-12 23:13:00     44.63     -63.56
1 2021-09-14 23:13:00     43.78     -62.00
2 2021-09-16 23:14:00     44.83     -54.60
0

How about pd.read_csv(..., skipinitialspace=True)^1?

skipinitialspacebool, default False

Skip spaces after delimiter.

The Python csv module also has a skip-initial-space option^2 (not sure if Pandas made up its own, or wraps this):

Dialect.skipinitialspace

When True, whitespace immediately following the delimiter is ignored. The default is False.

Even though it explicitly states "following the delimiter", it does the sensible thing and ignores leading whitespace in any column/cell.

Given the following input.csv, which has a leading space in all of columns 1 and 2:

 H1  H2
 A  1
 B  2
 C  3

When I run this:

import csv

with open('input.csv', newline='') as f:
    reader = csv.reader(f, delimiter=' ', skipinitialspace=True)
    for row in reader:
        print(row)

I get:

['H1', 'H2']
['A', '1']
['B', '2']
['C', '3']

Even if Pandas doesn't support this, at least you could use this as a preliminary transform and feed into Pandas.

Not the answer you're looking for? Browse other questions tagged or ask your own question.