How to import a csv file using python with headers intact, where first column is a non-numerical

Question

This is an elaboration of a previous question, but as I delve deeper into python, I just get more confused as to how python handles csv files.

I have a CSV file, and it must stay that way (e.g., I cannot convert it to a text file). It is the equivalent of a 5-row by 11-column array (or matrix).

I have been attempting to read in the csv using various methods I have found here and other places (e.g. python.org) so that it preserves the relationship between columns and rows, where the first row and the first column = non-numerical values. The rest are float values, and contain a mixture of positive and negative floats.

What I wish to do is import the csv and compile it in python so that if I were to reference a column header, it would return its associated values stored in the rows. For example:

>>> workers, constant, age
>>> workers
    w0
    w1
    w2
    w3
    constant
    7.334
    5.235
    3.225
    0
    age
    -1.406
    -4.936
    -1.478
    0

And so forth...

I am looking for techniques for handling this kind of data structure. I am very new to python.

Python doesn't handle csv files by itself, but there are various libraries. You can look at the standard csv module and, if this is not enough, look at pandas. Basically, there is no magic: if you want something, just write the code for it, or find a library. — Sergey Orshanskiy, Commented Jan 27, 2015 at 4:35

gitaarik · Accepted Answer · 2020-05-25 18:47:25Z

170

For Python 3

Open the file in text mode: drop the rb argument and either pass 'r' or omit the mode entirely (text read mode is the default).

import csv
with open( <path-to-file>, 'r' ) as theFile:
    reader = csv.DictReader(theFile)
    for line in reader:
        # line is { 'workers': 'w0', 'constant': '7.334', 'age': '-1.406', ... }
        # (every value is a string; convert with float() where needed)
        # e.g. print( line[ 'workers' ] ) yields 'w0'
        print(line)

For Python 2

import csv
with open( <path-to-file>, "rb" ) as theFile:
    reader = csv.DictReader( theFile )
    for line in reader:
        # line is { 'workers': 'w0', 'constant': '7.334', 'age': '-1.406', ... }
        # (every value is a string; convert with float() where needed)
        # e.g. print( line[ 'workers' ] ) yields 'w0'
        print(line)

Python has a powerful built-in CSV handler. In fact, most things are already built in to the standard library.
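
Keep in mind that csv.DictReader hands back every field as a string, so the numeric columns still need converting. Here is a minimal sketch of one way to do that. It assumes the layout from the question (only the `workers` column is non-numeric), uses `io.StringIO` as a stand-in for the real file, and `load_rows` is a hypothetical helper name, not part of the csv module.

```python
import csv
import io

# Sample data standing in for the real file (assumed layout: the first
# column holds labels, every other column holds floats).
SAMPLE = """workers,constant,age
w0,7.334,-1.406
w1,5.235,-4.936
w2,3.225,-1.478
w3,0,0
"""


def load_rows(handle, label_column="workers"):
    """Return a list of dicts, converting every non-label field to float."""
    rows = []
    for line in csv.DictReader(handle):
        row = {}
        for key, value in line.items():
            # Leave the label column as text; everything else becomes a float.
            row[key] = value if key == label_column else float(value)
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE))
print(rows[0]["workers"], rows[0]["constant"])
```

With a real file you would pass the object returned by `open(...)` instead of the `StringIO`. If some other column is also text, `float()` will raise a ValueError, so the conversion rule would need adjusting.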

edited May 25, 2020 at 18:47

gitaarik


answered Aug 6, 2010 at 23:49

Katriel


Always open csv files in binary mode.
– John Machin
Commented Aug 7, 2010 at 0:24
6

@JohnMachin I don't know if that's true anymore. I tried this code (Python 3.4.3) with the .csv I created in a text editor, and I got an error: _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?). It worked when I removed the rb argument (default read mode).
– NauticalMile
Commented May 28, 2016 at 18:04
2

@NauticalMile is right: with python 2, @JohnMachin 's advice to open csv files rb applies, but with python 3.5 it causes that _csv.Error: iterator should return strings, not bytes … error.
– Honore Doktorr
Commented Jul 9, 2016 at 12:20
Be aware if you need to preserve the column order: a dictionary does not preserve it (as expected). Thanks for the very concise solution!
– selyunin
Commented Jul 21, 2016 at 9:31


slhck · Accepted Answer · 2018-10-12 14:18:25Z

127

Python's csv module handles data row-wise, which is the usual way of looking at such data. You seem to want a column-wise approach. Here's one way of doing it.

Assuming your file is named myclone.csv and contains

workers,constant,age
w0,7.334,-1.406
w1,5.235,-4.936
w2,3.2225,-1.478
w3,0,0

this code should give you an idea or two:

>>> import csv
>>> f = open('myclone.csv', 'r')  # text mode for Python 3; use 'rb' on Python 2
>>> reader = csv.reader(f)
>>> headers = next(reader, None)
>>> headers
['workers', 'constant', 'age']
>>> column = {}
>>> for h in headers:
...    column[h] = []
...
>>> column
{'workers': [], 'constant': [], 'age': []}
>>> for row in reader:
...   for h, v in zip(headers, row):
...     column[h].append(v)
...
>>> column
{'workers': ['w0', 'w1', 'w2', 'w3'], 'constant': ['7.334', '5.235', '3.2225', '0'], 'age': ['-1.406', '-4.936', '-1.478', '0']}
>>> column['workers']
['w0', 'w1', 'w2', 'w3']
>>> column['constant']
['7.334', '5.235', '3.2225', '0']
>>> column['age']
['-1.406', '-4.936', '-1.478', '0']
>>>

To get your numeric values into floats, add this

converters = [str.strip] + [float] * (len(headers) - 1)

up front, and do this

for h, v, conv in zip(headers, row, converters):
  column[h].append(conv(v))

for each row instead of the similar two lines above.
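
Putting the pieces above together, a self-contained sketch might look like the following. It assumes the same sample data as myclone.csv, reads from an `io.StringIO` instead of a file so it can be run as-is, and wraps the logic in a hypothetical helper called `load_columns`.

```python
import csv
import io

# Same contents as the myclone.csv example above.
SAMPLE = """workers,constant,age
w0,7.334,-1.406
w1,5.235,-4.936
w2,3.2225,-1.478
w3,0,0
"""


def load_columns(handle):
    """Read CSV text into {header: [values]}; the first column stays str."""
    reader = csv.reader(handle)
    headers = next(reader, None)
    if headers is None:  # empty input: nothing to load
        return {}
    # First column is kept as (stripped) text, all others are parsed as floats.
    converters = [str.strip] + [float] * (len(headers) - 1)
    column = {h: [] for h in headers}
    for row in reader:
        for h, v, conv in zip(headers, row, converters):
            column[h].append(conv(v))
    return column


column = load_columns(io.StringIO(SAMPLE))
print(column["workers"])
print(column["constant"])
```

A single value is then available as `column['age'][1]`, i.e. by column name first and row position second.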

edited Oct 12, 2018 at 14:18

slhck


answered Aug 7, 2010 at 0:15

John Machin


Thanks a lot John, this is very helpful. I had tried some techniques using some of the functions you used in the above example, but was unable to "package" the multiple csv functions appropriately. This will help tremendously. How then would I go about "stacking" these columns to generate a table of sorts? Could I use numpy.hstack (or is it vstack?)
– myClone
Commented Aug 8, 2010 at 18:38
1

I don't understand "stacking". You already have "a table of sorts" whose contents you can access by column['column_name'][row_index]. I don't use numpy; I'd need to read the manual (hint, hint). Perhaps you could ask another question, specifying what you need to do with the table.
– John Machin
Commented Aug 8, 2010 at 21:44
20

Note: with Python 3.5, I got AttributeError: '_csv.reader' object has no attribute 'next'. This was solved by using next(reader, None) instead of reader.next().
– Lindsay Ward
Commented Sep 11, 2016 at 4:24
1

Thanks @LindsayWard, I was moving some code from Python 2.7 and questioning reality >.>
– Joe
Commented Jun 21, 2018 at 16:51


Ankur · Accepted Answer · 2016-10-08 21:29:35Z

16

You can use the pandas library and reference rows and columns like this:

import pandas as pd

df = pd.read_csv("path_to_file")

i = 0  # example row position

# access the i-th row:
df.iloc[i]

# access the column named X:
df.X

# access the value in the i-th row of column X:
df.iloc[i].X
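
For the data in the question specifically, it may help to make the first (non-numeric) column the row index. The sketch below assumes the question's sample values, reads them from an `io.StringIO` rather than a real path, and uses the `index_col` parameter of `pd.read_csv`; the variable names are only illustrative.

```python
import io

import pandas as pd

# Sample data from the question, standing in for the real file.
SAMPLE = """workers,constant,age
w0,7.334,-1.406
w1,5.235,-4.936
w2,3.225,-1.478
w3,0,0
"""

# index_col=0 turns the first column (workers) into the row labels,
# leaving only the numeric columns as data.
df = pd.read_csv(io.StringIO(SAMPLE), index_col=0)

constant = df["constant"]      # whole column, as a Series of floats
w1_age = df.loc["w1", "age"]   # single value by row label and column name
first_row = df.iloc[0]         # row by position

print(constant.tolist())
print(w1_age)
```

Whether to index by `workers` is a design choice: it makes label-based lookups like `df.loc["w1", "age"]` possible, at the cost of `workers` no longer being an ordinary column.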

answered Oct 8, 2016 at 21:29

Ankur



David Colwell · Accepted Answer · 2021-05-03 04:52:37Z

I recently had to write this method for quite a large data file, and I found that a list comprehension worked quite well:

import csv
with open("file.csv", 'r') as f:
    reader = csv.reader(f)
    headers = next(reader)
    data = [{h: x for (h, x) in zip(headers, row)} for row in reader]
    # data is now a list of rows, each row a dictionary of the shape
    # {header: value}. If a row terminates early (e.g. there are 12 columns
    # but the row has only 11 values), zip stops at the shorter sequence, so
    # that row's dictionary will lack the keys for the missing trailing columns.
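
If you would rather have every row carry every header, one option is to pad short rows before pairing them with the headers. This is only a sketch: `row_to_dict` is a hypothetical helper, `None` is an assumed placeholder for missing values, and the sample text stands in for a real file.

```python
import csv
import io

# The second data row is deliberately one value short.
SAMPLE = """workers,constant,age
w0,7.334,-1.406
w1,5.235
"""


def row_to_dict(headers, row, fill=None):
    """Pair headers with values, padding short rows with `fill`."""
    # A negative count yields an empty list, so full rows are left unchanged.
    padded = row + [fill] * (len(headers) - len(row))
    return dict(zip(headers, padded))


reader = csv.reader(io.StringIO(SAMPLE))
headers = next(reader)
data = [row_to_dict(headers, row) for row in reader]
print(data[1])
```

Rows that are longer than the header line are still truncated by zip here, so any extra trailing values are silently dropped.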
