96

This is an elaboration of a previous question, but as I delve deeper into python, I just get more confused as to how python handles csv files.

I have a csv file, and it must stay that way (e.g., cannot convert it to text file). It is the equivalent of a 5 rows by 11 columns array or matrix, or vector.

I have been attempting to read in the csv using various methods I have found here and other places (e.g. python.org) so that it preserves the relationship between columns and rows, where the first row and the first column = non-numerical values. The rest are float values, and contain a mixture of positive and negative floats.

What I wish to do is import the csv and compile it in python so that if I were to reference a column header, it would return its associated values stored in the rows. For example:

>>> workers, constant, age
>>> workers
    w0
    w1
    w2
    w3
    constant
    7.334
    5.235
    3.225
    0
    age
    -1.406
    -4.936
    -1.478
    0

And so forth...

I am looking for techniques for handling this kind of data structure. I am very new to python.

1
  • 1
    Python doesn't handle csv files by itself, but there are various libraries. You can look at the standard csv module and, if this is not enough, look at pandas. Basically, there is no magic: if you want something, just write the code for it, or find a library. Commented Jan 27, 2015 at 4:35

4 Answers 4

170

For Python 3

Remove the rb argument and use either r or don't pass argument (default read mode).

with open( <path-to-file>, 'r' ) as theFile:
    reader = csv.DictReader(theFile)
    for line in reader:
        # line is { 'workers': 'w0', 'constant': 7.334, 'age': -1.406, ... }
        # e.g. print( line[ 'workers' ] ) yields 'w0'
        print(line)

For Python 2

import csv
with open( <path-to-file>, "rb" ) as theFile:
    reader = csv.DictReader( theFile )
    for line in reader:
        # line is { 'workers': 'w0', 'constant': 7.334, 'age': -1.406, ... }
        # e.g. print( line[ 'workers' ] ) yields 'w0'

Python has a powerful built-in CSV handler. In fact, most things are already built in to the standard library.

4
  • Always open csv files in binary mode. Commented Aug 7, 2010 at 0:24
  • 6
    @JohnMachin I don't know if that's true anymore. I tried this code (Python 3.4.3) with the .csv I created in a text editor, and I got an error: _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?). It worked when I removed the rb argument (default read mode). Commented May 28, 2016 at 18:04
  • 2
    @NauticalMile is right: with python 2, @JohnMachin 's advice to open csv files rb applies, but with python 3.5 it causes that _csv.Error: iterator should return strings, not bytes … error. Commented Jul 9, 2016 at 12:20
  • Be aware if you need to preserve the colums order. Dictionary does not preserve it (as expected). Thanks for very concise solution!
    – selyunin
    Commented Jul 21, 2016 at 9:31
127

Python's csv module handles data row-wise, which is the usual way of looking at such data. You seem to want a column-wise approach. Here's one way of doing it.

Assuming your file is named myclone.csv and contains

workers,constant,age
w0,7.334,-1.406
w1,5.235,-4.936
w2,3.2225,-1.478
w3,0,0

this code should give you an idea or two:

>>> import csv
>>> f = open('myclone.csv', 'rb')
>>> reader = csv.reader(f)
>>> headers = next(reader, None)
>>> headers
['workers', 'constant', 'age']
>>> column = {}
>>> for h in headers:
...    column[h] = []
...
>>> column
{'workers': [], 'constant': [], 'age': []}
>>> for row in reader:
...   for h, v in zip(headers, row):
...     column[h].append(v)
...
>>> column
{'workers': ['w0', 'w1', 'w2', 'w3'], 'constant': ['7.334', '5.235', '3.2225', '0'], 'age': ['-1.406', '-4.936', '-1.478', '0']}
>>> column['workers']
['w0', 'w1', 'w2', 'w3']
>>> column['constant']
['7.334', '5.235', '3.2225', '0']
>>> column['age']
['-1.406', '-4.936', '-1.478', '0']
>>>

To get your numeric values into floats, add this

converters = [str.strip] + [float] * (len(headers) - 1)

up front, and do this

for h, v, conv in zip(headers, row, converters):
  column[h].append(conv(v))

for each row instead of the similar two lines above.

4
  • Thanks a lot John, this is very helpful. I had tried some techniques using some of the functions you used in the above example, but was unable to "package" the multiple csv functions appropriately. This will help tremendously. How then would I go about "stacking" these columns to generate a table of sorts? Could I use numpy.hstack (or is it vstack?)
    – myClone
    Commented Aug 8, 2010 at 18:38
  • 1
    I don't understand "stacking". You already have "a table of sorts" whose contents you can access by column['column_name'][row_index]. I don't use numpy; I'd need to read the manual (hint, hint). Perhaps you could ask another question, specifying what you need to do with the table. Commented Aug 8, 2010 at 21:44
  • 20
    Note: with Python 3.5, I got AttributeError: '_csv.reader' object has no attribute 'next'. This was solved by using next(reader, None) instead of reader.next(). Commented Sep 11, 2016 at 4:24
  • 1
    Thanks @LindsayWard, I was moving some code from Python 2.7 and questioning reality >.>
    – Joe
    Commented Jun 21, 2018 at 16:51
16

You can use pandas library and reference the rows and columns like this:

import pandas as pd

input = pd.read_csv("path_to_file");

#for accessing ith row:
input.iloc[i]

#for accessing column named X
input.X

#for accessing ith row and column named X
input.iloc[i].X
9

I recently had to write this method for quite a large datafile, and i found using list comprehension worked quite well

      import csv
      with open("file.csv",'r') as f:
        reader = csv.reader(f)
        headers = next(reader)
        data = [{h:x for (h,x) in zip(headers,row)} for row in reader]
        #data now contains a list of the rows, with each row containing a dictionary 
        #  in the shape {header: value}. If a row terminates early (e.g. there are 12 columns, 
        #  it only has 11 values) the dictionary will not contain a header value for that row.

Not the answer you're looking for? Browse other questions tagged or ask your own question.