
I would be grateful if you could help me with a solution you gave a while back in the link below: Converting a list of ints, tuples into an numpy array

As you may recall, you explained a method of converting a tuple to a numpy array. I'm working on a project of a data-mining nature, and I found that the fastest way to collect the data is with tuples, but for more than just recording input I need a numpy array. So I looked up your solution and it kind of worked - the problem is with the data types. I have a list that looks like this:

t1=[[datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],[datetime.datetime(2013, 8, 1, 20, 54, 51), 'First'],[datetime.datetime(2013, 9, 2, 20, 54, 51), 'second']]

When I try to modify your code like so:

A = np.array([tuple(i) for i in t1],dtype=[('ReportTime',datetime.datetime.__class__),('activity',str.__class__)])

numpy doesn't recognize the data types. Am I using the wrong data types? Thank you for your time.

2 Answers


Since you're working on a project of a data-mining nature, have you considered using Pandas instead?

Here's an example of how to convert a list of tuples into a Pandas dataframe. I've highlighted a few common newbie errors I made when I first started out with Pandas, to give you an idea of what you can and cannot do.

In [1]: import pandas as pd

In [2]: data = [(1, 2), (1, 5), (2, 3), (2, 2)]

In [3]: pd.datafr                         

In [3]: pd.DataFrame(data)
Out[3]: 
   0  1
0  1  2
1  1  5
2  2  3
3  2  2

In [4]: pd.columns[0] = 'column 1'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-c313e6b0cb87> in <module>()
----> 1 pd.columns[0] = 'column 1'

AttributeError: 'module' object has no attribute 'columns'

In [5]: df = pd.DataFrame(data)

In [6]: df
Out[6]: 
   0  1
0  1  2
1  1  5
2  2  3
3  2  2

In [7]: df.columns
Out[7]: Int64Index([0, 1], dtype=int64)

In [8]: df.columns[1] = "column 2"
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-8-76ee806aec72> in <module>()
----> 1 df.columns[1] = "column 2"

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.12.0-py2.7-macosx-10.6-intel.egg/pandas/core/index.pyc in __setitem__(self, key, value)
    328 
    329     def __setitem__(self, key, value):
--> 330         raise Exception(str(self.__class__) + ' object is immutable')
    331 
    332     def __getitem__(self, key):

Exception: <class 'pandas.core.index.Int64Index'> object is immutable

In [9]: df.columns = ["column 1", "column 2"]

In [10]: df
Out[10]: 
   column 1  column 2
0         1         2
1         1         5
2         2         3
3         2         2

In [11]: exit()

Specifically with your example:

In [1]: import pandas as pd

In [3]: import datetime

In [4]: t1=[[datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],[datetime.datetime(2013, 8, 1, 20, 54, 51), 'First'],[datetime.datetime(2013, 9, 2, 20, 54, 51), 'second']]

In [5]: t1
Out[5]: 
[[datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],
 [datetime.datetime(2013, 8, 1, 20, 54, 51), 'First'],
 [datetime.datetime(2013, 9, 2, 20, 54, 51), 'second']]

In [6]: df = pd.DataFrame(t1)

In [7]: df
Out[7]: 
                    0       1
0 2013-10-01 20:54:51    last
1 2013-08-01 20:54:51   First
2 2013-09-02 20:54:51  second
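
If it helps, here's a rough sketch (this isn't from the session above, so treat it as untested on your data) of how you might name the columns up front and then get a NumPy array back out of the dataframe; df.values and df.to_records are the conversions I have in mind:

import datetime
import pandas as pd

t1 = [[datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],
      [datetime.datetime(2013, 8, 1, 20, 54, 51), 'First'],
      [datetime.datetime(2013, 9, 2, 20, 54, 51), 'second']]

# Name the columns when you build the dataframe instead of renaming them afterwards.
df = pd.DataFrame(t1, columns=['ReportTime', 'activity'])

# If you still need a plain NumPy array later:
arr = df.values                    # 2-D object array of the raw values
rec = df.to_records(index=False)   # structured (record) array with named fields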
  • It moves later to signal processing and to working with the scipy toolset. Does Pandas play nice with scipy?
    – omryjs
    Commented Dec 30, 2013 at 20:37
  • Yep, absolutely. Check it out: pandas.pydata.org. For reference, I'm doing biological science analysis, and I have been using Pandas in IPython HTML notebooks. BioPython, NetworkX, etc. all come into play.
    – ericmjl
    Commented Dec 30, 2013 at 21:22
  • Because I deal with biological sequences and their metadata, I do a lot of saving to CSV files in order to preserve the intermediate steps of my analysis work. Pandas plays extremely well with CSV files, for example. Also, because Pandas dataframes are essentially built on numpy arrays, you can always convert a dataframe to a numpy array by using df.as_matrix().
    – ericmjl
    Commented Dec 30, 2013 at 21:25

Don't use .__class__. If you're unsure, just look at what it actually does:

>>> import datetime
>>> datetime.datetime.__class__
<class 'type'>
>>> str.__class__
<class 'type'>

datetime.datetime and str are already classes that you can pass to NumPy, and it will determine the appropriate dtype for each of them (if it has a dtype associated with those classes, which it should for datetime.datetime and str).

str.__class__, on the other hand, is the class of the class str (Python classes are objects too). The class of most classes is type, unless the class was defined with a custom metaclass.
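
For completeness, here's a rough sketch of what the corrected call could look like. Rather than passing the classes themselves, I'm spelling the dtypes out explicitly (a datetime64 field and a fixed-width string field), so treat those particular dtype strings as one possible choice rather than the only one:

import datetime
import numpy as np

t1 = [[datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],
      [datetime.datetime(2013, 8, 1, 20, 54, 51), 'First'],
      [datetime.datetime(2013, 9, 2, 20, 54, 51), 'second']]

# Drop the .__class__ and give NumPy explicit field dtypes.
A = np.array([tuple(row) for row in t1],
             dtype=[('ReportTime', 'datetime64[s]'), ('activity', 'U10')])

print(A['ReportTime'])   # datetime64 column
print(A['activity'])     # fixed-width string column

If you'd rather keep the original Python objects untouched, object dtype for both fields ([('ReportTime', object), ('activity', object)]) works as well.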

  • Also, as a side note, on the rare occasions when you do want to know the type of the str class, it's usually clearer to write type(str) rather than str.__class__.
    – abarnert
    Commented Dec 30, 2013 at 20:34
  • I see my mistake. I moved on and tried 'np.asarray(sortedHer)' and it did the conversion without any problems. Thanks
    – omryjs
    Commented Dec 30, 2013 at 20:36
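
For anyone reading later: np.asarray on a mixed list like the one in the question simply falls back to a 2-D array of Python objects. sortedHer itself isn't shown in the comment above, so the data below is only assumed to look like t1:

import datetime
import numpy as np

# Assumed stand-in for the commenter's 'sortedHer', which isn't shown.
sortedHer = [[datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],
             [datetime.datetime(2013, 8, 1, 20, 54, 51), 'First']]

A = np.asarray(sortedHer)
print(A.dtype)   # object -- mixed types fall back to an array of Python objects
print(A.shape)   # (2, 2)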
