
I would be grateful if you could help me with a solution you gave a while back in the link below: Converting a list of ints, tuples into an numpy array

As you may recall, you explained a method of converting a tuple to a numpy array. I'm working on a project of a data-mining nature, and I found that the fastest way to collect the data is with tuples, but for more than just recording input I need a numpy array. So I looked up your solution and it kind of worked - the problem is with the data types. I have a list that looks like this:

t1=[[datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],[datetime.datetime(2013, 8, 1, 20, 54, 51), 'First'],[datetime.datetime(2013, 9, 2, 20, 54, 51), 'second']]

When I try to modify your code like so:

A = np.array([tuple(i) for i in t1],dtype=[('ReportTime',datetime.datetime.__class__),('activity',str.__class__)])

numpy doesn't recognize the data types. Am I using the wrong data types? Thank you for your time.

2 Answers


Since you're working on a project of a data-mining nature, have you considered using Pandas instead?

Here's an example of how to convert a list of tuples into a Pandas dataframe. I've highlighted a few common newbie errors I made when I first started out with Pandas, to give you an idea of what you can and cannot do.

In [1]: import pandas as pd

In [2]: data = [(1, 2), (1, 5), (2, 3), (2, 2)]

In [3]: pd.datafr                         

In [3]: pd.DataFrame(data)
Out[3]: 
   0  1
0  1  2
1  1  5
2  2  3
3  2  2

In [4]: pd.columns[0] = 'column 1'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-c313e6b0cb87> in <module>()
----> 1 pd.columns[0] = 'column 1'

AttributeError: 'module' object has no attribute 'columns'

In [5]: df = pd.DataFrame(data)

In [6]: df
Out[6]: 
   0  1
0  1  2
1  1  5
2  2  3
3  2  2

In [7]: df.columns
Out[7]: Int64Index([0, 1], dtype=int64)

In [8]: df.columns[1] = "column 2"
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-8-76ee806aec72> in <module>()
----> 1 df.columns[1] = "column 2"

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.12.0-py2.7-macosx-10.6-intel.egg/pandas/core/index.pyc in __setitem__(self, key, value)
    328 
    329     def __setitem__(self, key, value):
--> 330         raise Exception(str(self.__class__) + ' object is immutable')
    331 
    332     def __getitem__(self, key):

Exception: <class 'pandas.core.index.Int64Index'> object is immutable

In [9]: df.columns = ["column 1", "column 2"]

In [10]: df
Out[10]: 
   column 1  column 2
0         1         2
1         1         5
2         2         3
3         2         2

In [11]: exit()

Specifically with your example:

In [1]: import pandas as pd

In [3]: import datetime

In [4]: t1=[[datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],[datetime.datetime(2013, 8, 1, 20, 54, 51), 'First'],[datetime.datetime(2013, 9, 2, 20, 54, 51), 'second']]

In [5]: t1
Out[5]: 
[[datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],
 [datetime.datetime(2013, 8, 1, 20, 54, 51), 'First'],
 [datetime.datetime(2013, 9, 2, 20, 54, 51), 'second']]

In [6]: df = pd.DataFrame(t1)

In [7]: df
Out[7]: 
                    0       1
0 2013-10-01 20:54:51    last
1 2013-08-01 20:54:51   First
2 2013-09-02 20:54:51  second
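
If it helps, here's a rough sketch (this isn't from the session above, so treat it as untested on your data) of how you might name the columns up front and then get a NumPy array back out of the dataframe; df.values and df.to_records are the conversions I have in mind:

import datetime
import pandas as pd

t1 = [[datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],
      [datetime.datetime(2013, 8, 1, 20, 54, 51), 'First'],
      [datetime.datetime(2013, 9, 2, 20, 54, 51), 'second']]

# Name the columns when you build the dataframe instead of renaming them afterwards.
df = pd.DataFrame(t1, columns=['ReportTime', 'activity'])

# If you still need a plain NumPy array later:
arr = df.values                    # 2-D object array of the raw values
rec = df.to_records(index=False)   # structured (record) array with named fields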
  • It moves later to signal processing and to working with the scipy toolset. Does Pandas play nice with scipy?
    – omryjs
    Commented Dec 30, 2013 at 20:37
  • Yep, absolutely. Check it out: pandas.pydata.org. For reference, I'm doing biological science analysis, and I have been using Pandas in IPython HTML notebooks. BioPython, NetworkX, etc. all come into play.
    – ericmjl
    Commented Dec 30, 2013 at 21:22
  • Because I deal with biological sequences and their metadata, I do a lot of saving to CSV files in order to preserve the intermediate steps of my analysis work. Pandas plays extremely well with CSV files, for example. Also, because Pandas dataframes are essentially built on numpy arrays, you can always convert a dataframe to a numpy array by using df.as_matrix().
    – ericmjl
    Commented Dec 30, 2013 at 21:25

Don't use .__class__. If you're unsure, just look at what it actually does:

>>> import datetime
>>> datetime.datetime.__class__
<class 'type'>
>>> str.__class__
<class 'type'>

datetime.datetime and str are already classes that you can pass to NumPy, and it will determine the appropriate dtype for each of them (if it has a dtype associated with those classes, which it should for datetime.datetime and str).

str.__class__, on the other hand, is the class of the class str (Python classes are objects too). The class of most classes is type, unless the class was defined with a custom metaclass.
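
For completeness, here's a rough sketch of what the corrected call could look like. Rather than passing the classes themselves, I'm spelling the dtypes out explicitly (a datetime64 field and a fixed-width string field), so treat those particular dtype strings as one possible choice rather than the only one:

import datetime
import numpy as np

t1 = [[datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],
      [datetime.datetime(2013, 8, 1, 20, 54, 51), 'First'],
      [datetime.datetime(2013, 9, 2, 20, 54, 51), 'second']]

# Drop the .__class__ and give NumPy explicit field dtypes.
A = np.array([tuple(row) for row in t1],
             dtype=[('ReportTime', 'datetime64[s]'), ('activity', 'U10')])

print(A['ReportTime'])   # datetime64 column
print(A['activity'])     # fixed-width string column

If you'd rather keep the original Python objects untouched, object dtype for both fields ([('ReportTime', object), ('activity', object)]) works as well.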

  • Also, as a side note, on the rare occasions when you do want to know the type of the str class, it's usually clearer to write type(str) rather than str.__class__.
    – abarnert
    Commented Dec 30, 2013 at 20:34
  • I see my mistake. I moved on and tried 'np.asarray(sortedHer)' and it did the conversion without any problems. Thanks
    – omryjs
    Commented Dec 30, 2013 at 20:36
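
For anyone reading later: np.asarray on a mixed list like the one in the question simply falls back to a 2-D array of Python objects. sortedHer itself isn't shown in the comment above, so the data below is only assumed to look like t1:

import datetime
import numpy as np

# Assumed stand-in for the commenter's 'sortedHer', which isn't shown.
sortedHer = [[datetime.datetime(2013, 10, 1, 20, 54, 51), 'last'],
             [datetime.datetime(2013, 8, 1, 20, 54, 51), 'First']]

A = np.asarray(sortedHer)
print(A.dtype)   # object -- mixed types fall back to an array of Python objects
print(A.shape)   # (2, 2)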
