6

I have a voting dataset like this:

republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,?
democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n
democrat,n,y,y,n,?,y,n,n,n,n,y,n,y,n,n,y

Every value is a string, so I want to convert the data to an integer matrix and then count how often each value occurs in every column:

import pandas as pd

hou_dat = pd.read_csv("house.data", header=None)

for i in range (0, hou_dat.shape[0]):
    for j in range (0, hou_dat.shape[1]):
        if hou_dat[i, j] == "republican":
            hou_dat[i, j] = 2
        if hou_dat[i, j] == "democrat":
            hou_dat[i, j] = 3
        if hou_dat[i, j] == "y":
            hou_dat[i, j] = 1
        if hou_dat[i, j] == "n":
            hou_dat[i, j] = 0
        if hou_dat[i, j] == "?":
            hou_dat[i, j] = -1

hou_sta = hou_dat.apply(pd.value_counts)
print(hou_sta)

However, this raises an error. How can I fix it?

Exception has occurred: KeyError
(0, 0)

2 Answers

4

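IIUC, you need stack and map: stack the frame into a single Series, map every value through a dictionary, then unstack back into the original shape.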

map_dict = {'republican' : 2,
           'democrat' : 3,
           'y' : 1,
           'n' : 0,
           '?' : -1}

# df is the DataFrame holding the raw strings (hou_dat in the question)
df1 = df.stack().map(map_dict).unstack()

print(df1)

   0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
0   2   0   1   0   1   1   1   0   0   0   1  -1   1   1   1   0   1
1   2   0   1   0   1   1   1   0   0   0   0   0   1   1   1   0  -1
2   3  -1   1   1  -1   1   1   0   0   0   0   1   0   1   1   0   0
3   3   0   1   1   0  -1   1   0   0   0   0   1   0   1   0   0   1
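
For reference, the loop in the question fails because hou_dat[i, j] is looked up as a column label, which is why the KeyError reports (0, 0); positional access would be hou_dat.iat[i, j]. Below is a minimal end-to-end sketch of the stack/map approach, assuming the file holds the four rows shown in the question. The inline string `raw` and the name `hou_num` are stand-ins introduced here for illustration; with the real file you would pass "house.data" to pd.read_csv instead.

```python
import io
import pandas as pd

# Stand-in for the contents of "house.data" (the four rows from the question).
raw = """republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,?
democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n
democrat,n,y,y,n,?,y,n,n,n,n,y,n,y,n,n,y
"""

# With the real file: hou_dat = pd.read_csv("house.data", header=None)
hou_dat = pd.read_csv(io.StringIO(raw), header=None)

map_dict = {"republican": 2, "democrat": 3, "y": 1, "n": 0, "?": -1}

# stack() flattens the frame into one Series so Series.map can translate
# every cell; unstack() restores the original rows and columns.
hou_num = hou_dat.stack().map(map_dict).unstack()

# Count each code per column; codes missing from a column appear as NaN.
hou_sta = hou_num.apply(lambda col: col.value_counts())
print(hou_sta)
```

Calling value_counts on each column through a lambda avoids the top-level pd.value_counts used in the question, which recent pandas releases deprecate.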
  • Thanks again, but where do I add path?
    – 4daJKong
    Commented May 21, 2020 at 15:27
  • hou_dat = pd.read_clipboard("house.data",sep=" ", header=None) is not correct
    – 4daJKong
    Commented May 21, 2020 at 15:28
  • @4daJKong you can ignore that, that was only to reproduce your data from above.
    – Umar.H
    Commented May 21, 2020 at 15:28
  • But my data comes from a file named "house.data". How do I import that?
    – 4daJKong
    Commented May 21, 2020 at 15:30
  • hou_dat = pd.read_csv("house.data", header=None): do you mean adding pd.read_clipboard directly under this line?
    – 4daJKong
    Commented May 21, 2020 at 15:32
0

If you're working with data from a CSV file, it is better to use pandas' built-in methods. In this case, the replace method does exactly what you asked for.

hou_dat.replace(to_replace={'republican':2, 'democrat':3, 'y':1, 'n':0, '?':-1}, inplace=True)

You can read more about it in the pandas documentation for DataFrame.replace.
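
As a rough sketch of how replace fits into the full workflow, the example below reads the four rows from the question, replaces the strings, and counts the codes per column. The inline string `raw` and the names `codes` and `hou_num` are assumptions made for illustration. It returns a new frame instead of using inplace=True, and casts to int explicitly because the dtype that replace leaves behind can differ between pandas versions.

```python
import io
import pandas as pd

# Stand-in for the contents of "house.data" (the four rows from the question).
raw = """republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
republican,n,y,n,y,y,y,n,n,n,n,n,y,y,y,n,?
democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n
democrat,n,y,y,n,?,y,n,n,n,n,y,n,y,n,n,y
"""

# dtype=object keeps every cell as a plain Python string.
# With the real file: pd.read_csv("house.data", header=None, dtype=object)
hou_dat = pd.read_csv(io.StringIO(raw), header=None, dtype=object)

codes = {"republican": 2, "democrat": 3, "y": 1, "n": 0, "?": -1}

# replace() swaps each matching cell; astype(int) makes the result
# an integer matrix regardless of what dtype replace() produced.
hou_num = hou_dat.replace(codes).astype(int)

# Per-column counts of each code, as in the question's last step.
hou_sta = hou_num.apply(lambda col: col.value_counts())
print(hou_sta)
```

Because hou_dat is left untouched here, the original strings stay available if the mapping needs to be changed later.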
