
I can use pandas' dropna() to remove rows where some or all columns are set to NA. Is there an equivalent function for dropping rows where all columns have the value 0?

P   kt  b   tt  mky depth
1   0   0   0   0   0
2   0   0   0   0   0
3   0   0   0   0   0
4   0   0   0   0   0
5   1.1 3   4.5 2.3 9.0

In this example, we would like to drop the first 4 rows from the data frame.

thanks!

  • Just to clarify, this is two questions. One, to drop rows with all values as 0. But also, for a function equivalent to dropna() which would drop rows with any value as 0.
    – alchemy
    Commented Apr 22, 2020 at 17:54

16 Answers


One-liner. No transpose needed:

df.loc[~(df==0).all(axis=1)]

And for those who like symmetry, this also works...

df.loc[(df!=0).any(axis=1)]
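Applied to the frame from the question (a sketch; the data is retyped from the post, with P as the index), this keeps only row 5:

```python
import pandas as pd

# Frame from the question, with the P column as the index.
df = pd.DataFrame(
    {'kt':    [0, 0, 0, 0, 1.1],
     'b':     [0, 0, 0, 0, 3.0],
     'tt':    [0, 0, 0, 0, 4.5],
     'mky':   [0, 0, 0, 0, 2.3],
     'depth': [0, 0, 0, 0, 9.0]},
    index=pd.Index([1, 2, 3, 4, 5], name='P'))

# Drop every row in which all columns are zero.
out = df.loc[~(df == 0).all(axis=1)]
```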
  • 4
    For brevity (and, in my opinion, clarity of purpose) combine this and Akavall's comment: df.loc[(df != 0).any(1)]. Teamwork!
    – Dan Allan
    Commented Mar 26, 2014 at 3:00
  • 1
    +1, 30% faster than transpose -- 491 vs. 614 microseconds, and I like the axis=1 for being explicit; more pythonic in my opinion
    – gt6989b
    Commented Jun 27, 2016 at 21:41
  • 2
    Some mention should be made of the difference between using .all and .any, since the original question mentioned equivalence to dropna. If you want to drop all rows with any column containing a zero, you have to swap the .all and .any in the above answer. Took me a while to realize this as I was looking for that functionality.
    – Zak Keirn
    Commented Mar 6, 2018 at 18:21
  • 1
    This does not work for me; it returns the exact same df.
    – Robvh
    Commented Jul 17, 2019 at 12:31
  • Is there an 'inplace' version of this? I see that to drop rows in a df as the OP requested, this would need to be df = df.loc[(df!=0).all(axis=1)] and df = df.loc[(df!=0).any(axis=1)] to drop rows with any zeros as would be the actual equivalent to dropna().
    – alchemy
    Commented Apr 22, 2020 at 17:51

It turns out this can be nicely expressed in a vectorized fashion:

> df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})
> df = df[(df.T != 0).any()]
> df
   a  b
1  0  1
2  1  0
3  1  1
  • 7
    Nice, but I think you can avoid negation with df = df[(df.T != 0).any()]
    – Akavall
    Commented Mar 26, 2014 at 2:23
  • 1
    @Akavall Much better!
    – U2EF1
    Commented Mar 26, 2014 at 3:04
  • 3
    Just a note: OP wanted to drop rows with all columns having value 0, but one can infer the all method.
    – paulochf
    Commented Apr 25, 2016 at 20:02
  • 1
    All of these answers explain how we can drop rows with all zeros; however, I wanted to drop rows with a 0 in the first column. With the help of the discussion and answers in this post, I did this with df.loc[df.iloc[:, 0] != 0]. Just wanted to share because this problem is related to this question!
    – hemanta
    Commented Feb 14, 2019 at 4:47
  • 4
    The transpose is not necessary, any() can take an axis as a parameter. So this works: df = df[df.any(axis=1)]
    – Rahul Jha
    Commented Jul 17, 2019 at 17:22

I think this solution is the shortest:

df = df[df['ColName'] != 0]
  • 2
    And it's in-place too! Commented Aug 10, 2020 at 19:42
  • 2
    @MaxKleiner inplace by virtue of reassigning the variable
    – lukas
    Commented Sep 7, 2020 at 9:29
  • 4
    This solution deletes rows with AT LEAST 1 zero. The original poster asked to delete rows with ALL zeros. This is why The Unfun Cat's answer is correct. Commented Apr 30, 2021 at 13:23

I look up this question about once a month and always have to dig out the best answer from the comments:

df.loc[(df != 0).any(axis=1)]

Thanks Dan Allan! (Recent pandas versions no longer accept the positional form .any(1); pass axis=1 by keyword.)

  • 2
    No digging required. @8one6 has included this in his answer back in 2014 itself, the part that says: "And for those who like symmetry...". Commented Jun 19, 2017 at 14:30
  • What if you have mixed data types, some strings and a lot of number columns with zeros? Commented Feb 24, 2023 at 3:23

Replace the zeros with NaN, then drop the rows where all entries are NaN. After that, restore the zeros.

import numpy as np
df = df.replace(0, np.nan)
df = df.dropna(how='all', axis=0)
df = df.replace(np.nan, 0)
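As the comment below notes, the round trip clobbers pre-existing NaNs. A boolean-mask sketch avoids touching them, since NaN compares unequal to 0:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 1], 'b': [0, np.nan, 2.0]})

# eq(0) is False for NaN, so a row holding a real NaN is not
# mistaken for an all-zero row and survives the filter.
out = df[~df.eq(0).all(axis=1)]
```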
  • 12
    This will fail if you have any pre-existing NaN-s in the data.
    – OmerB
    Commented Sep 4, 2017 at 13:45

A couple of solutions I found helpful while looking this up, especially for larger data sets:

df[(df.sum(axis=1) != 0)]       # 30% faster 
df[df.values.sum(axis=1) != 0]  # 3X faster 

Continuing with the example from @U2EF1:

In [88]: df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})

In [91]: %timeit df[(df.T != 0).any()]
1000 loops, best of 3: 686 µs per loop

In [92]: df[(df.sum(axis=1) != 0)]
Out[92]: 
   a  b
1  0  1
2  1  0
3  1  1

In [95]: %timeit df[(df.sum(axis=1) != 0)]
1000 loops, best of 3: 495 µs per loop

In [96]: %timeit df[df.values.sum(axis=1) != 0]
1000 loops, best of 3: 217 µs per loop

On a larger dataset:

In [119]: bdf = pd.DataFrame(np.random.randint(0,2,size=(10000,4)))

In [120]: %timeit bdf[(bdf.T != 0).any()]
1000 loops, best of 3: 1.63 ms per loop

In [121]: %timeit bdf[(bdf.sum(axis=1) != 0)]
1000 loops, best of 3: 1.09 ms per loop

In [122]: %timeit bdf[bdf.values.sum(axis=1) != 0]
1000 loops, best of 3: 517 µs per loop
  • 5
    Do bad things happen if your row contains a -1 and a 1? Commented Mar 15, 2017 at 20:20
    Of course, the sum wouldn't work if you had rows whose values add up to 0. Here's a quick workaround for that which is only slightly slower: df[~(df.values.prod(axis=1) == 0) | ~(df.values.sum(axis=1) == 0)]
    – clocker
    Commented Mar 17, 2017 at 2:43
  • 1
    The prod() function doesn't solve anything. If you have any 0 in the row that will return 0. If you have to handle a row like this: [-1, -0.5, 0, 0.5, 1], neither of your solutions will work. Commented Jun 19, 2017 at 14:45
  • Here is a correct version that works 3x faster than the accepted answer: bdf[np.square(bdf.values).sum(axis=1) != 0] Commented Jun 19, 2017 at 17:59
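Following up on the sign-cancellation caveat in the comments above, summing absolute values is one sketch that avoids it (at a small extra cost):

```python
import pandas as pd

df = pd.DataFrame({'a': [0, -1, 2], 'b': [0, 1, 3]})

# Row 1 sums to zero (-1 + 1) without being all zeros; absolute
# values cannot cancel, so only the genuinely all-zero row 0 drops.
out = df[df.abs().sum(axis=1) != 0]
```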

You can use a quick lambda function to check if all the values in a given row are 0. Then you can use the result of applying that lambda as a way to choose only the rows that match or don't match that condition:

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame(np.random.randn(5,3), 
                  index=['one', 'two', 'three', 'four', 'five'],
                  columns=list('abc'))

df.loc[['one', 'three']] = 0

print(df)
print(df.loc[~df.apply(lambda row: (row == 0).all(), axis=1)])

Yields:

              a         b         c
one    0.000000  0.000000  0.000000
two    2.240893  1.867558 -0.977278
three  0.000000  0.000000  0.000000
four   0.410599  0.144044  1.454274
five   0.761038  0.121675  0.443863

[5 rows x 3 columns]
             a         b         c
two   2.240893  1.867558 -0.977278
four  0.410599  0.144044  1.454274
five  0.761038  0.121675  0.443863

[3 rows x 3 columns]
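For comparison, the same filter without apply (a sketch; row-wise apply is much slower than a vectorized comparison on large frames):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3),
                  index=['one', 'two', 'three', 'four', 'five'],
                  columns=list('abc'))
df.loc[['one', 'three']] = 0

# Vectorized equivalent of the lambda above: drop all-zero rows.
out = df.loc[~(df == 0).all(axis=1)]
```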
import pandas as pd

df = pd.DataFrame({'a' : [0,0,1], 'b' : [0,0,-1]})

temp = df.abs().sum(axis=1) == 0      
df = df.drop(temp)

Result:

>>> df
   a  b
2  1 -1
  • Did not work for me with a 1-column dataframe. Got ValueError: labels [True ... ] not contained in matrix Commented Apr 24, 2015 at 12:25
  • 1
    instead of df = df.drop(temp) use df = df.drop(df[temp].index) Commented Jun 25, 2019 at 23:25

Following the example in the accepted answer, a more elegant solution:

df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})
df = df[df.any(axis=1)]
print(df)

   a  b
1  0  1
2  1  0
3  1  1
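This works because any(axis=1) treats nonzero numbers as True; a quick check of the row-wise mask (a sketch):

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 1, 1], 'b': [0, 1, 0, 1]})

# any(axis=1) is True for a row when at least one value is truthy
# (i.e. nonzero), so only the all-zero row 0 is filtered out.
mask = df.any(axis=1)
out = df[mask]
```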

Another alternative:

# df != 0                  --> which entries are non-zero? (boolean frame)
# (df != 0).any(axis=1)    --> does the row contain any non-zero entry? (one boolean per row)
# df.loc[non_zero_mask, :] --> keep only the rows that contain a non-zero entry
# df.shape                 --> confirm the result is a subset

non_zero_mask = (df != 0).any(axis=1)  # is anything in this row non-zero?
df.loc[non_zero_mask, :].shape

This works for me: new_df = df[df.loc[:] != 0].dropna(). Note that dropna() defaults to how='any', so this drops every row containing a zero; pass dropna(how='all') to drop only the all-zero rows the OP asked about.


For me, this code: df.loc[(df!=0).any(axis=0)] did not work; it returned the exact same dataset.

Instead, I used df.loc[:, (df!=0).any(axis=0)], which dropped the all-zero columns from the dataset.

Using .all() instead dropped every column that contained any zero value in my dataset.

df = df[~(df[['kt', 'b', 'tt', 'mky', 'depth']] == 0).all(axis=1)]

Try this command; it works perfectly.


Keeping only the rows whose sum is greater than 0 should suffice (assuming the values are non-negative):

ndf = df[df.sum(axis=1) > 0]
from io import StringIO

import pandas as pd

s = '''
P   kt  b   tt  mky depth
1   0   0   0   0   0
2   0   0   0   0   0
3   0   0   0   0   0
4   0   0   0   0   0
5   1.1 3   4.5 2.3 9.0
'''
df = pd.read_csv(StringIO(s), sep=r'\s+', engine='python', index_col=0)
print(df)
print()
print(
    df.where(df != 0).dropna(how='all')
)

To drop rows that contain a zero in any column:

new_df = df[df.loc[:] != 0].dropna()
