
I can use pandas' dropna() to remove rows where some or all columns are set to NA. Is there an equivalent function for dropping rows where all columns have the value 0?

P   kt  b   tt  mky depth
1   0   0   0   0   0
2   0   0   0   0   0
3   0   0   0   0   0
4   0   0   0   0   0
5   1.1 3   4.5 2.3 9.0

In this example, we would like to drop the first 4 rows from the data frame.

thanks!

  • Just to clarify, this is two questions. One, to drop rows with all values as 0. But also, for a function equivalent to dropna() which would drop rows with any value as 0.
    – alchemy
    Commented Apr 22, 2020 at 17:54

16 Answers


One-liner. No transpose needed:

df.loc[~(df==0).all(axis=1)]

And for those who like symmetry, this also works...

df.loc[(df!=0).any(axis=1)]
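Applied to the frame from the question (a sketch; the data is retyped from the post, with P as the index), this keeps only row 5:

```python
import pandas as pd

# Frame from the question, with the P column as the index.
df = pd.DataFrame(
    {'kt':    [0, 0, 0, 0, 1.1],
     'b':     [0, 0, 0, 0, 3.0],
     'tt':    [0, 0, 0, 0, 4.5],
     'mky':   [0, 0, 0, 0, 2.3],
     'depth': [0, 0, 0, 0, 9.0]},
    index=pd.Index([1, 2, 3, 4, 5], name='P'))

# Drop every row in which all columns are zero.
out = df.loc[~(df == 0).all(axis=1)]
```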
  • 4
    For brevity (and, in my opinion, clarity of purpose) combine this and Akavall's comment: df.loc[(df != 0).any(1)]. Teamwork!
    – Dan Allan
    Commented Mar 26, 2014 at 3:00
  • 1
    +1, 30% faster than transpose -- 491 vs. 614 microseconds, and I like the axis=1 for being explicit; more pythonic in my opinion
    – gt6989b
    Commented Jun 27, 2016 at 21:41
  • 2
    Some mention should be made of the difference between using .all and .any, since the original question mentioned equivalence to dropna. If you want to drop all rows with any column containing a zero, you have to swap the .all and .any in the above answer. Took me a while to realize this as I was looking for that functionality.
    – Zak Keirn
    Commented Mar 6, 2018 at 18:21
  • 1
    This does not work for me; it returns the exact same df.
    – Robvh
    Commented Jul 17, 2019 at 12:31
  • Is there an 'inplace' version of this? I see that to drop rows in a df as the OP requested, this would need to be df = df.loc[(df!=0).all(axis=1)] and df = df.loc[(df!=0).any(axis=1)] to drop rows with any zeros as would be the actual equivalent to dropna().
    – alchemy
    Commented Apr 22, 2020 at 17:51

It turns out this can be nicely expressed in a vectorized fashion:

> df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})
> df = df[(df.T != 0).any()]
> df
   a  b
1  0  1
2  1  0
3  1  1
  • 7
    Nice, but I think you can avoid negation with df = df[(df.T != 0).any()]
    – Akavall
    Commented Mar 26, 2014 at 2:23
  • 1
    @Akavall Much better!
    – U2EF1
    Commented Mar 26, 2014 at 3:04
  • 3
    Just a note: OP wanted to drop rows with all columns having value 0, but one can infer the all method.
    – paulochf
    Commented Apr 25, 2016 at 20:02
  • 1
    All of these answers explain how we can drop rows with all zeros; however, I wanted to drop rows with a 0 in the first column. With the help of the discussion and answers in this post, I did this with df.loc[df.iloc[:, 0] != 0]. Just wanted to share because this problem is related to this question!
    – hemanta
    Commented Feb 14, 2019 at 4:47
  • 4
    The transpose is not necessary, any() can take an axis as a parameter. So this works: df = df[df.any(axis=1)]
    – Rahul Jha
    Commented Jul 17, 2019 at 17:22

I think this solution is the shortest:

df = df[df['ColName'] != 0]
  • 2
    And it's in-place too! Commented Aug 10, 2020 at 19:42
  • 2
    @MaxKleiner inplace by virtue of reassigning the variable
    – lukas
    Commented Sep 7, 2020 at 9:29
  • 4
    This solution deletes rows with AT LEAST 1 zero. The original poster asked to delete rows with ALL zeros. This is why The Unfun Cat's answer is correct. Commented Apr 30, 2021 at 13:23

I look up this question about once a month and always have to dig out the best answer from the comments:

df.loc[(df != 0).any(axis=1)]

Thanks Dan Allan! (Recent pandas versions no longer accept the positional form .any(1); pass axis=1 by keyword.)

  • 2
    No digging required. @8one6 has included this in his answer back in 2014 itself, the part that says: "And for those who like symmetry...". Commented Jun 19, 2017 at 14:30
  • What if you have mixed data types, some strings and a lot of number columns with zeros? Commented Feb 24, 2023 at 3:23

Replace the zeros with NaN, then drop the rows where all entries are NaN. After that, restore the zeros.

import numpy as np
df = df.replace(0, np.nan)
df = df.dropna(how='all', axis=0)
df = df.replace(np.nan, 0)
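As the comment below notes, the round trip clobbers pre-existing NaNs. A boolean-mask sketch avoids touching them, since NaN compares unequal to 0:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 1], 'b': [0, np.nan, 2.0]})

# eq(0) is False for NaN, so a row holding a real NaN is not
# mistaken for an all-zero row and survives the filter.
out = df[~df.eq(0).all(axis=1)]
```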
  • 12
    This will fail if you have any pre-existing NaN-s in the data.
    – OmerB
    Commented Sep 4, 2017 at 13:45

A couple of solutions I found helpful while looking this up, especially for larger data sets:

df[(df.sum(axis=1) != 0)]       # 30% faster 
df[df.values.sum(axis=1) != 0]  # 3X faster 

Continuing with the example from @U2EF1:

In [88]: df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})

In [91]: %timeit df[(df.T != 0).any()]
1000 loops, best of 3: 686 µs per loop

In [92]: df[(df.sum(axis=1) != 0)]
Out[92]: 
   a  b
1  0  1
2  1  0
3  1  1

In [95]: %timeit df[(df.sum(axis=1) != 0)]
1000 loops, best of 3: 495 µs per loop

In [96]: %timeit df[df.values.sum(axis=1) != 0]
1000 loops, best of 3: 217 µs per loop

On a larger dataset:

In [119]: bdf = pd.DataFrame(np.random.randint(0,2,size=(10000,4)))

In [120]: %timeit bdf[(bdf.T != 0).any()]
1000 loops, best of 3: 1.63 ms per loop

In [121]: %timeit bdf[(bdf.sum(axis=1) != 0)]
1000 loops, best of 3: 1.09 ms per loop

In [122]: %timeit bdf[bdf.values.sum(axis=1) != 0]
1000 loops, best of 3: 517 µs per loop
  • 5
    Do bad things happen if your row contains a -1 and a 1? Commented Mar 15, 2017 at 20:20
    Of course, the sum wouldn't work if you had rows whose values add up to 0. Here's a quick workaround for that which is only slightly slower: df[~(df.values.prod(axis=1) == 0) | ~(df.values.sum(axis=1) == 0)]
    – clocker
    Commented Mar 17, 2017 at 2:43
  • 1
    The prod() function doesn't solve anything. If you have any 0 in the row that will return 0. If you have to handle a row like this: [-1, -0.5, 0, 0.5, 1], neither of your solutions will work. Commented Jun 19, 2017 at 14:45
  • Here is a correct version that works 3x faster than the accepted answer: bdf[np.square(bdf.values).sum(axis=1) != 0] Commented Jun 19, 2017 at 17:59
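Following up on the sign-cancellation caveat in the comments above, summing absolute values is one sketch that avoids it (at a small extra cost):

```python
import pandas as pd

df = pd.DataFrame({'a': [0, -1, 2], 'b': [0, 1, 3]})

# Row 1 sums to zero (-1 + 1) without being all zeros; absolute
# values cannot cancel, so only the genuinely all-zero row 0 drops.
out = df[df.abs().sum(axis=1) != 0]
```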

You can use a quick lambda function to check if all the values in a given row are 0. Then you can use the result of applying that lambda as a way to choose only the rows that match or don't match that condition:

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame(np.random.randn(5,3), 
                  index=['one', 'two', 'three', 'four', 'five'],
                  columns=list('abc'))

df.loc[['one', 'three']] = 0

print(df)
print(df.loc[~df.apply(lambda row: (row == 0).all(), axis=1)])

Yields:

              a         b         c
one    0.000000  0.000000  0.000000
two    2.240893  1.867558 -0.977278
three  0.000000  0.000000  0.000000
four   0.410599  0.144044  1.454274
five   0.761038  0.121675  0.443863

[5 rows x 3 columns]
             a         b         c
two   2.240893  1.867558 -0.977278
four  0.410599  0.144044  1.454274
five  0.761038  0.121675  0.443863

[3 rows x 3 columns]
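For comparison, the same filter without apply (a sketch; row-wise apply is much slower than a vectorized comparison on large frames):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3),
                  index=['one', 'two', 'three', 'four', 'five'],
                  columns=list('abc'))
df.loc[['one', 'three']] = 0

# Vectorized equivalent of the lambda above: drop all-zero rows.
out = df.loc[~(df == 0).all(axis=1)]
```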
import pandas as pd

df = pd.DataFrame({'a' : [0,0,1], 'b' : [0,0,-1]})

temp = df.abs().sum(axis=1) == 0      
df = df.drop(temp)

Result:

>>> df
   a  b
2  1 -1
  • Did not work for me with a 1-column dataframe. Got ValueError: labels [True ... ] not contained in matrix Commented Apr 24, 2015 at 12:25
  • 1
    instead of df = df.drop(temp) use df = df.drop(df[temp].index) Commented Jun 25, 2019 at 23:25

Following the example in the accepted answer, a more elegant solution:

df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})
df = df[df.any(axis=1)]
print(df)

   a  b
1  0  1
2  1  0
3  1  1
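This works because any(axis=1) treats nonzero numbers as True; a quick check of the row-wise mask (a sketch):

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 1, 1], 'b': [0, 1, 0, 1]})

# any(axis=1) is True for a row when at least one value is truthy
# (i.e. nonzero), so only the all-zero row 0 is filtered out.
mask = df.any(axis=1)
out = df[mask]
```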

Another alternative:

# df != 0                  --> which entries are non-zero? (boolean frame)
# (df != 0).any(axis=1)    --> does the row contain any non-zero entry? (one boolean per row)
# df.loc[non_zero_mask, :] --> keep only the rows that contain a non-zero entry
# df.shape                 --> confirm the result is a subset

non_zero_mask = (df != 0).any(axis=1)  # is anything in this row non-zero?
df.loc[non_zero_mask, :].shape

This works for me: new_df = df[df.loc[:] != 0].dropna(). Note that dropna() defaults to how='any', so this drops every row containing a zero; pass dropna(how='all') to drop only the all-zero rows the OP asked about.


For me, this code: df.loc[(df!=0).any(axis=0)] did not work; it returned the exact same dataset.

Instead, I used df.loc[:, (df!=0).any(axis=0)], which dropped the all-zero columns from the dataset.

Using .all() instead dropped every column that contained any zero value in my dataset.

df = df[~(df[['kt', 'b', 'tt', 'mky', 'depth']] == 0).all(axis=1)]

Try this command; it works perfectly.


Keeping only the rows whose sum is greater than 0 should suffice (assuming the values are non-negative):

ndf = df[df.sum(axis=1) > 0]
from io import StringIO

import pandas as pd

s = '''
P   kt  b   tt  mky depth
1   0   0   0   0   0
2   0   0   0   0   0
3   0   0   0   0   0
4   0   0   0   0   0
5   1.1 3   4.5 2.3 9.0
'''
df = pd.read_csv(StringIO(s), sep=r'\s+', engine='python', index_col=0)
print(df)
print()
print(
    df.where(df != 0).dropna(how='all')
)

To drop rows that contain a zero in any column:

new_df = df[df.loc[:] != 0].dropna()
