How do I check whether a pandas DataFrame has NaN values?
I know about pd.isnull,
but it returns a DataFrame of booleans for each element. I also found this post, but it doesn't exactly answer my question either.
jwilner's response is spot on. I was exploring to see if there's a faster option, since in my experience, summing flat arrays is (strangely) faster than counting. This code seems faster:
df.isnull().values.any()
import numpy as np
import pandas as pd
import perfplot

def setup(n):
    df = pd.DataFrame(np.random.randn(n))
    df[df > 0.9] = np.nan
    return df

def isnull_any(df):
    return df.isnull().any()

def isnull_values_sum(df):
    return df.isnull().values.sum() > 0

def isnull_sum(df):
    return df.isnull().sum() > 0

def isnull_values_any(df):
    return df.isnull().values.any()

perfplot.save(
    "out.png",
    setup=setup,
    kernels=[isnull_any, isnull_values_sum, isnull_sum, isnull_values_any],
    n_range=[2 ** k for k in range(25)],
)
df.isnull().sum().sum()
is a bit slower, but of course, it carries additional information: the number of NaNs.
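For a tiny hypothetical frame, the two calls answer slightly different questions:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, np.nan], 'b': [np.nan, np.nan]})
df.isnull().values.any()  # True -- at least one NaN exists
df.isnull().sum().sum()   # 3 -- the total number of NaNs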
pandas doesn't have a built-in function for this. It's true from @JGreenwell's post that df.describe() can do this, but there is no direct function. For reference, df.describe() (without finding NaNs) takes 1.15 seconds for a single call on a 1000 x 1000 array.
df.isnull().values.sum() is a bit faster than df.isnull().values.flatten().sum().
As for df.isnull().values.any(): for me it is faster than the others.
np.isnan(df.values).any()
works a bit faster, but it doesn't work for the object dtype.
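To illustrate that caveat, a small sketch with a hypothetical mixed-type column:

import numpy as np
import pandas as pd

df_obj = pd.DataFrame({'a': [1, 'x', np.nan]})  # column ends up with object dtype
# np.isnan(df_obj.values)      # raises TypeError on object arrays
df_obj.isnull().values.any()   # True -- isnull() handles any dtype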
You have a couple of options.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(10,6))
# Make a few areas have NaN values
df.iloc[1:3,1] = np.nan
df.iloc[5,3] = np.nan
df.iloc[7:9,5] = np.nan
Now the data frame looks something like this:
0 1 2 3 4 5
0 0.520113 0.884000 1.260966 -0.236597 0.312972 -0.196281
1 -0.837552 NaN 0.143017 0.862355 0.346550 0.842952
2 -0.452595 NaN -0.420790 0.456215 1.203459 0.527425
3 0.317503 -0.917042 1.780938 -1.584102 0.432745 0.389797
4 -0.722852 1.704820 -0.113821 -1.466458 0.083002 0.011722
5 -0.622851 -0.251935 -1.498837 NaN 1.098323 0.273814
6 0.329585 0.075312 -0.690209 -3.807924 0.489317 -0.841368
7 -1.123433 -1.187496 1.868894 -2.046456 -0.949718 NaN
8 1.133880 -0.110447 0.050385 -1.158387 0.188222 NaN
9 -0.513741 1.196259 0.704537 0.982395 -0.585040 -1.693810
df.isnull().any().any()
- This returns a boolean value. You know of isnull(), which would return a DataFrame like this:
0 1 2 3 4 5
0 False False False False False False
1 False True False False False False
2 False True False False False False
3 False False False False False False
4 False False False False False False
5 False False False True False False
6 False False False False False False
7 False False False False False True
8 False False False False False True
9 False False False False False False
If you make it df.isnull().any()
, you can find just the columns that have NaN
values:
0 False
1 True
2 False
3 True
4 False
5 True
dtype: bool
One more .any()
will tell you if any of the above are True
> df.isnull().any().any()
True
df.isnull().sum().sum()
- This returns an integer of the total number of NaN values. This operates the same way as .any().any() does, by first giving a summation of the number of NaN values in each column, then the summation of those values:
df.isnull().sum()
0 0
1 2
2 0
3 1
4 0
5 2
dtype: int64
Finally, to get the total number of NaN values in the DataFrame:
df.isnull().sum().sum()
5
To find out which rows have NaNs in a specific column:
nan_rows = df[df['name column'].isnull()]
non_nan_rows = df[df['name column'].notnull()]
If you need to know how many rows there are with "one or more NaNs":
df.isnull().T.any().T.sum()
Or if you need to pull out these rows and examine them:
nan_rows = df[df.isnull().T.any()]
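The transposes are not strictly necessary; a shorter sketch of the same row count uses axis=1 (as a later answer also notes):

df.isnull().any(axis=1).sum()  # number of rows with one or more NaNs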
df.isna().any(axis=None)
Starting from v0.23.2, you can use DataFrame.isna + DataFrame.any(axis=None), where axis=None specifies logical reduction over the entire DataFrame.
# Setup
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [np.nan, 4, 5]})
df
A B
0 1.0 NaN
1 2.0 4.0
2 NaN 5.0
df.isna()
A B
0 False True
1 False False
2 True False
df.isna().any(axis=None)
# True
numpy.isnan
Another performant option if you're running older versions of pandas.
np.isnan(df.values)
array([[False, True],
[False, False],
[ True, False]])
np.isnan(df.values).any()
# True
Alternatively, check the sum:
np.isnan(df.values).sum()
# 2
np.isnan(df.values).sum() > 0
# True
Series.hasnans
You can also iteratively call Series.hasnans
. For example, to check if a single column has NaNs,
df['A'].hasnans
# True
And to check if any column has NaNs, you can use a comprehension with any
(which is a short-circuiting operation).
any(df[c].hasnans for c in df)
# True
This is actually very fast.
Adding to Hobs' brilliant answer, I am very new to Python and Pandas, so please point out if I am wrong.
To find out which rows have NaNs:

nan_rows = df[df.isnull().any(axis=1)]

This performs the same operation without the need for transposing, by specifying the axis of any() as 1 to check whether True is present in rows: the any(axis=1) simplification.
Let df be the name of the Pandas DataFrame, and any value that is numpy.nan is a null value.

If you want to see which columns have nulls and which do not (just True and False):
df.isnull().any()

If you want to see only the columns that have nulls:
df.loc[:, df.isnull().any()].columns

If you want to see the count of nulls in every column:
df.isna().sum()

If you want to see the percentage of nulls in every column:
df.isna().sum() / len(df) * 100

If you want to see the percentage of nulls in columns that have nulls only:
df.loc[:, list(df.loc[:, df.isnull().any()].columns)].isnull().sum() / len(df) * 100
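A shorter sketch of that last calculation, using the fact that the mean of a boolean column is the fraction of True values:

pct = df.isnull().mean() * 100  # percentage of NaNs per column
pct[pct > 0]                    # keep only the columns that actually have nulls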
EDIT 1:
If you want to see where your data is missing visually:
import missingno
missingdata_df = df.columns[df.isnull().any()].tolist()
missingno.matrix(df[missingdata_df])
Since no one has mentioned it, there is just another attribute called hasnans.
df[i].hasnans will output True if one or more of the values in the pandas Series is NaN, False if not. Note that it's not a function; it's an attribute.
In pandas versions 0.19.2 and 0.20.2, given df = pd.DataFrame([1, None], columns=['foo']), df.hasnans will throw an AttributeError, but df.foo.hasnans will return True.
Since pandas has to find this out for DataFrame.dropna(), I took a look to see how they implement it and discovered that they made use of DataFrame.count(), which counts all non-null values in the DataFrame. Cf. the pandas source code. I haven't benchmarked this technique, but I figure the authors of the library are likely to have made a wise choice for how to do it.
I've been using the following, type casting the cell to a string and checking for the 'nan' value:

(str(df.at[index, 'column']) == 'nan')

This allows me to check a specific value in a Series, and not just return whether it is contained somewhere within the Series.
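A hedged alternative for the same single-cell check, without the string round-trip (same hypothetical index and 'column' names as above):

pd.isna(df.at[index, 'column'])  # True if that single cell is NaN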
df.isnull().sum()
This will return the count of all NaN values present in the respective columns of the DataFrame.
Just use math.isnan(x): it returns True if x is a NaN (not a number), and False otherwise. Note, however, that math.isnan(x) is not going to work when x is a DataFrame; you get a TypeError instead.
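A quick sketch of where math.isnan does and does not apply:

import math
math.isnan(float('nan'))  # True -- works on scalar floats
# math.isnan(df)          # raises TypeError for a DataFrame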
Here is another interesting way of finding nulls and replacing them with a calculated value:

# Creating the DataFrame
testdf2 = pd.DataFrame({'Tenure': [1, 2, 3, 4, 5], 'Monthly': [10, 20, 30, 40, 50], 'Yearly': [10, 40, np.nan, np.nan, 250]})
>>> testdf2
Monthly Tenure Yearly
0 10 1 10.0
1 20 2 40.0
2 30 3 NaN
3 40 4 NaN
4 50 5 250.0
# Identifying the rows with null values
nan_rows = testdf2[testdf2['Yearly'].isnull()]
>>> nan_rows
Monthly Tenure Yearly
2 30 3 NaN
3 40 4 NaN
# Getting the row indexes into a list
>>> index = list(nan_rows.index)
>>> index
[2, 3]
# Replacing null values with a calculated value (.loc avoids chained assignment)
>>> for i in index:
...     testdf2.loc[i, 'Yearly'] = testdf2.loc[i, 'Monthly'] * testdf2.loc[i, 'Tenure']
>>> testdf2
Monthly Tenure Yearly
0 10 1 10.0
1 20 2 40.0
2 30 3 90.0
3 40 4 160.0
4 50 5 250.0
We can see the null values present in the dataset by generating a heatmap using the seaborn module:
import pandas as pd
import seaborn as sns
dataset=pd.read_csv('train.csv')
sns.heatmap(dataset.isnull(),cbar=False)
I recommend using the values attribute, as evaluation on the raw NumPy array is much faster:

import numpy as np
import pandas as pd

arr = np.random.randn(100, 100)
arr[40, 40] = np.nan
df = pd.DataFrame(arr)
%timeit np.isnan(df.values).any() # 7.56 µs
%timeit np.isnan(df).any() # 627 µs
%timeit df.isna().any(axis=None) # 572 µs
Result:
7.56 µs ± 447 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
627 µs ± 40.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
572 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Note: %timeit needs to be run in an IPython or Jupyter session.
The best would be to use:

df.isna().any().any()

Here is why: isna() is used to define isnull(), but both of these are identical, of course.
This is even faster than the accepted answer and covers all 2D pandas DataFrames.
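The aliasing is easy to verify at the module level (a sketch; method aliases may differ between pandas versions):

pd.isna is pd.isnull  # True -- isnull is an alias for isna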
To do this we can use the statement df.isna().any(). This will check each of our columns and return True for a column if there are any missing values or NaNs in it, or False if that column has no missing values.
This will only include columns with at least 1 null/na value.
df.isnull().sum()[df.isnull().sum()>0]
Or you can use .info() on the DF, such as:

df.info(null_counts=True)

(in newer pandas versions the parameter is show_counts), which returns the number of non-null rows per column, such as:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3276314 entries, 0 to 3276313
Data columns (total 10 columns):
n_matches 3276314 non-null int64
avg_pic_distance 3276314 non-null float64
Another way is to dropna
and check if the lengths are equivalent:
>>> len(df.dropna()) != len(df)
True
df.apply(axis=0, func=lambda x: any(pd.isnull(x)))

This will check, for each column, whether it contains NaN or not.
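A note: this is equivalent to the idiomatic vectorized check, which skips the Python-level lambda:

df.isnull().any(axis=0)  # same per-column booleans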
Given the following dataframe:
A B C
0 1.0 a NaN
1 2.0 b 4.0
2 NaN c 5.0
df.isna().any(axis=None) # True
df.isna().to_numpy().any() # True
df.ne(df).any(axis=None) # True
(df!=df).any(axis=None) # True
df.eval("A!=A or B!=B or C!=C").any() # True
df.isna().any().pipe(lambda x: x.index[x])
Index(['A', 'C'], dtype='object')
df.isna().any(axis=1).pipe(lambda x: x.index[x])
Index([0, 2], dtype='int64')
df.loc[:, df.isna().any()]
A C
0 1.0 NaN
1 2.0 4.0
2 NaN 5.0
df[df.isna().any(axis=1)]
A B C
0 1.0 a NaN
2 NaN c 5.0
You can not only check whether any NaNs exist, but also get the percentage of NaNs in each column, using the following:
df = pd.DataFrame({'col1':[1,2,3,4,5],'col2':[6,np.nan,8,9,10]})
df
col1 col2
0 1 6.0
1 2 NaN
2 3 8.0
3 4 9.0
4 5 10.0
df.isnull().sum()/len(df)
col1 0.0
col2 0.2
dtype: float64
Bar representation for missing values:

import missingno
missingno.bar(df)  # will give you the exact number of values present and missing
This code makes your life easy:
import sidetable
df.stb.missing()
Check this out: https://github.com/chris1610/sidetable
You can't access NaN values in pandas using comparison operators: np.nan and None cannot be compared with the NaN values present in the data. The reason is strange, because when you check the type of a NaN in the data, it is np.float64. NaNs in the data can be accessed by using the isna() function:

count = 0
for i in data.columns:
    for j in data[i]:
        if pd.isna(j):
            count += 1
print(count)

Hope it helps!
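For reference, a vectorized one-liner gives the same count without Python-level loops:

int(data.isna().sum().sum())  # total number of NaNs in the frame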
Depending on the type of data you're dealing with, you could also just get the value counts of each column while performing your EDA by setting dropna to False:

for col in df:
    print(df[col].value_counts(dropna=False))

This works well for categorical variables, not so much when you have many unique values.
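For instance, on a small hypothetical categorical column, the NaN bucket shows up as its own row:

import numpy as np
import pandas as pd

s = pd.Series(['a', 'b', 'a', np.nan])
print(s.value_counts(dropna=False))
# a      2
# b      1
# NaN    1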