I have a large dataset from which I would like to drop every column that contains
null values and get back a new DataFrame. How can I do that?
The following attempts only handle a single column, or drop rows containing null instead:
df.where(col("dt_mvmt").isNull())  # doesn't work: I don't know all the column names, and there are 1000s of columns
df.filter(df.dt_mvmt.isNotNull())  # same problem as above
df.na.drop()  # drops rows that contain null, instead of columns that contain null
For example:
a | b | c
1 | | 0
2 | 2 | 3
In the above case, the whole column b should be dropped
because one of its values is empty.