I prepared a CSV file for training a LightGBM (LGBM) classifier and used the following code.

from lightgbm import LGBMClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=333)
lgbm_wrapper = LGBMClassifier(n_estimators=400)

evals = [(X_test, y_test)]
lgbm_wrapper.fit(X_train, y_train, early_stopping_rounds=100,
                 eval_metric="logloss", eval_set=evals, verbose=True)
preds = lgbm_wrapper.predict(X_test)
pred_proba = lgbm_wrapper.predict_proba(X_test)[:, 1]

But I get the following error.

/usr/local/lib/python3.7/dist-packages/sklearn/utils/validation.py in
_assert_all_finite(X, allow_nan, msg_dtype)
104                     msg_err.format
105                     (type_err,
--> 106                      msg_dtype if msg_dtype is not None else
107             )
108     # for object dtype data, we only check for NaNs (GH-13254)
ValueError: Input contains NaN, infinity or a value too large for

To solve this problem, I first checked the data types of the columns.

Date             object
A                float64
B                 int64
C                 int64
D                float64
E                float64
F                float64
G                float64
H                 object
dtype: object

I also called X.dropna() during preprocessing to remove NaN values. However, the float64-related error still occurs. I need a little help.

My data looks like this

  • Can you try rounding your float to 3 or 4 decimals, and revert if it doesn't work? Commented Jun 22, 2021 at 8:04
  • @shivam13juna I rounded to third decimal place but still doesn't work:(
    – Emma Lim
    Commented Jun 22, 2021 at 9:38
  • It has to be a data problem, Emma. You have to make sure all data points are finite and not NaN. Try adding some manual checks, such as checking for values greater than 999. If that doesn't work and the data isn't private, share it with me and I'll take a look. Commented Jun 22, 2021 at 13:00
  • @shivam13juna So can't we use numbers over 999? I have 4-digit values in the data. Thank you very much for your reply!
    – Emma Lim
    Commented Jun 23, 2021 at 7:01
  • Nah Nah, it's just for checking what values might be getting interpreted as NaN, try checking for numbers > 999999. Since you have 4-digit data, no row should have any value > 999999. If there is, then that's the possible error. Commented Jun 23, 2021 at 7:51
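
The manual checks suggested in the comments above can be sketched roughly as follows. This is only an illustration: the toy DataFrame and its values are made up and merely stand in for the real CSV, and the column names A, B, D are borrowed from the dtype listing in the question. It counts NaN and infinite entries per numeric column and lists the rows that contain at least one non-finite value.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are invented for illustration).
df = pd.DataFrame({
    "A": [1.5, np.nan, 3.0, 4.0],
    "B": [10, 20, 30, 40],
    "D": [0.1, 0.2, np.inf, 0.4],
})

# Only numeric columns can be tested with np.isfinite / np.isinf.
numeric = df.select_dtypes(include=[np.number])

# Per-column counts of NaN and of +/-inf. dropna() removes only the NaN rows,
# so infinite values would survive it.
nan_counts = numeric.isna().sum()
inf_counts = np.isinf(numeric).sum()

# Index labels of rows that contain at least one NaN or infinite value.
finite_mask = np.isfinite(numeric).all(axis=1)
bad_rows = numeric.index[~finite_mask.to_numpy()].tolist()

print(nan_counts)
print(inf_counts)
print(bad_rows)
```

If infinite values do turn up, one possible follow-up is df.replace([np.inf, -np.inf], np.nan) before dropping rows, so that dropna() catches them as well.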

1 Answer


Note that X.dropna() does not work in place: it returns a new DataFrame and leaves X unchanged. I hope you assigned the result, i.e. X = X.dropna(). Also, for every row you drop from X, drop the row with the same index from the target y as well, so that the features and labels stay aligned.
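
A minimal sketch of that point, using a small made-up X and y rather than the asker's data (the names X_clean and y_clean are introduced here only for illustration):

```python
import numpy as np
import pandas as pd

# Made-up features and target, only to demonstrate the behaviour.
X = pd.DataFrame({"A": [1.0, np.nan, 3.0, 4.0], "B": [1, 2, 3, 4]})
y = pd.Series([0, 1, 0, 1], name="target")

# Calling dropna() without keeping the result changes nothing in X.
X.dropna()
still_has_nan = bool(X.isna().any().any())

# Keep the returned frame, then select the labels with the surviving index.
X_clean = X.dropna()
y_clean = y.loc[X_clean.index]

print(still_has_nan)
print(X_clean.shape, y_clean.tolist())
```

The cleaned pair would then be passed to train_test_split in place of the original X and y.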
