All Questions

0 votes
0 answers
9 views

Is there a way to create a bootstrapped beta calibration function to use on new data?

I have created ML classification models that are now to be evaluated on a different population for external validation (n=5000, event rates between n=400 and n=1200 for different outcomes under study)....
mmo • 1
1 vote
0 answers
9 views

Use a metric that is not available in the list of metrics for xgboost

Working in R. I am following this post on Stack Overflow. I am training an xgboost model and I want to use a metric that is not in the list of metrics available for the eval_metric parameter. I ...
Camillionnaire
0 votes
0 answers
10 views

Frequency encoding in R with a cross-validated model: how to use step_lencode_mixed()

One way of addressing high cardinality in a column is frequency encoding. However, if you use a cross-validated analysis plan, then you would need to re-encode the column at each step. It's ...
Englishman Bob
0 votes
1 answer
15 views

Struggling with normalization/standardization for a machine learning dataset

Sorry for what is probably a very obvious/rookie question. I'm currently doing a data science module for my degree and making very slow progress with the work. The case study I'm doing is around HR ...
Alex Ferry
0 votes
0 answers
12 views

How can I combine/pool the results of regression with a neural network?

My study has ten imputed dependent variables (plausible values). After separately analyzing each dependent variable using a regression neural network (NN), I must combine/pool the results. I tried ...
minre • 1
0 votes
0 answers
28 views

sklearn Random Forest classifier vs R’s Random Forest classifier

I’m trying to implement the equivalent of R’s random forest classifier in Python ...
Mark W • 1
0 votes
1 answer
71 views

ROC curve: manual calculation vs. the pROC package in R

I want to recreate the ROC curve manually on my dataset and compare it to the roc function from the pROC package in R. I'm using the customer churn dataset telco.csv from Kaggle....
Nikola • 1
0 votes
1 answer
65 views

Should I choose an ARIMA model (2,1,1) with a higher AIC value or an ARIMA model (6,1,8) with a lower AIC value?

I am trying to fit an ARIMA model to time series data. When I fit the model using auto.arima function in R, ...
Mehmet Yildirim
1 vote
1 answer
103 views

Packages for Density Estimation using K-Nearest Neighbor

I would like suggestions for packages that provide a K-Nearest Neighbor density estimator. I've already searched the web (so as not to bother you with my question :) ), but most results were ...
Neyo Goldsmith
0 votes
0 answers
69 views

R [Warning] No further splits with positive gain, best gain: -inf in lightgbm training

I read through some answers, and it seems many people have faced this message before. The answer seems to be that there is no further need to split the tree, so you need to adjust the hyperparameters to allow new splits....
cloudscomputes
0 votes
0 answers
30 views

Algorithms from the R caret package: is there a Python equivalent of the LVQ algorithm?

The R package caret provides the LVQ algorithm, which is used for "feature selection". I have used this to do some data science in R over 6 ...
Palu • 103
0 votes
0 answers
12 views

Interpreting large discrepancies between Specificities & the # of Extraneous Variable Models selected by a variable selection algorithm

I am going to preface my question by saying that this problem of interpretation I have run into is in the context of me doing my part as a collaborator on a statistical learning paper for the first ...
Marlen • 167
0 votes
0 answers
75 views

Prediction in multiclass classification

Context: I need to build a multiclass classifier to predict what type of sentence (law) the case will end with. Data: I have several columns to predict the case: client, cause of action, ...
TM01 • 1
0 votes
0 answers
32 views

How to solve this error in R: Error in mls(Log_change, 1:11, 52/12) : Incomplete high frequency data?

...
J_Bake • 1
1 vote
1 answer
36 views

Removing specific phrases from textual data (R)

I have the following Reddit posts and I would like to clean them by removing the specific phrase "Click to expand", while keeping all other words within each post the same. ...
maldini1990