My question about overfitting is fairly general, but in this particular case it concerns survival models. I am working on a case-cohort study, estimating hazard ratios (HR) in a cohort where the cases are individuals who had a heart attack (56) and the rest are healthy controls (192).
We fit a Cox regression to estimate the HR of several covariates; I am especially interested in potential molecular biomarkers and their diagnostic efficiency. Based on our population and the descriptive statistics, we have already ruled out several covariates (related to cardiovascular disease), but we still have 9 or 10 variables that would be useful to include as predictors (including the microRNA). Depending on the distribution of these variables (age, sex, weight, total cholesterol, diabetes status, smoking status, ...), we could still drop some of them if that is advisable.
I come from here, where there is a similar discussion (with a different study design) in which Prof. Harrell suggests a newer method, which I don't fully understand. It seems that the rule of thumb of 10 cases per predictor variable is not advisable. So my question is about the alternatives, or methods, for assessing whether I am overfitting the model, in this case a Cox regression model (any extra information about other models is always welcome, but is not the focus of this thread).
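# the model (Surv() and cch() come from the survival package; `HDL-c` is backtick-quoted so the hyphen is not parsed as a formula operator)
library(survival)
cch(Surv(timetoevent, heart_attack) ~ age + sex + `HDL-c` + diabetes + smoking + batch + weight + total_cholesterol + biomarker_of_interest,
    data = datab, subcoh = ~ subdata, id = ~ ids, cohort.size = 5404, method = "LinYing", robust = TRUE)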
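
To make the rule-of-thumb concern concrete, here is a minimal sketch of the events-per-variable arithmetic for this model. It assumes each of the 9 terms in the formula uses a single degree of freedom, which may not hold: a factor with several levels (batch, for instance) would use more and push the ratio lower. The variable names `n_events`, `n_terms` and `epv` are only illustrative.

```r
# Events per variable (EPV) for the model above.
# Assumption: one degree of freedom per term; multi-level factors
# such as batch would need more and reduce the ratio further.
n_events <- 56  # heart attack cases
n_terms  <- 9   # terms on the right-hand side of the formula

epv <- n_events / n_terms
round(epv, 1)
```

Under that assumption I get roughly 6 events per term, which is below the 10 that the rule of thumb asks for, hence my question about better ways to judge overfitting.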