First off, as Richard Hardy comments, information criteria do not assume we have the true model. Quite the contrary. For instance, AIC estimates the Kullback-Leibler distance between the proposed model and the true data generating process (up to an additive constant), and picking the model with minimal AIC amounts to choosing the one with the smallest estimated distance to the true DGP. See Burnham & Anderson (2002, Model selection and multi-model inference: a practical information-theoretic approach) or Burnham & Anderson (2004, Sociological Methods & Research) for an accessible treatment. They also go into the justification for BIC.
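To make this concrete, here is a toy sketch (mine, not from the references above) of AIC-based selection: fit polynomials of increasing degree to data generated from a quadratic DGP, and pick the degree with minimal AIC. All names and the simulation setup are illustrative assumptions.

```python
import numpy as np

# Illustrative toy example: select a polynomial degree by minimal AIC.
rng = np.random.default_rng(0)
n = 200
x = np.linspace(-2, 2, n)
# True DGP: quadratic mean plus Gaussian noise.
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(0.0, 1.0, n)

def aic_gaussian(y, yhat, k):
    # For a Gaussian model with the variance profiled out, up to a constant:
    # AIC = n * log(RSS / n) + 2k. The +1 parameter for sigma^2 shifts every
    # model's AIC by the same amount here, so it can be dropped.
    rss = np.sum((y - yhat) ** 2)
    return n * np.log(rss / n) + 2 * k

aics = {}
for degree in range(1, 7):
    coefs = np.polyfit(x, y, degree)
    yhat = np.polyval(coefs, x)
    aics[degree] = aic_gaussian(y, yhat, degree + 1)  # k = coefficients incl. intercept

best = min(aics, key=aics.get)
print("AIC per degree:", {d: round(a, 1) for d, a in aics.items()})
print("selected degree:", best)
```

The underfitting degree-1 model pays a large goodness-of-fit penalty, while degrees above 2 pay the 2k complexity penalty, so AIC typically lands on (or near) the true degree here.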
Information criteria break down with overparameterized models, but that's not really a failure of the ICs themselves. Rather, it's that every overparameterized model that is not regularized breaks down, and that "normal" ICs don't work with regularized models. (I believe there are IC variants that apply to regularized models, but I am not an expert in this.)
ICs are used in forecasting model selection because of the above argument about distances to true DGPs. A related argument is that the AIC asymptotically estimates a monotone function of the prediction error (section 4.3.1 in Lütkepohl, 2005, New Introduction to Multiple Time Series Analysis, who also goes into other model selection criteria). Also, ICs are not the only tool used: some people prefer using holdout sets, but that means you need more data.
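As a hedged sketch of how this plays out in forecasting (my own toy example, not Lütkepohl's), the following selects an AR order by minimal AIC, fitting each candidate order by ordinary least squares on lagged values of a simulated AR(2) series. The simulation parameters and helper name are illustrative assumptions.

```python
import numpy as np

# Illustrative toy example: select an AR order p by minimal AIC.
rng = np.random.default_rng(1)
T = 500
y = np.zeros(T)
for t in range(2, T):
    # True DGP: AR(2) with coefficients 0.6 and -0.3.
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

def ar_aic(y, p, max_p):
    # Fit an AR(p) with intercept by OLS, using a common effective sample
    # (dropping the first max_p observations) so AICs are comparable across p.
    n = len(y) - max_p
    Y = y[max_p:]
    lags = [y[max_p - j:len(y) - j] for j in range(1, p + 1)]
    X = np.column_stack([np.ones(n)] + lags)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    rss = np.sum((Y - X @ beta) ** 2)
    return n * np.log(rss / n) + 2 * (p + 1)  # intercept + p AR coefficients

max_p = 6
aics = {p: ar_aic(y, p, max_p) for p in range(0, max_p + 1)}
best_p = min(aics, key=aics.get)
print("AIC per order:", {p: round(a, 1) for p, a in aics.items()})
print("selected order:", best_p)
```

A holdout-based alternative would instead refit each order on a training window and compare out-of-sample forecast errors, which is exactly where the extra data requirement mentioned above comes in.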