Isn't it an example of the forking paths problem?
Yes.
There are a number of posts on this site that address this issue (and related issues around various aspects of model selection, such as deciding whether or not to assume constant variance).
The degree to which it's a problem can vary quite a bit; in some circumstances it matters only a little (e.g. perhaps it pushes up the significance level noticeably on only one of the tests, and perhaps not by much), while other times it matters more.
Of course it can be hard to gauge quite how much in practice, because the impact on significance level and power (and indeed on the rates of other kinds of error) is a property of the population-process (across all possible samples), while you have only a single data set; the actual situation you're in with real data is naturally unknown (or you wouldn't have needed a hypothesis test in the first place). You can make some assumptions and investigate, but it's easy to fall into the trap of investigating only a few very tame assumptions and then asserting a general stamp of approval.
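For instance (purely as an illustration; the pre-test, sample size and null distribution below are assumptions of mine, not anything from the question), one can simulate a specific two-stage rule, such as "check normality, then pick the t-test or the Mann-Whitney accordingly", under a specific null and see how far the achieved rejection rate drifts from the nominal level; how much it drifts depends heavily on those assumptions.

```python
# Sketch: estimate the achieved type I error rate of a two-stage procedure
# (pre-test for normality, then choose t-test or Mann-Whitney) under an
# assumed null. All settings here are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, alpha, reps = 20, 0.05, 10_000
rejections = 0
for _ in range(reps):
    # both samples from the same (here exponential) population, so H0 is true
    x = rng.exponential(size=n)
    y = rng.exponential(size=n)
    # stage 1: Shapiro-Wilk pre-test on the pooled sample decides the test
    if stats.shapiro(np.concatenate([x, y])).pvalue > 0.05:
        p = stats.ttest_ind(x, y).pvalue                              # stage 2a
    else:
        p = stats.mannwhitneyu(x, y, alternative="two-sided").pvalue  # stage 2b
    rejections += (p < alpha)
print("achieved type I error rate:", rejections / reps)
```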
It's also not simply that you're choosing between two different competing tests -- typically you're choosing between two distinct hypotheses, and in that sense it's a form of Testing hypotheses suggested by the data, albeit not the worst possible example of it.
In that sense you can be choosing not just between two tests but between two potentially very different conclusions (which may be in opposite directions).
I recently discussed these two related issues here. While I didn't specifically mention Gelman and Loken's forking paths there, it's a term I have raised when discussing the issue at other times (albeit I'm not certain whether I've done so in an answer here or not).
As a general principle, one should be selecting hypotheses at the earliest stages of the study, relatively early in the planning. After all, the question about population parameters that you wanted to answer shouldn't be a mystery to you.
Many people seem to have learned a policy of making their hypotheses deliberately vague; this looks almost designed to enable the poor practice of choosing the hypothesis after seeing the data.
You should then be selecting models and tests before seeing data (ideally before collecting it so even inadvertent data leakage does not occur).
If you're not in a position to choose an explicit distributional model, you may be better off not doing so. This is in many cases a perfectly defensible option, though if sample sizes are very small I would warn against using nonparametric tests at typical significance levels*.
I will say as an aside that the choice between parametric models (NB parametric does not mean 'normal') and nonparametric models does not necessarily dictate the form of the hypothesis. If I write a hypothesis about comparing two population means, I am not locked into a t-test, nor into any of a large collection of other parametric tests of means. It is possible to test means (e.g. hypotheses of equality vs inequality) under weaker assumptions, ones that don't require a parametric model. In short, you can choose nonparametric tests of means, if that's what you wish. However, it is still very useful to decide on a meaningful form of test statistic (not always based on a difference) for that comparison, and where possible to try to get at least an approximately pivotal quantity.
Indeed, even when you have an explicit parametric model, and perhaps a nice powerful test when that model holds, you aren't necessarily restricted to performing that parametric test with that statistic. Often you can retain that power (or at least nearly all of it, except perhaps in very small samples) when the model is close to right, and still get protection against the significance level straying far from the one you intended.
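As a rough sketch of the kind of thing I have in mind (the choice of statistic, the number of resamples and the example data are mine, for illustration only): a two-sample permutation test that uses the Welch t-statistic compares means without relying on the normal model for its significance level, while typically keeping most of the t-test's power when that model is close to right; the studentization helps make the statistic approximately pivotal.

```python
# Sketch: a permutation test of equality of means using the Welch
# t-statistic (a studentized, approximately pivotal quantity).
# The statistic, resample count and example data are illustrative choices.
import numpy as np

def welch_t(x, y):
    # studentized difference in means; no equal-variance assumption
    return (x.mean() - y.mean()) / np.sqrt(
        x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))

def perm_test_means(x, y, n_perm=10_000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    observed = abs(welch_t(x, y))
    pooled = np.concatenate([x, y])
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        hits += abs(welch_t(shuffled[:len(x)], shuffled[len(x):])) >= observed
    return (hits + 1) / (n_perm + 1)   # the "+1" keeps the p-value valid

# made-up example data
x = np.array([4.1, 5.0, 6.2, 5.5, 4.8])
y = np.array([5.9, 6.4, 7.1, 6.8, 6.0])
print("permutation p-value:", perm_test_means(x, y))
```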
Many practitioners seem to operate as if their perfectly common variables had never existed before this particular instance of collecting them; I have, many more times than I'd wish to count, been told that they literally know nothing whatever about them, when in fact quite a lot may be known about their variables, enough to formulate perfectly reasonable models. Indeed, very often reasonable models may be selected simply by knowing what you are measuring and what the support of the variables is ("what values they could possibly take").
The process of getting to at least a roughly suitable model, or ruling out a few obvious non-starters, is not especially mysterious.
But a worker within a particular area has much more to draw on than the kind of variable and its support; previous studies, expert knowledge, theory, and so forth should provide a much richer context for model selection.
Even when one constructs a new variable from scratch, very often it's much like a variable that already exists; failing that, there are pilot studies.
Naturally, once in a while you might feel that there are special circumstances that will impact the situation (even under the null), and that nothing but examining some data will do. If there is no pilot stage, you should then plan from the start to split the data, using one part for such things as model selection and any other data-dependent choices, and sequestering the remainder for the actual test.
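A minimal sketch of such a pre-planned split (the split fraction, seed and stand-in data are arbitrary illustrative choices, not a prescription):

```python
# Sketch: pre-planned split into an exploration set (model selection,
# diagnostics, choice of test statistic) and a sequestered confirmation
# set used only once, for the test decided on in advance.
import numpy as np

rng = np.random.default_rng(12345)       # fixed seed, recorded in the analysis plan
data = rng.normal(size=100)              # stand-in for the real observations
idx = rng.permutation(len(data))
n_explore = int(0.3 * len(data))         # 30% for exploration: an arbitrary choice
explore = data[idx[:n_explore]]          # look at this part freely
confirm = data[idx[n_explore:]]          # do not touch until the analysis is fixed
```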
In some circumstances there may be little that can be done: the model is wrong, no more data will be forthcoming, and yet some action is still needed (in a business application, for example). One must admit that the type I error rate is not what it was intended to be and still do something.
I also note that there are some methods for inference after model selection (more typically, but not always, relating to variable selection); a number of papers on this topic can be found on arXiv, for example. It's a topic I should pursue more but have only read a little on. If you intend to use these, they should be in the plan from the start, naturally; choosing them post hoc, after some other approach has already been tried, won't necessarily lead to the procedure working as advertised.
---
* I have many times seen people use tests that have no chance whatever of rejecting the null, in complete ignorance of the fact that the lowest attainable significance level is above the nominal one they're blithely comparing their p-values to (so no possible sample can lead to rejection of the null). Quite recently it happened twice in one week. A common example is using the Mann-Whitney-Wilcoxon with $n_1=n_2=3$, a two-sided alternative and a rejection rule of "reject $H_0$ if $p<0.05$". The lowest possible p-value in that case is $0.10$.
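This is easy to verify directly (the particular values below are arbitrary; only the ranks matter): even with the two groups completely separated, the exact two-sided p-value is $0.10$.

```python
# Check: with n1 = n2 = 3, even complete separation of the groups gives
# an exact two-sided Mann-Whitney p-value of 0.10, so "reject if p < 0.05"
# can never reject. The particular values are arbitrary; ranks are all that matter.
from scipy.stats import mannwhitneyu

x = [1, 2, 3]          # every value in x below every value in y
y = [4, 5, 6]
res = mannwhitneyu(x, y, alternative="two-sided", method="exact")
print(res.pvalue)      # 0.1
```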
Needless to say, it's even worse if there's correction for multiple testing.
This issue of limited attainable significance levels occurs with permutation tests more generally, though some test statistics make it worse than others. On the other hand, I'd also avoid bootstrap tests with very small samples, albeit for different reasons. At the same time, since power will be low for all but quite large effect sizes, if you can reasonably justify a parametric approach it may be worth pursuing in this situation.
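If you're contemplating a permutation or rank test with small samples, the smallest attainable p-value is worth checking at the planning stage. As a rough sketch (assuming equal group sizes, no ties, and a location-type statistic, so that the smallest two-sided p-value is $2/\binom{n_1+n_2}{n_1}$; the sample sizes below are just examples):

```python
# Planning check (not part of the analysis): smallest attainable two-sided
# p-value for a two-sample permutation/rank test with equal group sizes,
# no ties and a location-type statistic: 2 / C(n1 + n2, n1).
from math import comb

for n1, n2 in [(3, 3), (4, 4), (5, 5), (6, 6)]:
    min_p = 2 / comb(n1 + n2, n1)
    print(f"n1={n1}, n2={n2}: smallest two-sided p = {min_p:.4f}")
```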