In the classical bootstrap, by taking samples with replacement from your data, you mimic sampling your data from the population. By repeating this process $K$ times you simulate drawing your sample multiple times, and so it lets you assess the variability of your statistic $\phi$ (a function of the data) over different samples from the same population, $\hat\phi_k = \phi(X_k)$. In other words, we are simulating the "sampling distribution" of the statistic (the variability due to the sampling process).
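As an illustration, here is a minimal sketch of the classical bootstrap in Python (NumPy); the data, the statistic (the mean), and $K$ are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=50)  # some observed data
K = 2000                                     # number of bootstrap replicates

# Resample the data with replacement K times and evaluate the statistic
# (here the mean) on each resample: phi_hat_k = phi(X_k).
phi_hat = np.array([
    np.mean(rng.choice(x, size=len(x), replace=True))
    for _ in range(K)
])

# The spread of phi_hat approximates the sampling variability of the mean.
print(phi_hat.mean(), phi_hat.std())
```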
In the Bayesian bootstrap, as described by Rubin (1981), you estimate the distribution of your data $X = \{x_1,x_2,\dots,x_N\}$ and, from it, the posterior distribution of the statistic $\phi(X)$. It is a non-parametric model: we assume a categorical distribution over the datapoints (the likelihood)
$$
x_i|\boldsymbol{\pi} \sim \mathcal{Cat}(\boldsymbol{\pi})
$$
and for the unknown probabilities $\boldsymbol{\pi} = (\pi_1, \pi_2, \dots, \pi_N)$, we assume a uniform Dirichlet prior
$$
\boldsymbol{\pi} \sim \mathcal{Dir}(\boldsymbol{\alpha})
$$
parameterized by $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \dots, \alpha_N)$ with $\alpha_1 = \alpha_2 = \dots = \alpha_N = 1$. By plugging these into Bayes' theorem, we obtain the posterior distribution over the probabilities
$$
p(\boldsymbol{\pi}|X) \propto p(X|\boldsymbol{\pi}) \, p(\boldsymbol{\pi})
$$
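Because the Dirichlet prior is conjugate to the categorical likelihood, this posterior is again a Dirichlet distribution, with parameters $\alpha_i + n_i$, where $n_i$ counts how often the $i$-th datapoint was observed. A minimal sketch of sampling it in Python (NumPy), with $N$ chosen arbitrarily for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5                  # number of datapoints, arbitrary for the example
alpha = np.ones(N)     # uniform Dirichlet prior, alpha_i = 1
counts = np.ones(N)    # each datapoint is observed once

# By conjugacy, the posterior over pi is Dirichlet(alpha + counts);
# each draw is a probability vector over the N datapoints.
pi_samples = rng.dirichlet(alpha + counts, size=3)
print(pi_samples)      # rows are draws of pi, each summing to 1
```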
Knowing the posterior distribution of the probabilities leads us to the posterior predictive distribution (the distribution of new data $\tilde x$, as predicted by the model),
$$
p(\tilde x|X) = \int p(\tilde x|\boldsymbol{\pi}) \, p(\boldsymbol{\pi}|X) \, d\boldsymbol{\pi}
$$
Next, we can easily estimate the distribution of the statistic by sampling from the posterior predictive distribution and plugging the posterior samples into the statistic, $\phi(\tilde x)$. As you can see, we are not estimating the distribution of $\phi$ directly; rather, we are evaluating the statistic over samples from the posterior distribution. This is what is meant by "simulating the posterior distribution of the parameter $\phi$". This distribution accounts for both the variability of the parameters $\boldsymbol{\pi}$ and the variability of the data $X$.
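To make this concrete, here is a minimal sketch of the procedure for $\phi$ = the mean, using the uniform Dirichlet weights of Rubin's scheme (see the sidenote at the end); for the mean, evaluating the statistic on a posterior predictive resample reduces to a $\boldsymbol{\pi}$-weighted average of the data:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=50)  # toy data
K = 2000                                     # number of posterior draws

phi_post = np.empty(K)
for k in range(K):
    # One posterior draw of the probabilities pi over the datapoints
    # (uniform Dirichlet weights, as in the classic Bayesian bootstrap).
    pi = rng.dirichlet(np.ones(len(x)))
    # Plug the draw into the statistic: for the mean, this is simply
    # the pi-weighted average of the data.
    phi_post[k] = np.sum(pi * x)

# phi_post approximates the posterior distribution of the mean.
print(phi_post.mean(), phi_post.std())
```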
Answering your first question: it is a posterior distribution because we are operating in a Bayesian setting. We have a prior and a likelihood, and by combining them we estimate the posterior distribution, here the posterior distribution over the probabilities $\boldsymbol{\pi}$. The difference is that in the frequentist setting a parameter does not have a distribution, so you can only evaluate the statistic on the sample; much of frequentist statistics is centered on working around this limitation.
As for your second question, I believe it is answered in the What is the difference between a population and a sample? thread. Basically, "population" can be used here interchangeably with the "distribution" of the data. Taking samples from the population is equivalent to obtaining realizations of a random variable from its distribution. These are statistics vs probability theory terms for practically the same thing.
You may also be interested in reading the Is it possible to interpret the bootstrap from a Bayesian perspective? thread, and the two blog posts The Non-parametric Bootstrap as a Bayesian Model and Easy Bayesian Bootstrap in R by Rasmus Bååth, who discusses the Bayesian bootstrap in greater detail and gives many examples.
As a sidenote, Rubin (1981) himself noticed that the difference between the two procedures is mostly conceptual, about how we think about the results, as they are "quite similar inferentially" and "operationally they are very similar". The procedure differs slightly: you weight the data with random weights (drawn from a uniform Dirichlet distribution) instead of the fixed $1/n$ weights implied by resampling in the classical bootstrap. The interpretation of the results differs because we account for the variability of the parameters, as described above.
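To see how small the operational difference is, here is a minimal sketch contrasting the two implied weight vectors (the seed and $n$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6

# Classical bootstrap: resampling with replacement is equivalent to
# multinomial counts divided by n, i.e., random weights restricted to
# the grid {0, 1/n, 2/n, ..., 1}.
w_classical = rng.multinomial(n, np.ones(n) / n) / n

# Bayesian bootstrap: continuous positive weights from a uniform Dirichlet.
w_bayesian = rng.dirichlet(np.ones(n))

print(w_classical)  # weights in multiples of 1/n, summing to 1
print(w_bayesian)   # smooth weights, also summing to 1
```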