Actually, neither explanation is correct.
A confidence ellipse is a statement about unobserved population parameters, like the true population mean of your bivariate distribution. A 95% confidence ellipse for this mean is really an algorithm with the following property: if you were to replicate your sampling from the underlying distribution many times and each time calculate a confidence ellipse, then in the long run 95% of the ellipses so constructed would contain the underlying mean. (Note that each sample would of course yield a different ellipse.)
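The repeated-sampling property can be checked with a small simulation. The following is only a sketch under assumptions that are mine, not the question's: the data are bivariate normal with mean (0, 0), unit variances and correlation 0.6, and the ellipse is the usual Hotelling T² region. The helper names are made up for illustration. It uses only the standard library, relying on the closed-form quantile of the F distribution with 2 numerator degrees of freedom.

```python
import math
import random

def hotelling_threshold(n, level=0.95):
    """Critical value c with P(T^2 <= c) = level for bivariate normal data.

    T^2 ~ 2(n-1)/(n-2) * F(2, n-2), and the F(2, m) quantile has the
    closed form (m/2) * ((1 - level)**(-2/m) - 1).
    """
    m = n - 2
    f_quantile = (m / 2) * ((1 - level) ** (-2 / m) - 1)
    return 2 * (n - 1) / (n - 2) * f_quantile

def draw_sample(n, rho, rng):
    """n draws with mean (0, 0), unit variances and correlation rho (assumed setup)."""
    sample = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        sample.append((z1, rho * z1 + math.sqrt(1 - rho ** 2) * z2))
    return sample

def mean_and_cov(sample):
    """Sample mean and the three distinct entries of the sample covariance matrix."""
    n = len(sample)
    mx = sum(x for x, _ in sample) / n
    my = sum(y for _, y in sample) / n
    sxx = sum((x - mx) ** 2 for x, _ in sample) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in sample) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in sample) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_sq(point, center, cov):
    """Squared Mahalanobis distance, using the explicit inverse of a 2x2 matrix."""
    sxx, sxy, syy = cov
    dx, dy = point[0] - center[0], point[1] - center[1]
    det = sxx * syy - sxy ** 2
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

rng = random.Random(1)          # fixed seed so the run is reproducible
n, reps = 30, 4000
true_mean = (0.0, 0.0)
c = hotelling_threshold(n)

hits = 0
for _ in range(reps):
    center, cov = mean_and_cov(draw_sample(n, 0.6, rng))
    # The true mean lies inside this sample's ellipse iff T^2 <= c.
    if n * mahalanobis_sq(true_mean, center, cov) <= c:
        hits += 1

coverage = hits / reps
print(f"share of ellipses containing the true mean: {coverage:.3f}")
```

Each replication produces a different ellipse, and the printed share should land close to 0.95, which is exactly what the "95%" refers to.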
Thus, a confidence ellipse will usually not contain 95% of the observations. In fact, as the number of observations increases, the mean will usually be better and better estimated, leading to smaller and smaller confidence ellipses, which in turn contain a smaller and smaller proportion of the actual data. (Unfortunately, some people calculate the smallest ellipse that contains 95% of their data, reminiscent of a quantile, which by itself is quite OK... but then go on to call this "quantile ellipse" a "confidence ellipse", which, as you see, leads to confusion.)
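To see how different the two ellipses are, here is a rough sketch that counts, for growing sample sizes, how many observations fall inside the confidence ellipse for the mean and how many fall inside a 95% "data" ellipse. The assumptions are mine: independent standard normal components, a Hotelling T² confidence ellipse, and a normal-theory ellipse based on the chi-square quantile as a stand-in for the "smallest ellipse containing 95% of the data". The function name is hypothetical.

```python
import math
import random

def fractions_inside(n, rng, level=0.95):
    """Share of observations inside (confidence ellipse, data ellipse)."""
    pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in pts) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in pts) / (n - 1)
    det = sxx * syy - sxy ** 2

    # Squared Mahalanobis distance of every observation from the sample mean.
    d2 = [
        (syy * (x - mx) ** 2 - 2 * sxy * (x - mx) * (y - my) + sxx * (y - my) ** 2) / det
        for x, y in pts
    ]

    # Confidence ellipse for the mean: n * d2 <= 2(n-1)/(n-2) * F(2, n-2) quantile.
    m = n - 2
    c_conf = 2 * (n - 1) / (n - 2) * (m / 2) * ((1 - level) ** (-2 / m) - 1)
    # Normal-theory data ellipse: d2 <= chi-square(2) quantile = -2 * ln(1 - level).
    c_data = -2 * math.log(1 - level)

    in_conf = sum(n * d <= c_conf for d in d2) / n
    in_data = sum(d <= c_data for d in d2) / n
    return in_conf, in_data

rng = random.Random(7)  # fixed seed for reproducibility
results = {n: fractions_inside(n, rng) for n in (20, 200, 2000)}
for n, (in_conf, in_data) in results.items():
    print(f"n={n:5d}  inside confidence ellipse: {in_conf:.3f}  inside data ellipse: {in_data:.3f}")
```

For large n the data ellipse keeps holding roughly 95% of the points, while the confidence ellipse shrinks around the sample mean and holds only a tiny share of them.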
The variance of the underlying population affects the size of the confidence ellipse. For a given sample size, high variance means that the data are all over the place, so the mean is less precisely estimated, so the confidence ellipse will be larger than if the variance were smaller.
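One standard way to write this down, assuming bivariate normal data and using notation that is mine rather than the question's ($\bar{x}$ for the sample mean, $S$ for the sample covariance matrix, $n$ for the sample size): the 95% confidence ellipse for the mean is the set of all $\mu$ with

$$
n\,(\bar{x}-\mu)^\top S^{-1} (\bar{x}-\mu) \;\le\; c_n,
\qquad
c_n = \frac{2(n-1)}{n-2}\, F_{2,\,n-2;\,0.95},
$$

where $F_{2,\,n-2;\,0.95}$ is the 95% quantile of the $F$ distribution with $2$ and $n-2$ degrees of freedom. The semi-axes of this ellipse have lengths $\sqrt{\lambda_i\, c_n / n}$, with $\lambda_1, \lambda_2$ the eigenvalues of $S$. So larger variances stretch the ellipse, while a larger $n$ shrinks it at roughly the rate $1/\sqrt{n}$.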
Of course, we can also calculate confidence ellipses for any other population parameter we may wish to estimate. Or we could look at confidence regions other than ellipses, especially if we don't know the estimator of the parameter to be (asymptotically) normally distributed.
The one-dimensional analogue of the confidence ellipse is the confidence interval, and browsing through previous questions in the confidence-interval tag is helpful. Our current top-voted question in this tag is particularly nice: "Why does a 95% CI not imply a 95% chance of containing the mean?" Most of the discussion there holds just as well for higher-dimensional analogues of the one-dimensional confidence interval.