In Platt's 1999 paper on turning support vector machine output into a probabilistic score, he says:
"Bayes' rule on two exponentials suggests using a parametric form of a sigmoid"
where he cites this paper. I'm still getting up to speed on Bayesian probability, so could someone help me do the math (or point me towards a reference that does) so I can see how the posterior of two exponentials suggests a sigmoid?
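To make the question concrete, here is the setup as I understand it. The notation is my own assumption rather than Platt's: $f$ is the SVM output, $y \in \{-1, +1\}$ is the class label, and each class-conditional density is taken to be exponential in $f$ over the region of interest, with hypothetical constants $c_{\pm}$ and rates $a_{\pm}$. The first line is that assumption, the second is Bayes' rule, and the third is the sigmoid form Platt proposes, with parameters $A$ and $B$:

$$
\begin{aligned}
p(f \mid y = \pm 1) &= c_{\pm}\, e^{a_{\pm} f} \\
P(y = 1 \mid f) &= \frac{p(f \mid y = 1)\, P(y = 1)}{p(f \mid y = 1)\, P(y = 1) + p(f \mid y = -1)\, P(y = -1)} \\
P(y = 1 \mid f) &= \frac{1}{1 + \exp(A f + B)}
\end{aligned}
$$

What I'm missing is the algebra that takes the Bayes' rule expression to the sigmoid in the last line, and how $A$ and $B$ relate to the constants of the two exponentials.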
So far I've tried working through this thesis and looking at specific questions like this one, but I must be missing something fundamental, because I can't even get started on the problem.