I'm looking at the following proof:
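Writing the expression out from the definitions below (my reconstruction, so the pieces I describe are visible), the off-training-set error should be:

$$E_{ote}(L_a \mid X, f) = \sum_{h} \sum_{x \in \mathcal{X}-X} P(x)\,\mathbb{I}\big(h(x) \neq f(x)\big)\,P(h \mid X, L_a)$$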
Where:
Note: I'm new to this, but I think I now understand the variables below correctly. Mistakes are still possible though.
- $f$ is an ideal target function, i.e. one that labels the sample set perfectly
- $\sum_{h}$ is a sum over the hypothesis space
- $\sum_{x \in \mathcal{X}-X}$ sums over the entire sample space $\mathcal{X}$ except the training data $X$
- $P(x)$ is the probability distribution over the sample space
- The indicator (binary) function in the middle equals 1 exactly when the selected hypothesis $h$ disagrees with the ideal function $f$ on $x$
- $P(h \mid X, L_a)$ is the probability that the learning algorithm $L_a$ produces hypothesis $h$ when trained on data $X$
The author says that for a binary target function $f: \mathcal{X} \to \{0,1\}$, its function space is $\{0,1\}^{|\mathcal{X}|}$. He furthermore says that
if $f$ is uniformly distributed over that space, half of the predictions of $f$ on $X$ are not consistent with $h(X)$.
I do not understand what is meant by this, nor how he arrived at that conclusion. Why would it be the case that, if $f$ is uniformly distributed, half of its predictions are not consistent with $h(X)$?
After showing this proof, he goes on to say that the off-training-set error $E_{ote}$ is independent of the chosen algorithm $L_a$.
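To convince myself of that independence claim, I wrote a small brute-force check (a toy setup of my own, not from the book): enumerate every binary target function $f$ on a four-point sample space, train two very different "algorithms" on the first two points, and average their off-training-set errors over all $f$ uniformly.

```python
from itertools import product

X_all = [0, 1, 2, 3]                              # full sample space, uniform P(x)
X_train = [0, 1]                                  # training points X
X_test = [x for x in X_all if x not in X_train]   # off-training-set points, X_all - X

def ote_error(h, f):
    """Off-training-set error of hypothesis h against target function f (a dict)."""
    return sum(h(x) != f[x] for x in X_test) / len(X_test)

# Two toy 'learning algorithms': each maps training labels to a hypothesis.
def learner_zero(labels):
    # Ignores the data entirely and always predicts 0.
    return lambda x: 0

def learner_memorize(labels):
    # Memorizes the training labels and predicts 1 everywhere else.
    return lambda x: labels[x] if x in labels else 1

avg = {}
for name, learner in [("zero", learner_zero), ("memorize", learner_memorize)]:
    total = 0.0
    # Sum over every possible target function f: X_all -> {0,1}, i.e. {0,1}^|X_all|.
    for f_vals in product([0, 1], repeat=len(X_all)):
        f = dict(zip(X_all, f_vals))
        h = learner({x: f[x] for x in X_train})   # train on f's labels for X_train
        total += ote_error(h, f)
    avg[name] = total / 2 ** len(X_all)           # uniform average over all f

print(avg)  # both averages come out to 0.5, regardless of the learner
```

Both learners average an off-training-set error of exactly 0.5 over all targets, which matches the claim that $E_{ote}$ does not depend on $L_a$ when $f$ is uniform, but it doesn't explain the "half of the predictions" step to me.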