Suppose I have a data set $\{(x^i, t^i)\}_{i = 1, \ldots, n}$ generated i.i.d., where the $t^{i} \in \{-1, 1\}$ are binary targets.
We would like to run logistic regression, which is based on maximizing the joint conditional likelihood (conditional maximum likelihood estimation):
$$\theta^\star = \arg\max_\theta \thinspace p(t^1, \ldots, t^n| x^1, \ldots, x^n; \theta)$$
Using the i.i.d. assumption, we get
$$\theta^\star = \arg\max_\theta \thinspace \prod\limits_{i = 1}^n p(t^i| x^i; \theta)$$
and taking the log, we obtain $$\theta^\star = \arg\max_\theta \thinspace \sum\limits_{i = 1}^n \log( p(t^i| x^i; \theta))$$
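For concreteness, here is a minimal sketch of this conditional MLE in Python, under the standard assumption that the model is $p(t \mid x; \theta) = \sigma(t\,\theta^\top x)$ with $\sigma$ the sigmoid; the data are synthetic and the optimizer is generic BFGS rather than any particular textbook's algorithm:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data: n points in 2D with labels t^i in {-1, +1},
# drawn from a logistic model with a made-up "true" parameter.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
true_theta = np.array([2.0, -1.0])
t = np.where(rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_theta)), 1, -1)

def neg_log_likelihood(theta):
    # -sum_i log p(t^i | x^i; theta) = sum_i log(1 + exp(-t^i * theta^T x^i)),
    # computed stably with logaddexp.
    margins = t * (X @ theta)
    return np.sum(np.logaddexp(0.0, -margins))

# theta_star = argmax_theta sum_i log p(t^i | x^i; theta)
res = minimize(neg_log_likelihood, x0=np.zeros(2), method="BFGS")
theta_star = res.x
```

The log turns the product over $i$ into the sum inside `neg_log_likelihood`, which is what the optimizer actually minimizes (the negative of the log-likelihood).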
My question is: why do we maximize the joint likelihood over $\theta$, when what we are actually interested in is maximizing the probability that, for each $i$, given $x^i$, we obtain $t^i$?
In other words, should we not solve the following problem instead:
$$ \theta^\star = \arg\max_\theta \thinspace p(t^i|x^i; \theta), \quad \forall i = 1,\ldots, n$$
i.e., find a single $\theta$ that maximizes every factor $p(t^i | x^i; \theta)$ simultaneously.
Why don't we solve this problem instead? Is it ill-posed?