
Can anyone explain this statement?

"Firstly, let’s define the input space as X (sensory observations) and the label space as Y (semantic categories). The data distribution is represented by the joint distribution P(X, Y ) over the space X ×Y. Distribution shift can occur in either the marginal distribution P(X), or both P(Y ) and P(X). Note that shift in P(Y ) naturally triggers shift in P(X)."

I thought that a shift in P(X) naturally triggers a shift in P(Y), not the other way around, since P(Y) is a function of P(X).


1 Answer


You are referencing this paper, which covers distribution shift in machine learning. The paragraph in question is a statement about causality; it simply comes from this observation: a shift in the proportion of labels can only occur if there is a corresponding shift in the input distribution.
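One way to make this concrete is the law of total probability, under the usual label-shift assumption that the class-conditional distributions $P(X \mid Y)$ stay fixed:

$$P(X) = \sum_{y} P(X \mid Y = y)\, P(Y = y).$$

If $P(X \mid Y)$ does not change, then any change in the class priors $P(Y)$ necessarily changes the marginal $P(X)$. The converse does not hold: $P(X)$ can change through $P(X \mid Y)$ alone while $P(Y)$ stays fixed.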

For example, consider a dataset consisting of pictures of animals, which we call $X$, and their names/labels (i.e., cat, dog, etc.), which we call $Y$. You train a classifier to predict $Y \mid X$ using a dataset $\mathcal{D}$. In practice, your classifier is then served as a web service, where people upload pictures of animals and your model returns a label.

Let's look at an example of a distribution shift in $Y$: this might mean that the label 'dog' shows up in a higher proportion in practice than in your original dataset, i.e., $\mathbb{P}_{\mathcal{Q}}(Y=\text{'dog'}) > \mathbb{P}_{\mathcal{D}}(Y = \text{'dog'})$ (where $\mathbb{P}_\mathcal{Q}$ is the distribution induced by your web service and $\mathbb{P}_\mathcal{D}$ is the distribution induced by your original dataset). However, the only way for such a shift to occur is if people upload a higher proportion of pictures of dogs than appear in your original dataset, i.e., $\mathbb{P}_{\mathcal{Q}}(\{ \text{all images of dogs}\}) > \mathbb{P}_{\mathcal{D}}(\{\text{all images of dogs}\})$. So any change in the proportion of labels can only occur by affecting the distribution of images.
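Here is a minimal numerical sketch of that label-shift scenario (the 1-D "image features", class priors, and Gaussian class-conditionals are made up purely for illustration): shifting $P(Y = \text{'dog'})$ while keeping $P(X \mid Y)$ fixed visibly moves the marginal over $X$.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, p_dog):
    """Sample (x, y) pairs with class prior P(Y='dog') = p_dog and
    fixed class-conditionals P(X | Y)."""
    y = rng.random(n) < p_dog                    # True = 'dog', False = 'cat'
    x = np.where(y,
                 rng.normal(2.0, 1.0, n),        # draws from P(X | Y='dog')
                 rng.normal(-2.0, 1.0, n))       # draws from P(X | Y='cat')
    return x, y

# Training distribution D: 30% dogs.  Serving distribution Q: 70% dogs.
x_train, y_train = sample(100_000, p_dog=0.3)
x_serve, y_serve = sample(100_000, p_dog=0.7)

# Same class-conditionals, different priors -> the marginal over X has moved.
print("mean of X under D:", x_train.mean())   # roughly -0.8
print("mean of X under Q:", x_serve.mean())   # roughly +0.8
```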

However, we can affect the distribution of images without affecting the proportion of labels. For example, if we apply a transformation to our dataset, we affect the probability induced over $X$ but not over $Y$: e.g., if we convert all the images to black-and-white, rotate them, or crop out a chunk of the corners, we don't change the true labels of the images, but we do change the distribution of images the classifier sees. Another example would be if people using your online service start sending in many pictures of Huskies when you only trained your classifier using pictures of German Shepherds. The distribution over labels wouldn't change (since they're both dogs, of course), but the distribution over the set of images would.
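And a matching sketch of this second case, a shift in $P(X)$ with no shift in $P(Y)$ (again with made-up 1-D features; the affine transform stands in for a label-preserving operation like converting images to black-and-white):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: 1-D "image features" with a fixed class prior P(Y='dog') = 0.5.
n = 100_000
y = rng.random(n) < 0.5                                       # True = 'dog'
x = np.where(y, rng.normal(2.0, 1.0, n), rng.normal(-2.0, 1.0, n))

# A deterministic, label-preserving transformation of the inputs
# (standing in for "convert the images to black-and-white"):
x_shifted = 0.5 * x + 3.0

# The marginal over X has changed...
print("X before: mean", x.mean(), "std", x.std())
print("X after:  mean", x_shifted.mean(), "std", x_shifted.std())

# ...but the labels, and therefore P(Y), are exactly the same:
print("P(Y='dog'):", y.mean())
```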
