Source Link
Kevin Montrose

That's a slightly confusing sentence I guess. My bad.

What "X of their answers could have been considered when the predictor was last updated Y" means is really "when we last trained our prediction engine on Y, they had X answers."

That's reported because making a prediction when we have a ton of old user data is less impressive than when we had relatively little.

The actual output of the prediction engine rolls over every couple of hours and incorporates new answers.

Another way to think of it:

  • There is a Stack Overflow Prediction Engine Binary™ sitting on a server somewhere
    • this binary was last updated on Jan. 8th
    • accordingly, it was generated based on all available user data as of Jan. 8th
  • Every couple of hours, the binary is run against all active users and the results cached*
  • Whenever a user loads the homepage, those cached results are used to customize it
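The flow in the list above can be sketched in code. This is a purely illustrative toy, not Stack Overflow's actual implementation; every name and structure here is invented for the sketch:

```python
# Hypothetical sketch of the train-once / score-periodically / serve-from-cache
# flow described above. All names and data shapes are invented.

def predict(user):
    # Stand-in for the frozen "binary": it was trained once (e.g. Jan. 8th),
    # so it only "knows" the answers a user had at training time.
    return {"show_prediction": user["answers_at_training"] > 0}

prediction_cache = {}

def refresh_cache(active_users):
    """Batch job run every couple of hours: score all active users
    against the frozen model and cache the results."""
    for user in active_users:
        prediction_cache[user["id"]] = predict(user)

def render_homepage(user_id):
    """Homepage load: read only from the cache, never the model directly."""
    cached = prediction_cache.get(user_id)
    return "customized" if cached and cached["show_prediction"] else "default"

# Simulated cycle:
users = [
    {"id": 1, "answers_at_training": 5},
    {"id": 2, "answers_at_training": 0},
]
refresh_cache(users)
print(render_homepage(1))  # customized
print(render_homepage(2))  # default
```

The key property the sketch shows: retraining the model and refreshing the cache are separate schedules, so the cached predictions can pick up new users every few hours even while the model itself stays months old.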

As to why the binary is so old: we're in the process of moving to a new ML backend service. Once we're on it, the binary will be retrained automatically, and the results will be available for more purposes than just the homepage. This is taking longer than I had hoped.

Since a replacement is in the works, and the actual model represented by the binary changes very slowly, I've put off retraining for a while. It's a terribly manual process right now, unfortunately.

*Greatly simplifying how this works, but the effect is basically the same
