Source Link
Kevin Montrose

That's a slightly confusing sentence I guess. My bad.

What "X of their answers could have been considered when the predictor was last updated Y" means is really "when we last trained our prediction engine on Y, they had X answers."

That's reported because making a prediction when we have a ton of old user data is less impressive than when we had relatively little.

The actual output of the prediction engine rolls over every couple of hours and incorporates new answers.

Another way to think of it:

  • There is a Stack Overflow Prediction Engine Binary™ sitting on a server somewhere
    • this binary was last updated on Jan. 8th
    • accordingly, it was generated based on all available user data as of Jan. 8th
  • Every couple of hours, the binary is run against all active users and the results cached*
  • Whenever a user loads the homepage, those cached results are used to customize it
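The flow in the list above can be sketched in code. This is a purely illustrative toy, not Stack Overflow's actual implementation; every name and structure here is invented for the sketch:

```python
# Hypothetical sketch of the train-once / score-periodically / serve-from-cache
# flow described above. All names and data shapes are invented.

def predict(user):
    # Stand-in for the frozen "binary": it was trained once (e.g. Jan. 8th),
    # so it only "knows" the answers a user had at training time.
    return {"show_prediction": user["answers_at_training"] > 0}

prediction_cache = {}

def refresh_cache(active_users):
    """Batch job run every couple of hours: score all active users
    against the frozen model and cache the results."""
    for user in active_users:
        prediction_cache[user["id"]] = predict(user)

def render_homepage(user_id):
    """Homepage load: read only from the cache, never the model directly."""
    cached = prediction_cache.get(user_id)
    return "customized" if cached and cached["show_prediction"] else "default"

# Simulated cycle:
users = [
    {"id": 1, "answers_at_training": 5},
    {"id": 2, "answers_at_training": 0},
]
refresh_cache(users)
print(render_homepage(1))  # customized
print(render_homepage(2))  # default
```

The key property the sketch shows: retraining the model and refreshing the cache are separate schedules, so the cached predictions can pick up new users every few hours even while the model itself stays months old.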

As to why the binary is so old: we're in the process of moving to a new ML backend service. Once we're on it, the binary will be retrained automatically, and the results will be available for more purposes than just the homepage. This is taking longer than I had hoped.

Since a replacement is in the works, and the actual model represented by the binary changes very slowly, I've put off retraining for a while. It's a terribly manual process right now, unfortunately.

*Greatly simplifying how this works, but the effect is basically the same
