Most, if not all, performance analysts would agree that effective use of appropriate statistical treatments is important in drawing sensible insights from the mass of data generated by modern performance tools.
I wanted to comment on Arun Kejariwal’s interesting session at Velocity last week, but before I get to that, a quick ‘two-minute hate’ on my personal bête noire, the arithmetic mean. Given that ‘live’ performance data is rarely, if ever, normally distributed, the mean is a poor choice of metric.
A more ‘robust’ index of central tendency such as the median provides a far better ongoing measure of comparative performance in situations where data is skewed and/or contains outliers (that is, always!). As this is hardly an earth-shattering insight, why do so many vendors (I won’t name and shame, but we all know who they are – it’s a looong list) persist in offering the mean as their standard response metric? Good to see some more recent entrants such as SOASTA mPulse using the median instead.
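To make the point concrete, here is a minimal sketch (the response times are invented) showing how a single outlier drags the mean away from the bulk of the data while the median barely notices:

```python
import statistics

# Hypothetical page response times in ms -- mostly ~125 ms, one 4 s outlier.
response_ms = [120, 125, 130, 128, 122, 4000]

mean = statistics.mean(response_ms)      # dragged up to ~770 ms by one sample
median = statistics.median(response_ms)  # stays at 126.5 ms

print(f"mean = {mean:.1f} ms, median = {median:.1f} ms")
```

One bad sample out of six and the mean reports a ‘typical’ response time six times worse than anything most users actually saw.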
Now that I have the bit between my teeth: on the subject of ‘dirty’ (i.e. real world) data, it would be good to see the MAD (median absolute deviation) replacing the standard deviation, for the same reasons as the mean/median. Personally, I still retain the standard deviation as a reference, though, as comparing it with the MADe (the MAD scaled by 1.4826 to estimate the standard deviation under an assumed normal distribution) provides a handy ‘quick reference’ for the amount of skew in the data from day to day.
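A minimal sketch of that quick reference, in pure Python (the response times are invented; 1.4826 is the standard scaling constant that makes the MAD consistent with the standard deviation for normal data):

```python
import statistics

def mad(data):
    """Median absolute deviation from the median."""
    med = statistics.median(data)
    return statistics.median(abs(x - med) for x in data)

def mad_e(data):
    """MADe: MAD scaled by 1.4826, which estimates the standard
    deviation the data *would* have if it were normally distributed."""
    return 1.4826 * mad(data)

# Same sort of dirty data as before: one gross outlier.
response_ms = [120, 125, 130, 128, 122, 4000]
print(f"stdev = {statistics.stdev(response_ms):.1f}")  # blown up by the outlier
print(f"MADe  = {mad_e(response_ms):.1f}")             # robust: ~5.9
```

When the two figures roughly agree, the day’s data is close to normal; a stdev hundreds of times the MADe, as here, is the skew klaxon sounding.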
But back to the plot – the Twitter team (ably represented at Velocity by Arun) discussed the use of statistics in demand forecasting. This is a fascinating area. It was interesting to see the various treatments applied to fitting models to current/historic data (Holt-Winters and ARIMA came top of the heap, by the way, in terms of accuracy in modelling seasonality in the data sets, if anyone is interested).
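For anyone who wants to play along at home, here is a minimal pure-Python sketch of additive Holt-Winters smoothing – not the Twitter team’s implementation, just the textbook recurrence. The smoothing parameters and the synthetic seasonal series are invented; a serious fit would use a library (e.g. statsmodels) and tune alpha/beta/gamma properly:

```python
def holt_winters_additive(xs, m, alpha=0.3, beta=0.1, gamma=0.3, horizon=4):
    """Additive Holt-Winters: level + trend + seasonal component of period m.
    Returns `horizon` out-of-sample forecasts."""
    # Crude initialisation from the first two seasons.
    season1, season2 = xs[:m], xs[m:2 * m]
    level = sum(season1) / m
    trend = (sum(season2) - sum(season1)) / (m * m)
    seasonal = [x - level for x in season1]

    for t in range(m, len(xs)):
        i = t % m
        last_level = level
        level = alpha * (xs[t] - seasonal[i]) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        seasonal[i] = gamma * (xs[t] - level) + (1 - gamma) * seasonal[i]

    return [level + h * trend + seasonal[(len(xs) + h - 1) % m]
            for h in range(1, horizon + 1)]

# Invented series: linear growth plus a repeating period-4 seasonal pattern.
season = [2.0, -1.0, -2.0, 1.0]
data = [10 + 0.5 * t + season[t % 4] for t in range(40)]
print(holt_winters_additive(data, m=4))
```

On clean, well-behaved data like this the forecasts track the seasonal pattern nicely – which is rather the point of what follows.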
My point of departure was the Twitter team’s attempts to derive accurate predictive forecasts.
In short, their efforts were only robustly useful if the input data was aggressively ‘cleaned’ to remove anomalies. Even then, attempts at anything beyond short-term prediction tended to be snookered by unacceptably high standard errors.
This raises the question: what, exactly, is an anomaly? That may be slightly easier to answer for the operations team at Twitter, given their monster data volumes, but I suggest it will rapidly become an unproductive guessing game for the vast majority of companies.
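One robust (if simplistic) answer, in the same spirit as the MAD discussion above: call anything more than a few MADe from the median an anomaly. A minimal sketch – the threshold of 3 and the sample data are invented, and real cleaning would need rather more care:

```python
import statistics

def flag_anomalies(data, threshold=3.0):
    """Return the points lying more than `threshold` * MADe from the median.
    MADe = 1.4826 * MAD, the robust stand-in for the standard deviation.
    Note: if MAD is 0 (over half the points identical), everything else
    gets flagged -- crude, but that is the nature of a rule of thumb."""
    med = statistics.median(data)
    mad = statistics.median(abs(x - med) for x in data)
    mad_e = 1.4826 * mad
    return [x for x in data if abs(x - med) > threshold * mad_e]

samples = [10, 11, 10, 12, 11, 10, 100]
print(flag_anomalies(samples))  # the spike to 100 is flagged
```

The rule of thumb is the easy part; deciding whether a flagged point is noise to be scrubbed or a real shift in demand is exactly the guessing game I mean.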
From the data presented, I would suggest that forecasting, at least over the relatively short term, will be accomplished at least as well with a pencil and ruler (i.e. by extrapolating a best-fit line) as by any amount of statistical jiggery-pokery.
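The pencil-and-ruler approach is, of course, just fitting a straight line by eye; for the terminally keyboard-bound, here is the least-squares equivalent in a dozen lines (the weekly volumes are invented):

```python
def linear_extrapolate(ys, horizon):
    """Fit y = a + b*t by ordinary least squares over t = 0..n-1,
    then extrapolate `horizon` steps past the end of the series."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    b = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
         / sum((t - t_mean) ** 2 for t in range(n)))
    a = y_mean - b * t_mean
    return [a + b * (n - 1 + h) for h in range(1, horizon + 1)]

# Invented weekly request volumes, roughly linear growth.
volumes = [100, 112, 119, 131, 140, 152]
print(linear_extrapolate(volumes, horizon=3))
```

No seasonality, no smoothing parameters, no standard errors to agonise over – and over a handful of periods, often no worse.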
Too radical?
Happy number crunching!