Forecasting demand using statistics: Which method is best?

Most, if not all, performance analysts would agree that effective use of appropriate statistical treatments is important in drawing sensible insights from the mass of data generated by modern performance tools.

I wanted to comment on Arun Kejariwal’s interesting session at Velocity last week, but before I get to that, a quick ‘2 minute hate’ on my personal bête noire, the arithmetic mean. Given that ‘live’ performance data is rarely, if ever, normally distributed, the mean is a thoroughly inappropriate metric.

A more ‘robust’ index of central tendency such as the median provides a far better ongoing measure of comparative performance in situations where data is skewed and/or contains outliers (that is, always!). As this is hardly an earth-shattering insight, why do so many vendors (I won’t name and shame, but we all know who they are – it’s a looong list) persist in offering the mean as their standard response metric? Good to see some more recent entrants such as SOASTA mPulse using the median instead.
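
As a quick illustration of the point (with invented response times, and plain Python rather than any particular vendor’s tooling), a couple of pathological responses are enough to drag the mean well away from what the typical user actually saw, while the median barely notices:

```python
# Illustrative only: mostly ~1.2s page loads with two pathological responses
# mixed in. The values are invented, not real monitoring data.
import statistics

response_times = [1.1, 1.2, 1.3, 1.2, 1.1, 1.4, 1.2, 22.0, 1.3, 25.5]  # seconds

print(f"mean:   {statistics.mean(response_times):.2f}s")    # ~5.73s - a time nobody experienced
print(f"median: {statistics.median(response_times):.2f}s")  # 1.25s - what the typical user saw
```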

Now that I have the bit between my teeth: on the subject of ‘dirty’ (i.e. real-world) data, it would be good to see the MAD (median absolute deviation) replacing the standard deviation, for similar reasons to the mean/median. Personally, I still retain the standard deviation as a reference, though, as comparison with the MADe (the estimated standard deviation assuming a normal distribution) provides a handy quick reference for the amount of ‘skew’ in the data from day to day.
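
For anyone who wants to try this on their own data, a minimal sketch of the MAD and the MADe follows – the 1.4826 scaling factor is the standard consistency constant for a normal distribution, and the sample values are the same invented ones as above:

```python
import statistics

def mad(values):
    """Median absolute deviation from the median."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)

def mad_e(values):
    """Estimated standard deviation assuming a normal distribution (1.4826 * MAD)."""
    return 1.4826 * mad(values)

samples = [1.1, 1.2, 1.3, 1.2, 1.1, 1.4, 1.2, 22.0, 1.3, 25.5]
print(f"std dev: {statistics.stdev(samples):.2f}")  # badly inflated by the two outliers
print(f"MADe:    {mad_e(samples):.2f}")             # robust spread of the 'typical' data
```

When the standard deviation and the MADe broadly agree, the data is behaving itself; when they diverge sharply, the day’s data is heavily skewed or outlier-ridden.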

But back to the plot – the Twitter team (ably represented at Velocity by Arun) discussed the use of statistics in demand forecasting. This is a fascinating area. It was interesting to see the various treatments applied to fitting current/historic data (Holt-Winters and ARIMA came out top of the heap, by the way, in terms of accuracy in modelling seasonality in the data sets, if anyone is interested).
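
For the curious, the kind of model fitting under discussion looks something like the sketch below. This is emphatically not the Twitter team’s own code – just the standard statsmodels Holt-Winters and (seasonal) ARIMA calls applied to an invented series with a daily cycle, with the model orders picked arbitrarily for illustration:

```python
# Sketch only: fit Holt-Winters and a seasonal ARIMA to a synthetic demand series
# with a 24-hour cycle, then forecast the next day. Orders/parameters are arbitrary.
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(42)
hours = np.arange(24 * 14)  # two weeks of hourly observations
demand = 1000 + 300 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 50, hours.size)
series = pd.Series(demand, index=pd.date_range("2014-06-01", periods=hours.size, freq="h"))

# Triple exponential smoothing (Holt-Winters) with additive daily seasonality
hw = ExponentialSmoothing(series, trend="add", seasonal="add", seasonal_periods=24).fit()

# A seasonal ARIMA for comparison (order and seasonal_order chosen arbitrarily here)
sarima = ARIMA(series, order=(1, 0, 1), seasonal_order=(1, 0, 1, 24)).fit()

print(hw.forecast(24).round())      # next 24 hours, Holt-Winters
print(sarima.forecast(24).round())  # next 24 hours, seasonal ARIMA
```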

My point of departure from the Twitter team was over their attempts to derive accurate predictive forecasts.

In short, their efforts only bore any semblance of robust utility if the input data was aggressively ‘cleaned’ to remove anomalies. Even then, attempts at anything beyond short-term prediction tended to be snookered by unacceptably high standard errors.

In this situation, the question therefore becomes ‘what is an anomaly?’ This may be slightly easier to answer for the operations team at Twitter, given their monster data volumes, but I suggest that it will rapidly become an unproductive guessing game for the vast majority of companies.
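
To make the problem concrete, here is one deliberately naive answer – not anything Twitter described, just an assumed rule of ‘flag anything more than k MADe units from the median’, with k = 3 and the sample data both chosen arbitrarily:

```python
import statistics

def flag_anomalies(values, k=3.0):
    """Split values into (kept, flagged) by distance from the median in MADe units."""
    med = statistics.median(values)
    made = 1.4826 * statistics.median(abs(v - med) for v in values) or 1e-9  # guard against a zero MAD
    kept, flagged = [], []
    for v in values:
        (flagged if abs(v - med) > k * made else kept).append(v)
    return kept, flagged

kept, flagged = flag_anomalies([1.1, 1.2, 1.3, 1.2, 1.1, 1.4, 1.2, 22.0, 1.3, 25.5])
print(flagged)  # [22.0, 25.5]
```

The mechanics are trivial; deciding whether those flagged points are measurement noise or the first signs of a genuine surge in demand is the part that resists automation – and that is precisely the guessing game in question.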

From the data presented, I would suggest that forecasting, at least over the relatively short term, can be accomplished just as well, if not better, with a pencil and ruler (i.e. by extrapolating a best-fit curve) as by any amount of statistical jiggery-pokery.
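
In code, the pencil and ruler amount to nothing more than a straight-line least-squares fit extrapolated a few steps ahead. The figures below are invented, and numpy.polyfit is simply standing in for the ruler:

```python
# Illustrative only: fit a straight line to two weeks of (invented) daily peak
# request rates and extrapolate one week ahead.
import numpy as np

days = np.arange(14)
peak_rps = 500 + 12 * days + np.random.default_rng(1).normal(0, 15, days.size)

slope, intercept = np.polyfit(days, peak_rps, deg=1)  # the "ruler"
next_week = np.arange(14, 21)
forecast = slope * next_week + intercept

print(np.round(forecast))  # crude, but often no worse than heavier methods over the short term
```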

Too radical?

Happy number crunching!

Dirty is healthy – embracing uncertainty in web performance monitoring

Response time, availability and consistency are the “holy trinity” of external performance monitoring. Effective operations teams are required to provide assurance of all three metrics (and many more besides, of course).

Clean room testing – the best-case scenario

There is a superficially attractive logic that argues that the primary requisite for understanding and managing these metrics is test tooling with optimised “clean room” characteristics. These would include:

  • Tier 1 ISP connections
  • “Unlimited” bandwidth (or defined connectivity from such a connection)
  • “Industrial strength” high capacity test nodes
  • Highly instrumented / modified test agents

Certainly, the ability to run replicable comparative testing of this nature has its place. Indeed, it is far preferable to the approach of some “bargain basement” tools running tests across tertiary ISP connections from local datacentres with little or no understanding of the above characteristics.

It is not difficult to accept that “near edge” clean room testing has value – in fact I regard it as essential in many areas, including:

  • Application performance baselining across the business demand cycle
  • Competitive benchmarking
  • KPI and third party SLA management

Put this together with the concept that there is no point in identifying issues (such as ISP peering) that are outside one’s span of immediate control, and we are all done, aren’t we? To quote Evelyn Waugh, “up to a point, Lord Copper.” This approach would be perfectly adequate if web delivery took place under such consistent conditions.

Performance in the real world

Unfortunately, as we all know, the opposite is the case – and the busier your site, the more this applies. The big grocers, for example, find themselves getting web requests from IE5 and other browsers that time has forgotten, not to mention every kind of interactive user device. Add mobile and the situation is worse still – hundreds of device types and wireless connectivity thrown into the mix for good measure.

The key to all this is taking a “knowledge is power” approach based on business, not just operational risk. So by all means run near edge testing, but understand what your customers are experiencing too. This means getting as close to them as you possibly can.

Consider:

  • Browser versions
  • PC system characteristics
  • Tertiary (retail) ISPs
  • Real-world connection speeds (refer to Ofcom – they will be a lot lower than the adverts would have you believe)
  • Wireless carrier / Wi-Fi provision
  • Mobile device characteristics including processor, memory state, battery stats & signal strength

…and plenty more – basically, anything that can affect application performance (and therefore such “minor matters” as digital revenue and brand perception) probably will.

A realistic approach to testing

A testing approach (supported by appropriate tooling) that provides visibility into these areas (not necessarily all the time, but certainly frequently enough to spot major issues and define the “offsets” from your clean room tests) will pay dividends in areas including:

  • Peering issues (often “silent”, but affecting significant proportions of your entire customer base) – “fixed wire” ISPs and wireless carriers
  • Client-side problems (especially where RIA/Web 2.0 technologies – Flex, AJAX, etc. – and/or multimedia content are involved)
  • Mobile device-centric issues – particularly with regard to native mobile applications
  • Key client intranet/extranet performance
  • Provision of detailed objective data to support demands for resolution to third parties
  • And so on

Understanding will enable:

  • Issue identification
  • Prioritisation of intervention (on basis of revenue risk, etc.) and
  • Validation of resolution

…and the “anti-promotion” phone call from the CEO who has just been cornered for half an hour in the golf club by someone with website “issues” will be a thing of the past!

Final thought – end-user testing requires different approaches to test design in order to maximise business value.

Webinar: The business benefits of good end-user experience

In this webinar, experienced performance consultant Larry Haig explains why end-user experience is so important to retail websites and systems, using clear examples and research as evidence. Larry discusses the elements that make it difficult for businesses to ensure a high-quality experience for end users, from the growing variety of devices and connections in use through to website performance and consistency. He goes on to outline the risk to profitability from poor user experience. This is of particular interest to anyone working in online retail or with a transactional website. The webinar consists of a short presentation followed by a Q&A session.