Analyzing Oracle Performance Using Time Series Models
Chen (Gwen) Shapira
http://prodlife.wordpress.com
Why?
Abnormal Data
Changes
Trends
SLAs
See:
Techniques
Use Cases
Real Data
Techniques
Database Performance Analysis with Time Series
Trend
Trend
Moving Average Trend
Remove Trend
Seasonality
Seasonal Effect
Components
More AutoCorrelation
Xt = 0.33·Xt-1 + 0.07·Xt-2 − 0.09·Xt-3 + e
Test Model
Use Cases
Fake Incident
Detect By:
Remove trend
Remove seasonality
Mark “normal” data
What’s left?
Spot the Incident
“I have seen the future and it is very much like the present, only longer” (Kehlog Albran)
Exponential Smoothing:
Calculate moving average of future
Add seasonality
AutoCorrelation
Use the model Xt = a·Xt-1 … to calculate Xt+1, Xt+2, …
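As a rough illustration of forecasting with such a model (a minimal Python sketch, not the R code used in the talk): each new point is a weighted sum of the previous p points, and the noise term e is taken at its expected value of zero. The AR(3) coefficients 0.33, 0.07, −0.09 below are the ones from the earlier slide.

```python
def ar_forecast(series, coeffs, steps):
    """Forecast with an AR(p) model: each new point is the weighted sum
    of the previous p points (coefficients ordered newest lag first).
    The noise term is dropped, i.e. replaced by its expected value, 0."""
    extended = list(series)
    for _ in range(steps):
        nxt = sum(c * extended[-1 - i] for i, c in enumerate(coeffs))
        extended.append(nxt)
    return extended[len(series):]

# Coefficients from the AR(3) model on the slide (illustrative data):
print(ar_forecast([1.0, 2.0, 3.0], [0.33, 0.07, -0.09], 2))
```

Iterating further and further into the future reuses the model's own predictions as inputs, which is why confidence shrinks with the horizon.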
Real Data 1:Redo Blocks per Hour
Holiday
Seasonality
Abnormal Data
Real Data 2:CPU on DB Server
Seasonality?
Partial AutoCorrelation
Check Fit of Model
Prediction
Conclusions:
Use moving average to describe trend
Look for seasonality
Predict with Exponential Smoothing
AutoCorrelation?
Seasonality-aware monitoring
Questions?

Editor's Notes

  1. Time Series – data that is collected sequentially, usually at regular intervals. Time series are all around us – weather, stocks, CPU, disk space…
  2. Recognize abnormal data and send alerts. Recognize changes and be proactive. Analyze long-term trends for planning. Set realistic SLAs.
  3. One question we’ll keep asking ourselves: Which techniques are really useful?
  4. All kinds of data issues can prevent analysis. You can, and sometimes should, fix the data so analysis is possible: replace missing data with average values (or maximum values where that makes sense), remove outliers when it makes sense, and analyze the two sides of a discontinuity separately.
  5. Linear trend. Easy to fit and use, but rarely makes sense in real life.
  6. Moving Average requires picking a window size and weights. Small window: matches the data better, but may include noise. Large window: more of a general trend, but will lag behind the data.
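The window-size trade-off above can be sketched with an equal-weight moving average (a minimal Python example; the talk itself works in R):

```python
def moving_average(series, window):
    """Equal-weight moving average: one smoothed value per full window.
    A small window tracks the data closely (including its noise); a
    large window smooths more but lags behind level changes."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

# Hypothetical hourly CPU readings, just for illustration:
cpu = [50, 52, 80, 55, 53, 51, 90, 54]
print(moving_average(cpu, 3))
```

Weighted variants simply replace the equal 1/window weights with weights that favor recent points.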
  7. Remove trend to allow analyzing other components.
  8. 50 degrees Fahrenheit is cold for August but hot for January. How about 60% CPU? Is it always OK or always a problem?
  9. Reminder: Correlation is a measure of the strength of the relation between two variables. How much do the variables change together?
  10. How does the data in our series correlate with itself? We see strong correlation between data points 24 hours apart.
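Auto-correlation at a given lag is just the Pearson correlation between the series and a shifted copy of itself. A minimal Python sketch of the standard estimator (the talk uses R's built-in acf):

```python
def autocorrelation(series, lag):
    """Correlation between the series and itself shifted by `lag`,
    normalized by the full-series variance (the usual ACF estimator)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var
```

On hourly data with a daily cycle, this function peaks at lag 24, which is what the slide's chart shows.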
  11. Average CPU for each hour. Similar to the average-temperature-per-month charts you sometimes see in tour guides.
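Estimating the seasonal effect this way is just grouping by hour of day and averaging. A minimal Python sketch (hypothetical (hour, value) pairs; the talk does this in R):

```python
def hourly_means(readings):
    """Average value per hour-of-day; `readings` are (hour, value) pairs.
    The resulting per-hour averages are the estimated seasonal effect."""
    sums, counts = {}, {}
    for hour, value in readings:
        sums[hour] = sums.get(hour, 0) + value
        counts[hour] = counts.get(hour, 0) + 1
    return {h: sums[h] / counts[h] for h in sums}
```

Subtracting these hourly means from each observation removes the seasonal component.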
  12. One chart to rule them all – data, trend, seasonality and all the rest.
  13. “All the rest” is not completely random – there is still some auto-correlation: the data correlates with points at lags of one and two.
  14. R used the auto-correlations to model the data.
  15. We test the model. We can see that the residuals no longer have auto-correlation, and the statistical test for the fit shows that the result is likely not random.
  16. I added a couple of hours with high CPU here. Can you spot them?
  17. After removing the seasonality and the average, we can clearly see the data point that is an outlier. It stands out.
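The detection steps (remove seasonality, then flag what stands out) can be sketched in Python as follows. This is an illustration, not the talk's R code, and the 3-standard-deviation threshold is an assumption of mine, not a value from the slides:

```python
def deseasonalize(series, period):
    """Subtract the per-position seasonal mean from each point,
    leaving residuals in which outliers stand out."""
    seasonal = [0.0] * period
    counts = [0] * period
    for i, x in enumerate(series):
        seasonal[i % period] += x
        counts[i % period] += 1
    seasonal = [s / c for s, c in zip(seasonal, counts)]
    return [x - seasonal[i % period] for i, x in enumerate(series)]

def outliers(residuals, threshold=3.0):
    """Indices of residuals more than `threshold` standard deviations
    from the mean (illustrative threshold, not from the talk)."""
    n = len(residuals)
    mean = sum(residuals) / n
    std = (sum((r - mean) ** 2 for r in residuals) / n) ** 0.5
    return [i for i, r in enumerate(residuals)
            if abs(r - mean) > threshold * std]
```

A spike injected into otherwise periodic data is flagged while the repeating pattern is not.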
  18. Calculate the moving average of the future by appending the moving average of the last 20 points as an additional point, then using the last 19 real points and the new one to calculate another point, and so on. Obviously this gets less accurate the further out you go. Adding seasonality is a matter of adding the hourly average to the appropriate new points.
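The iterative procedure above can be sketched as follows (a minimal Python illustration; the window size is a parameter, with 20 being the value mentioned in the note):

```python
def forecast(series, window, steps, seasonal=None):
    """Extend the series by repeatedly appending the moving average of
    the last `window` points (each new point then feeds the next), then
    add back per-position seasonal averages if given."""
    extended = list(series)
    for _ in range(steps):
        extended.append(sum(extended[-window:]) / window)
    predicted = extended[len(series):]
    if seasonal:
        period = len(seasonal)
        predicted = [p + seasonal[(len(series) + i) % period]
                     for i, p in enumerate(predicted)]
    return predicted
```

Because later steps are computed from earlier predictions rather than real data, accuracy degrades with the forecast horizon, exactly as the note warns.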
  19. Red – the model matched to existing data. Blue – predicted data. Green – 99% probability that we will not get data outside these lines.
  20. A bit like moving average but with very specific weights.
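Those "very specific weights" decay exponentially: each smoothed value mixes the newest observation with the previous smoothed value, so older points get geometrically shrinking weight. A minimal Python sketch of simple exponential smoothing (the smoothing parameter alpha is chosen by the user; the talk's R tooling fits it automatically):

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: each value is
    alpha * observation + (1 - alpha) * previous smoothed value,
    i.e. a moving average with exponentially decaying weights."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed
```

A large alpha tracks the data closely; a small alpha smooths heavily, mirroring the small-vs-large-window trade-off of the plain moving average.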
  21. Blue – predicted data. Green – 99% probability that we will not get data outside these lines.
  22. The redo data is very noisy, but adding a moving average trend allows us to see a point where redo generation drops. This happened to be Dec 20, when many users left for vacation.
  23. Correlation every 6 hours and stronger correlation every 24. These are the times we recalculate materialized views: a few views every 6 hours and a bunch every 24.
  24. Removing the seasonality allows us to notice abnormal data. Worth investigating – what was running at that time? Is it likely to happen again?
  25. Not exactly trend, but we do have changing levels of data.
  26. There are periodic correlations, but they are not regular, so it is not seasonality. This graph does indicate extremely strong auto-correlation.
  27. Partial autocorrelation graph. This is similar to autocorrelation, but when we calculate the auto-correlation for lag 2, we remove the correlation already explained by lag 1, and so on. Using this graph we can see auto-correlation up to lag 17. Once the CPU climbs, it may take over 3 hours until it is back to normal!
  28. Checking that the AR(17) model fits.