Robust Automated Forecasting
In Python & R
Pranav Bahl, Data Scientist
Jonathan Stacks, DevOps Engineer
Time Series Forecasting
Time Series: A series of data points indexed in time order and spaced at equal time
intervals. It consists of two variables: time and value.
Overview
Goal: Demonstrate how to make millions of [robust] forecasts in an automated fashion
● Define the problem
● Retrieve and preprocess data
○ Profile
○ Transform
○ Detect outliers/anomalies
● Create forecast models employing multiple strategies and parameter combinations
○ Using Python
○ Using R
● Evaluate models with contextual evaluation metrics (and meta-metrics)
● Discuss how to choose the most appropriate hardware
Defining the problem
● Forecasts are used throughout our risk management system
○ Bias towards risk aversion
● > 250,000 unique time series
○ Growing at 2x each year
● Elastic compute capacity
● Runtime <= 5 hours
● Cost <= $200/day
● Reduce error by 10%
○ Based on cumulative forecast % error
Runtime and cost calculation
Because the number of unique time series to be processed changes over time, the idea is to adjust
the number of servers so the job gets done within the desired runtime and cost budget.
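The deck does not show the actual calculation, so here is a minimal sketch of how the server count and daily cost could be estimated; the per-series runtime and workers-per-server values are illustrative assumptions, and the $1.59/hour rate is the c4.8xlarge on-demand price quoted later in the deck.

```python
import math

def servers_needed(n_series, secs_per_series, workers_per_server, max_runtime_hours=5.0):
    """Estimate how many servers are needed to finish inside the runtime budget."""
    total_cpu_hours = n_series * secs_per_series / 3600.0
    per_server_capacity = workers_per_server * max_runtime_hours  # worker-hours per server
    return math.ceil(total_cpu_hours / per_server_capacity)

def daily_cost(n_servers, hourly_rate, runtime_hours):
    # Assumes per-instance-hour billing, rounded up.
    return n_servers * hourly_rate * math.ceil(runtime_hours)

# Illustrative numbers only: 250k series, ~25 s each, 36 workers per server.
n = servers_needed(250_000, secs_per_series=25, workers_per_server=36)
print(n, daily_cost(n, hourly_rate=1.59, runtime_hours=5))
```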
Time series profiling
● Ensure index is a timestamp
● Check for completeness of panel
● Remove leap days
● Make sure data has at least one complete season
● Truncate data in favor of complete seasons (a minimal sketch follows)
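A minimal pandas sketch of that profiling checklist, assuming a daily series with a 365-day season; the function and parameter names are hypothetical.

```python
import pandas as pd

def profile_series(s: pd.Series, season_length: int = 365) -> pd.Series:
    """Basic profiling for a daily series, following the checklist above."""
    s = s.copy()
    # Ensure the index is a timestamp
    s.index = pd.to_datetime(s.index)
    s = s.sort_index()

    # Check for completeness of the panel (no missing days)
    full_range = pd.date_range(s.index.min(), s.index.max(), freq="D")
    s = s.reindex(full_range)

    # Remove leap days
    s = s[~((s.index.month == 2) & (s.index.day == 29))]

    # Require at least one complete season; truncate partial seasons
    if len(s) < season_length:
        raise ValueError("series is shorter than one complete season")
    n_complete = (len(s) // season_length) * season_length
    return s.iloc[-n_complete:]
```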
Handling missing values
● Impute using…
○ Descriptive statistics (e.g. mean or median)
○ Interpolation (e.g. linear or polynomial)
○ Extrapolation (e.g. training a linear model)
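A minimal pandas sketch of those imputation options (polynomial interpolation additionally requires SciPy); the toy series is purely illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, 12.0, np.nan, 15.0],
              index=pd.date_range("2018-01-01", periods=5, freq="D"))

# Descriptive statistic: fill with the median
filled_median = s.fillna(s.median())

# Interpolation: linear or polynomial (polynomial needs SciPy installed)
filled_linear = s.interpolate(method="linear")
filled_poly = s.interpolate(method="polynomial", order=2)

# Extrapolation: fit a simple linear trend and use it to fill gaps
t = np.arange(len(s), dtype=float)
mask = s.notna()
coef = np.polyfit(t[mask], s[mask], deg=1)
filled_trend = s.fillna(pd.Series(np.polyval(coef, t), index=s.index))
```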
Outlier/anomaly detection
Outlier: An observation that lies an abnormal distance from other values in a random
sample from a population.
- Boxplot
- Time-series decomposition routine
Anomaly: Illegitimate data point that’s generated by a different process than whatever
generated the rest of the data.
- Forecasting
- Robust Principal Component Analysis (RPCA)
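A hedged sketch of the two outlier routines named above: the Tukey boxplot rule and a decomposition-based check. The latter uses statsmodels' seasonal_decompose (the `period` keyword is called `freq` in older statsmodels versions); thresholds are illustrative defaults.

```python
import pandas as pd

def boxplot_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag points outside the Tukey boxplot fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

def decomposition_outliers(s: pd.Series, period: int, k: float = 3.0) -> pd.Series:
    """Flag points whose remainder after seasonal decomposition is extreme."""
    from statsmodels.tsa.seasonal import seasonal_decompose
    resid = seasonal_decompose(s, period=period, model="additive").resid
    return (resid - resid.mean()).abs() > k * resid.std()
```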
Outlier/anomaly detection
Forecasting library landscape
● Regression
○ [Python] StatsModels (OLS, polynomial)
○ [Python/R] XGBoost (gradient boosted regression trees)
● ARMA / ARIMA / SARIMA (autoregressive moving average family)
○ [R] Forecast
○ [Python] StatsModels
○ [Python] Pyramid
● Exponential smoothing
○ [R] Forecast (Holt-Winters)
● Structural models / state space models
○ [R] BSTS (Bayesian Structural Time Series)
○ [Python] Pyflux
○ [Python/R] Prophet
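As one illustration of the Python side of this landscape, here is a minimal auto-ARIMA sketch using pmdarima, the successor to the Pyramid package listed above; the synthetic weekly-seasonal series is purely illustrative, not the presenters' data.

```python
import numpy as np
import pmdarima as pm  # successor to the "pyramid" (pyramid-arima) package

rng = np.random.default_rng(0)
y_train = 100 + 10 * np.sin(np.arange(200) * 2 * np.pi / 7) + rng.normal(0, 2, 200)

# Stepwise seasonal ARIMA search, roughly mirroring R's forecast::auto.arima
model = pm.auto_arima(y_train, seasonal=True, m=7,
                      suppress_warnings=True, error_action="ignore")
forecast = model.predict(n_periods=28)
```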
Evaluation Metrics
Evaluation metrics are used to measure the quality of a forecast and to compare different
strategies. There are two types of evaluation metrics:
Scale dependent: metrics that require all compared time series to be on the same scale,
e.g. MFE, RMSE, MAE, MSE, NMSE, SMSE, SSE
Scale independent: metrics used to compare forecast performance across different data sets,
e.g. MPE, MAPE, SMAPE, MASE
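A minimal NumPy sketch of three of these metrics (RMSE, SMAPE, MASE), using their standard definitions:

```python
import numpy as np

def rmse(y, yhat):
    """Root mean squared error (scale dependent)."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def smape(y, yhat):
    """Symmetric mean absolute percentage error (scale independent), in %."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean(2.0 * np.abs(y - yhat) / (np.abs(y) + np.abs(yhat))) * 100)

def mase(y, yhat, y_train, m=1):
    """MAE scaled by the in-sample MAE of the (seasonal) naive forecast."""
    y, yhat, y_train = (np.asarray(a, float) for a in (y, yhat, y_train))
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return float(np.mean(np.abs(y - yhat)) / scale)
```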
Bringing Down Runtime, Scale and Cost
Runtime optimization: Run different experiments to narrow down poorly performing
strategies or transformations
Cost optimization: Optimize the cost function by experimenting with different server
options
Runtime optimization
Initial Approach
[Pipeline diagram — Languages: Python 2, Python 3; Strategies: Auto ARIMA, PyARIMA, Pyflux, TBATS, Prophet; Transformations: Default, Log; Metrics (scale dependent): MFE, RMSE, MAE, MSE, NMSE, SMSE, SSE; Metrics (scale independent): MPE, MAPE, SMAPE, MASE; Hardware: Amazon EC2 C4, R4, X1]
First experiment: Remove unnecessary metrics
First experiment: Result
[Pipeline diagram after the first experiment — metrics narrowed to MFE and RMSE (scale dependent) and SMAPE and MASE (scale independent); strategies, transformations, languages, and hardware unchanged]
Second experiment: Forecast model selection
● Choose distinct but meaningful parameters for all strategies
● Initial experiment with default settings
○ Exception: reduced number of simulations for faster runtime
● Compare strategies across filtered evaluation metrics
Second experiment: Result
[Pipeline diagram after the second experiment — Pyflux eliminated; remaining strategies: Auto ARIMA, PyARIMA, TBATS, Prophet]
Third experiment: Auto ARIMA vs PyARIMA
● Choose the best ARIMA implementation
○ Python’s Auto ARIMA mirrors R’s implementation
● Compare error differences
○ Keep the implementation with the better performance
Error differences where Python Auto ARIMA wins: 50% py_auto_arima_log, 50% py_auto_arima
Third experiment: Result
[Pipeline diagram after the third experiment — PyARIMA dropped; Auto ARIMA retained alongside TBATS and Prophet]
Added another transformation
[Pipeline diagram — BoxCox added to the transformations alongside Default and Log]
Fourth Experiment: Python Version Testing
● Execute all strategies on both Python versions
○ Run them over different server types
● Compare processing times
Fourth Experiment: Result
[Pipeline diagram after the fourth experiment — Python 2 retired; everything now runs on Python 3]
Fifth Experiment: Hardware Optimization
Goal: Run all strategies with each transformation across the different server types,
then pick the best server based on cumulative time consumption.
Fifth Experiment: Result
[Pipeline diagram after the fifth experiment — C4 instances selected; R4 and X1 eliminated]
Sixth Experiment: How to improve processing time?
Determine bottlenecks:
● Determine whether any strategy consumes considerably more time than the others
● Try to reduce the overhead rpy2 creates when spinning up an R kernel
Sixth Experiment: How to improve processing time?
Let’s look at the time distribution of the different strategies.
[Runtime distribution charts for Auto_ARIMA, TBATS, and Prophet: within the 2nd standard deviation vs. across all samples]
Result: Apply a timeout of 300 seconds and analyse the loss of forecasting strength
Sixth Experiment: How to improve processing time?
● Win distribution where Auto_ARIMA exceeds the timeout threshold
○ Accounts exceeding the timeout: 26 / 1,000
○ Of those, accounts where Auto ARIMA was the winning strategy: 14 / 26
Sixth Experiment: How to improve processing time?
● How much forecasting strength did we lose?
○ Is Auto ARIMA winning by huge margins?
Sixth Experiment: How to improve processing time?
How does the time threshold help us?
● Gain a 10% speedup
● Maintain an acceptable loss of forecasting strength
Sixth Experiment: How to improve processing time?
rpy2 limitation: It is not feasible to time out the process when using rpy2
Solution: Create an individual R script for each R strategy and run it as a
subprocess from Python (a minimal sketch follows)
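A hedged sketch of that workaround: running an R strategy script as a Python subprocess so it can be killed at the 300-second threshold. The script path and command-line arguments are hypothetical, not the presenters' code.

```python
import subprocess

def run_r_strategy(script_path, series_csv, timeout_secs=300):
    """Run one R strategy script as a subprocess so it can be timed out."""
    try:
        result = subprocess.run(
            ["Rscript", script_path, series_csv],
            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
            timeout=timeout_secs, check=True,
        )
        return result.stdout.decode()
    except subprocess.TimeoutExpired:
        return None  # strategy skipped for this series; other strategies still compete
```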
Add BSTS (Bayesian Structural Time Series) strategy
[Pipeline diagram — BSTS added to the strategies alongside Auto ARIMA, TBATS, and Prophet]
Concluding Pipeline
[Final pipeline diagram — Languages: Python 3; Strategies: Auto ARIMA, TBATS, Prophet, BSTS; Transformations: Default, Log, BoxCox; Metrics (scale dependent): MFE, RMSE; Metrics (scale independent): SMAPE, MASE; Hardware: Amazon EC2 C4]
Scale + Cost Optimization
● Portable environments
● Immutable image
● Fast and easy to deploy across a cluster
● Same code in production as in local development
Architecture
The Basics:
● AWS elastic infrastructure
● Redshift data warehouse
● Luigi task orchestration
● Celery queue to distribute work
● Docker on c4.8xlarge(s)
● Amazon Linux OS
● Anaconda distro
○ Python 3.6
○ R 3.4
● Apache Parquet on S3
[Architecture diagram: EC2, ECS, Celery, Redshift, S3, Parquet]
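A minimal sketch of how the Celery fan-out could look; the module name, broker URL, and task body are hypothetical illustrations, not the presenters' actual code.

```python
# tasks.py -- hypothetical module and broker URL, illustrating the Celery fan-out
from celery import Celery

app = Celery("forecasts", broker="redis://localhost:6379/0")

@app.task
def forecast_series(series_id: str) -> dict:
    # Load the series (e.g. from Parquet on S3), run every strategy/transformation
    # combination, score with MFE/RMSE/SMAPE/MASE, and persist the winner.
    ...
    return {"series_id": series_id, "status": "done"}

# An orchestrator (e.g. a Luigi task) would enqueue one Celery task per time series:
# for sid in series_ids:
#     forecast_series.delay(sid)
```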
Concurrency Implementation
Cluster Stats
● 28 Instances (c4.8xlarge)
● 1000 workers
● ~2.6 Million forecasts
● ~5.5 hours
                On Demand    Spot
Cost per hour   $1.59        $0.60
Cost per run    $267.00      $100.80
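Those per-run figures are roughly consistent with 28 instances each billed for 6 instance-hours (5.5 hours rounded up to per-hour EC2 billing); that billing assumption is mine, not stated in the deck.

```python
import math

instances, runtime_hours = 28, 5.5
billed_hours = math.ceil(runtime_hours)  # assumes per-instance-hour billing, rounded up

for label, rate in [("On Demand", 1.59), ("Spot", 0.60)]:
    print(f"{label}: ${instances * billed_hours * rate:,.2f} per run")
# On Demand: $267.12 (slide shows $267.00); Spot: $100.80
```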
Conclusion
The new forecasting pipeline resulted in a reduction in forecast error
Future Improvements
● Re-run the experiments whenever any of the libraries is upgraded
● Periodically look for new time series strategies
● Add regression models
● Investigate AWS Batch or Kubernetes
References
https://c2fo.com/
https://www.otexts.org/fpp
https://arxiv.org/pdf/1302.6613.pdf
https://sites.google.com/site/stevethebayesian/googlepageforstevenlscott/course-and-seminar-materials/bsts-bayesian-structural-time-series
Questions?
@pranav_bahl
