Pranav Bahl & Jonathan Stacks - Robust Automated Forecasting in Python and R
- 2. Time Series Forecasting
Time Series: A series of data points indexed in time order and spaced at equal time
intervals. It consists of two variables: time and value.
- 3. Overview
Goal: Demonstrate how to make millions of [robust] forecasts in an automated fashion
● Define the problem
● Retrieve and preprocess data
○ Profile
○ Transform
○ Detect outliers/anomalies
● Create forecast models employing multiple strategies and parameter combinations
○ Using Python
○ Using R
● Evaluate models with contextual evaluation metrics (and meta-metrics)
● Discuss how to choose the most appropriate hardware
- 4. Defining the problem
● Forecasts are used throughout our risk management system
○ Bias towards risk aversion
● > 250,000 unique time series
○ Growing at 2x each year
● Elastic compute capacity
● Runtime <= 5 hours
● Cost <= $200/day
● Reduce error by 10%
○ Based on cumulative forecast % error
- 5. Runtime and cost calculation
Because the number of unique time series to be processed changes over time, the idea is to adjust
the number of servers so the job finishes within the desired runtime and cost constraints.
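As a rough illustration, here is a minimal Python sketch of that sizing calculation. The per-series runtime and workers-per-server numbers are hypothetical placeholders; the $1.59/hour on-demand price is the c4.8xlarge figure quoted later in the talk.

```python
import math

def servers_needed(n_series, sec_per_series, workers_per_server,
                   max_runtime_hours=5.0):
    """Estimate how many servers finish the batch within the runtime cap."""
    total_work_seconds = n_series * sec_per_series
    per_server_capacity = workers_per_server * max_runtime_hours * 3600
    return math.ceil(total_work_seconds / per_server_capacity)

# Hypothetical numbers: 250k series at ~8 s each, 36 workers per server.
n = servers_needed(250_000, 8, 36)
cost = n * 1.59 * 5  # on-demand $/hour times the 5-hour runtime cap
print(n, "servers, ~$%.2f per run" % cost)
```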
- 6. Time series profiling
● Ensure index is a timestamp
● Check for completeness of panel
● Remove leap days
● Make sure data has at least one complete season
● Truncate data in favor of complete seasons (a profiling sketch follows)
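A minimal pandas sketch of these profiling steps, assuming daily data with a 365-day season (both assumptions, not the talk's actual code):

```python
import pandas as pd

def profile_series(s: pd.Series, season_length: int = 365) -> pd.Series:
    """Profile one series: timestamp index, no leap days, whole seasons only."""
    s = s.copy()
    s.index = pd.to_datetime(s.index)    # ensure the index is a timestamp
    s = s.asfreq("D")                    # expose any gaps in the panel as NaN
    s = s[~((s.index.month == 2) & (s.index.day == 29))]  # remove leap days
    n_seasons = len(s) // season_length  # complete seasons available
    if n_seasons < 1:
        raise ValueError("need at least one complete season")
    return s.iloc[-n_seasons * season_length:]  # keep only complete seasons
```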
- 7. Handling missing values
● Impute using…
○ Descriptive statistics (e.g. mean or median)
○ Interpolation (e.g. linear or polynomial)
○ Extrapolation (e.g. training a linear model)
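For instance, with pandas on a toy series (the polynomial option needs SciPy installed):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, None, 3.0, None, 5.0])

s.fillna(s.median())                         # descriptive statistic (median)
s.interpolate(method="linear")               # linear interpolation
s.interpolate(method="polynomial", order=2)  # polynomial (requires SciPy)

# Extrapolation: fit a linear model on the known points, predict the gaps.
known = s.dropna()
coef = np.polyfit(known.index, known.values, deg=1)
s.fillna(pd.Series(np.polyval(coef, s.index), index=s.index))
```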
- 8. Outlier/anomaly detection
Outlier: An observation that lies an abnormal distance from other values in a random
sample from a population.
- Boxplot
- Time-series decomposition routine
Anomaly: Illegitimate data point that’s generated by a different process than whatever
generated the rest of the data.
- Forecasting
- Robust Principal Component Analysis (RPCA)
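A small sketch of the boxplot rule, plus residual-based detection via a time-series decomposition. The weekly period and statsmodels' seasonal_decompose are illustrative choices, not necessarily what the authors used:

```python
import numpy as np
from statsmodels.tsa.seasonal import seasonal_decompose

def boxplot_outliers(x, k=1.5):
    """Flag points outside the Tukey fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

def decomposition_outliers(series, period=7):
    """Decompose, then apply the boxplot rule to the remainder component."""
    resid = seasonal_decompose(series, period=period).resid.dropna()
    return resid[boxplot_outliers(resid.values)]
```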
- 10. Forecasting library landscape
● Regression
○ [Python] StatsModels (OLS, polynomial)
○ [Python/R] XGBoost (gradient boosted regression trees)
● ARMA / ARIMA / SARIMA (autoregressive integrated moving average)
○ [R] Forecast
○ [Python] StatsModels
○ [Python] Pyramid
● Exponential smoothing
○ [R] Forecast (Holt-Winters)
● Structural models / state space models
○ [R] BSTS (Bayesian Structural Time Series)
○ [Python] Pyflux
○ [Python/R] Prophet
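For example, the Pyramid route (the package now ships as pmdarima) looks roughly like this; the bundled wineind sample data and monthly seasonality are just for illustration:

```python
import pmdarima as pm

y = pm.datasets.load_wineind()                 # bundled sample monthly series
model = pm.auto_arima(y, seasonal=True, m=12,  # stepwise search over (p,d,q)(P,D,Q)m
                      stepwise=True, suppress_warnings=True)
print(model.order, model.seasonal_order)       # chosen ARIMA orders
forecast = model.predict(n_periods=12)         # 12-step-ahead forecast
```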
- 11. Evaluation Metrics
Evaluation metrics measure the quality of a forecast and are also used to
compare different strategies. There are two types of evaluation metrics:
Scale dependent: metrics that require all compared time series to be on the same scale.
E.g. RMSE, MAE
Scale independent: metrics used to compare forecast performance across different data sets.
E.g. MAPE, MASE
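A minimal sketch of one metric from each family, using standard textbook definitions rather than the talk's exact code:

```python
import numpy as np

def rmse(actual, forecast):
    """Scale dependent: root mean squared error."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return np.sqrt(np.mean((a - f) ** 2))

def smape(actual, forecast):
    """Scale independent: symmetric MAPE, in percent."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return 100 * np.mean(2 * np.abs(f - a) / (np.abs(a) + np.abs(f)))

def mase(actual, forecast, train, m=1):
    """Scale independent: MAE scaled by the in-sample naive lag-m error."""
    a, f, t = (np.asarray(v, float) for v in (actual, forecast, train))
    return np.mean(np.abs(a - f)) / np.mean(np.abs(t[m:] - t[:-m]))
```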
- 12. Bringing Down Runtime, Scale, and Cost
Runtime optimization: run experiments to narrow down poorly performing
strategies and transformations
Cost optimization: optimize the cost function by experimenting with different
server options
- 14. Initial Approach
[Diagram: the full initial experiment space.
Languages: Python 2, Python 3.
Strategies: Auto ARIMA, PyARIMA, TBATS, Prophet, Pyflux.
Transformations: Default, Log.
Metrics (scale dependent): MFE, RMSE, MAE, MSE, NMSE, SMSE, SSE.
Metrics (scale independent): MPE, MAPE, SMAPE, MASE.
Hardware: Amazon EC2 instance families C4, R4, X1.]
- 15. First experiment: Remove unnecessary metrics
[Diagram: the full experiment space from slide 14, shown again before metric filtering.]
- 18. First experiment: Result
[Diagram: metrics reduced to MFE and RMSE (scale dependent) plus SMAPE and MASE (scale independent); languages, strategies, transformations, and hardware unchanged.]
- 20. Second experiment: Forecast model selection
● Choose distinct but meaningful parameters for all strategies
● Initial experiment with default settings
○ Exception: reduced number of simulations for faster runtime
● Compare strategies across the filtered evaluation metrics (a comparison sketch follows)
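One way to run such a comparison is to count, per series, which strategy wins on a given metric. This is a hypothetical sketch with made-up column names and numbers, not the authors' pipeline:

```python
import pandas as pd

# One row per (series_id, strategy); values are illustrative.
results = pd.DataFrame({
    "series_id": [1, 1, 2, 2],
    "strategy":  ["auto_arima", "prophet", "auto_arima", "prophet"],
    "smape":     [12.1, 14.3, 9.8, 9.2],
})

# Winner per series = strategy with the lowest SMAPE; tally the wins.
winners = results.loc[results.groupby("series_id")["smape"].idxmin()]
print(winners["strategy"].value_counts())
```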
- 24. Third experiment: Auto ARIMA vs PyARIMA
[Diagram: Pyflux eliminated; remaining strategies are TBATS, Prophet, PyARIMA, and Auto ARIMA.]
- 25. Third experiment: Auto ARIMA vs PyARIMA
● Choose the best ARIMA implementation
○ Python's Auto ARIMA mirrors R's implementation
● Compare error differences
○ Keep the implementation with the better performance
[Chart: error differences where Python Auto ARIMA wins: 50% py_auto_arima_log, 50% py_auto_arima.]
- 29. Fourth Experiment: Python Version Testing
[Diagram: PyARIMA eliminated in favor of Auto ARIMA; a BoxCox transformation added alongside Default and Log.]
- 30. Fourth Experiment: Python Version Testing
● Execute all strategies on both Python versions
○ Run over different server types
● Compare processing times (a timing sketch follows)
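A bare-bones timing harness for that comparison might look like this, run once under each interpreter on each server type; fit_fn stands in for any strategy's fit-and-forecast call:

```python
import sys
import time

def time_strategy(fit_fn, series_batch):
    """Wall-clock one strategy over a batch of series on the current host."""
    start = time.perf_counter()
    for s in series_batch:
        fit_fn(s)
    elapsed = time.perf_counter() - start
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: "
          f"{elapsed:.1f}s for {len(series_batch)} series")
    return elapsed
```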
- 34. Fifth Experiment: Hardware Optimization
[Diagram: Python 2 eliminated; all strategies now run on Python 3.]
- 35. Fifth Experiment: Hardware Optimization
Goal: run all strategies with each transformation across the different
server types, and pick the best server based on cumulative time consumption.
- 38. Sixth Experiment: How to improve processing time?
Determine bottlenecks:
● Determine whether any strategy consumes considerably more time than the others
● Try optimizing the overhead created by rpy2 when spinning up the R kernel
- 40. Let’s look at the time distribution of different strategies
[Charts: runtime distributions for Auto_ARIMA, TBATS, and Prophet, shown across all samples and within the 2nd standard deviation.]
Result: apply a timeout of 300 seconds and analyse the loss of forecasting strength
- 41. ● Win distribution where Auto_ARIMA exceeds the timeout threshold
○ Time-consuming accounts: 26/1000
○ Time-consuming accounts with Auto ARIMA as the winning strategy: 14/26
- 42. ● How much forecasting strength did we lose?
○ Is ARIMA winning by huge margins?
- 43. How does the time threshold help us?
● Gains a 10% speedup
● Maintains an acceptable loss of forecasting strength
- 44. rpy2 limitation: it is not feasible to time out the process when using rpy2
Solution: create an individual R script for each R strategy and run it as a
subprocess from Python (a sketch follows)
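A minimal version of that workaround, assuming each strategy lives in its own Rscript-runnable file; the script names and CSV handoff are hypothetical:

```python
import subprocess

def run_r_strategy(script, input_csv, output_csv, timeout_sec=300):
    """Run one R strategy script as a subprocess, enforcing the timeout."""
    try:
        subprocess.run(
            ["Rscript", script, input_csv, output_csv],
            check=True, timeout=timeout_sec,
        )
        return True
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return False  # timed out or failed: skip this strategy for the series

# Example (hypothetical file names):
# run_r_strategy("tbats.R", "series_42.csv", "forecast_42.csv")
```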
- 49. ● Portable environments
● Immutable image
● Fast and easy to deploy across a cluster
● Same code in production as in local development
- 50. Architecture
The Basics:
● AWS elastic infrastructure
● Redshift data warehouse
● Luigi task orchestration
● Celery queue to distribute work
● Docker on c4.8xlarge(s)
● Amazon Linux OS
● Anaconda distro
○ Python 3.6
○ R 3.4
● Apache Parquet on S3
[Architecture diagram: EC2 and ECS running Celery workers, with Redshift and Parquet on S3.]
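In that setup, distributing work through Celery can be as simple as one task per series. This sketch uses a hypothetical Redis broker URL and task body:

```python
from celery import Celery

app = Celery("forecasts", broker="redis://broker-host:6379/0")  # hypothetical broker

@app.task
def forecast_series(series_id):
    """Load one series, fit the candidate strategies, persist the winner."""
    ...  # read from Redshift, fit and evaluate, write Parquet to S3

# Fan out across the cluster: one queued task per series.
# for sid in series_ids:
#     forecast_series.delay(sid)
```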
- 52. Cluster Stats
● 28 Instances (c4.8xlarge)
● 1000 workers
● ~2.6 Million forecasts
● ~5.5 hours
On Demand vs Spot pricing:
● Cost per hour: $1.59 (on demand) vs $0.60 (spot)
● Cost per run: $267.00 (on demand) vs $100.80 (spot)
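These per-run figures are consistent with roughly six hours of billed time across the cluster: 28 × $0.60/hr × 6 hr = $100.80 on spot, and 28 × $1.59/hr × 6 hr ≈ $267 on demand.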
- 54. Future Improvements
● Re-run the experiments whenever any of the libraries is upgraded
● Periodically look for new time series strategies
● Add regression models
● Investigate AWS Batch or Kubernetes