Robust Automated Forecasting
In Python & R
Pranav Bahl, Data Scientist
Jonathan Stacks, DevOps Engineer
Time Series Forecasting
Time Series: A series of data points indexed in time order and spaced at equal time
intervals. It consists of two variables: time and value.
Overview
Goal: Demonstrate how to make millions of [robust] forecasts in an automated fashion
● Define the problem
● Retrieve and preprocess data
○ Profile
○ Transform
○ Detect outliers/anomalies
● Create forecast models employing multiple strategies and parameter combinations
○ Using Python
○ Using R
● Evaluate models with contextual evaluation metrics (and meta-metrics)
● Discuss how to choose the most appropriate hardware
Defining the problem
● Forecasts are used throughout our risk management system
○ Bias towards risk aversion
● > 250,000 unique time series
○ Growing at 2x each year
● Elastic compute capacity
● Runtime <= 5 hours
● Cost <= $200/day
● Reduce error by 10%
○ Based on cumulative forecast % error
Runtime and cost calculation
Because the number of unique time series to be processed changes over time, the idea is to adjust
the number of servers so the job gets done within the desired runtime and cost budget.
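The deck does not show the actual calculation, so here is a minimal sketch of how the server count and daily cost could be estimated; the per-series runtime and workers-per-server values are illustrative assumptions, and the $1.59/hour rate is the c4.8xlarge on-demand price quoted later in the deck.

```python
import math

def servers_needed(n_series, secs_per_series, workers_per_server, max_runtime_hours=5.0):
    """Estimate how many servers are needed to finish inside the runtime budget."""
    total_cpu_hours = n_series * secs_per_series / 3600.0
    per_server_capacity = workers_per_server * max_runtime_hours  # worker-hours per server
    return math.ceil(total_cpu_hours / per_server_capacity)

def daily_cost(n_servers, hourly_rate, runtime_hours):
    # Assumes per-instance-hour billing, rounded up.
    return n_servers * hourly_rate * math.ceil(runtime_hours)

# Illustrative numbers only: 250k series, ~25 s each, 36 workers per server.
n = servers_needed(250_000, secs_per_series=25, workers_per_server=36)
print(n, daily_cost(n, hourly_rate=1.59, runtime_hours=5))
```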
Time series profiling
● Ensure index is a timestamp
● Check for completeness of panel
● Remove leap days
● Make sure data has at least one complete season
● Truncate data in favor of complete seasons (a minimal sketch follows)
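A minimal pandas sketch of that profiling checklist, assuming a daily series with a 365-day season; the function and parameter names are hypothetical.

```python
import pandas as pd

def profile_series(s: pd.Series, season_length: int = 365) -> pd.Series:
    """Basic profiling for a daily series, following the checklist above."""
    s = s.copy()
    # Ensure the index is a timestamp
    s.index = pd.to_datetime(s.index)
    s = s.sort_index()

    # Check for completeness of the panel (no missing days)
    full_range = pd.date_range(s.index.min(), s.index.max(), freq="D")
    s = s.reindex(full_range)

    # Remove leap days
    s = s[~((s.index.month == 2) & (s.index.day == 29))]

    # Require at least one complete season; truncate partial seasons
    if len(s) < season_length:
        raise ValueError("series is shorter than one complete season")
    n_complete = (len(s) // season_length) * season_length
    return s.iloc[-n_complete:]
```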
Handling missing values
● Impute using…
○ Descriptive statistics (e.g. mean or median)
○ Interpolation (e.g. linear or polynomial)
○ Extrapolation (e.g. training a linear model)
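A minimal pandas sketch of those imputation options (polynomial interpolation additionally requires SciPy); the toy series is purely illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, 12.0, np.nan, 15.0],
              index=pd.date_range("2018-01-01", periods=5, freq="D"))

# Descriptive statistic: fill with the median
filled_median = s.fillna(s.median())

# Interpolation: linear or polynomial (polynomial needs SciPy installed)
filled_linear = s.interpolate(method="linear")
filled_poly = s.interpolate(method="polynomial", order=2)

# Extrapolation: fit a simple linear trend and use it to fill gaps
t = np.arange(len(s), dtype=float)
mask = s.notna()
coef = np.polyfit(t[mask], s[mask], deg=1)
filled_trend = s.fillna(pd.Series(np.polyval(coef, t), index=s.index))
```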
Outlier/anomaly detection
Outlier: An observation that lies an abnormal distance from other values in a random
sample from a population.
- Boxplot
- Time-series decomposition routine
Anomaly: Illegitimate data point that’s generated by a different process than whatever
generated the rest of the data.
- Forecasting
- Robust Principal Component Analysis (RPCA)
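A hedged sketch of the two outlier routines named above: the Tukey boxplot rule and a decomposition-based check. The latter uses statsmodels' seasonal_decompose (the `period` keyword is called `freq` in older statsmodels versions); thresholds are illustrative defaults.

```python
import pandas as pd

def boxplot_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag points outside the Tukey boxplot fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

def decomposition_outliers(s: pd.Series, period: int, k: float = 3.0) -> pd.Series:
    """Flag points whose remainder after seasonal decomposition is extreme."""
    from statsmodels.tsa.seasonal import seasonal_decompose
    resid = seasonal_decompose(s, period=period, model="additive").resid
    return (resid - resid.mean()).abs() > k * resid.std()
```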
Outlier/anomaly detection
Forecasting library landscape
● Regression
○ [Python] StatsModels (OLS, polynomial)
○ [Python/R] XGBoost (gradient boosted regression trees)
● ARMA / ARIMA / SARIMA (autoregressive moving average family)
○ [R] Forecast
○ [Python] StatsModels
○ [Python] Pyramid
● Exponential smoothing
○ [R] Forecast (Holt-Winters)
● Structural models / state space models
○ [R] BSTS (Bayesian Structural Time Series)
○ [Python] Pyflux
○ [Python/R] Prophet
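As one illustration of the Python side of this landscape, here is a minimal auto-ARIMA sketch using pmdarima, the successor to the Pyramid package listed above; the synthetic weekly-seasonal series is purely illustrative, not the presenters' data.

```python
import numpy as np
import pmdarima as pm  # successor to the "pyramid" (pyramid-arima) package

rng = np.random.default_rng(0)
y_train = 100 + 10 * np.sin(np.arange(200) * 2 * np.pi / 7) + rng.normal(0, 2, 200)

# Stepwise seasonal ARIMA search, roughly mirroring R's forecast::auto.arima
model = pm.auto_arima(y_train, seasonal=True, m=7,
                      suppress_warnings=True, error_action="ignore")
forecast = model.predict(n_periods=28)
```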
Evaluation Metrics
Evaluation metrics are used to measure the quality of a forecast and to compare different
strategies. There are two types of evaluation metrics:
Scale dependent: metrics that require all compared time series to be on the same scale,
e.g. MFE, RMSE, MAE, MSE, NMSE, SMSE, SSE
Scale independent: metrics used to compare forecast performance across different data sets,
e.g. MPE, MAPE, SMAPE, MASE
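A minimal NumPy sketch of three of these metrics (RMSE, SMAPE, MASE), using their standard definitions:

```python
import numpy as np

def rmse(y, yhat):
    """Root mean squared error (scale dependent)."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def smape(y, yhat):
    """Symmetric mean absolute percentage error (scale independent), in %."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean(2.0 * np.abs(y - yhat) / (np.abs(y) + np.abs(yhat))) * 100)

def mase(y, yhat, y_train, m=1):
    """MAE scaled by the in-sample MAE of the (seasonal) naive forecast."""
    y, yhat, y_train = (np.asarray(a, float) for a in (y, yhat, y_train))
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return float(np.mean(np.abs(y - yhat)) / scale)
```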
Bringing Down Runtime, Scale and Cost
Runtime optimization: Run different experiments to narrow down poorly performing
strategies or transformations
Cost optimization: Optimize the cost function by experimenting with different server
options
Runtime optimization
Initial Approach
[Pipeline diagram — Languages: Python 2, Python 3; Strategies: Auto ARIMA, PyARIMA, Pyflux, TBATS, Prophet; Transformations: Default, Log; Metrics (scale dependent): MFE, RMSE, MAE, MSE, NMSE, SMSE, SSE; Metrics (scale independent): MPE, MAPE, SMAPE, MASE; Hardware: Amazon EC2 C4, R4, X1]
First experiment: Remove unnecessary metrics
First experiment: Result
[Pipeline diagram after the first experiment — metrics narrowed to MFE and RMSE (scale dependent) and SMAPE and MASE (scale independent); strategies, transformations, languages, and hardware unchanged]
Second experiment: Forecast model selection
● Choose distinct but meaningful parameters for all strategies
● Initial experiment with default settings
○ Exception: reduced number of simulations for faster runtime
● Compare strategies across filtered evaluation metrics
Second experiment: Result
[Pipeline diagram after the second experiment — Pyflux eliminated; remaining strategies: Auto ARIMA, PyARIMA, TBATS, Prophet]
Third experiment: Auto ARIMA vs PyARIMA
● Choose the best ARIMA implementation
○ Python’s Auto ARIMA mirrors R’s implementation
● Compare error differences
○ Keep the implementation with the better performance
Error differences where Python Auto ARIMA wins: 50% py_auto_arima_log, 50% py_auto_arima
Third experiment: Result
[Pipeline diagram after the third experiment — PyARIMA dropped; Auto ARIMA retained alongside TBATS and Prophet]
Added another transformation
[Pipeline diagram — BoxCox added to the transformations alongside Default and Log]
Fourth Experiment: Python Version Testing
● Execute all strategies on both Python versions
○ Run them over different server types
● Compare processing times
Fourth Experiment: Result
[Pipeline diagram after the fourth experiment — Python 2 retired; everything now runs on Python 3]
Fifth Experiment: Hardware Optimization
Goal: Run all strategies with each transformation across the different server types,
then pick the best server based on cumulative time consumption.
Fifth Experiment: Result
[Pipeline diagram after the fifth experiment — C4 instances selected; R4 and X1 eliminated]
Sixth Experiment: How to improve processing time?
Determine bottlenecks:
● Determine whether any strategy consumes considerably more time than the others
● Try to reduce the overhead rpy2 creates when spinning up an R kernel
Sixth Experiment: How to improve processing time?
Let’s look at the time distribution of the different strategies.
[Runtime distribution charts for Auto_ARIMA, TBATS, and Prophet: within the 2nd standard deviation vs. across all samples]
Result: Apply a timeout of 300 seconds and analyse the loss of forecasting strength
Sixth Experiment: How to improve processing time?
● Win distribution where Auto_ARIMA exceeds the timeout threshold
○ Accounts exceeding the timeout: 26 / 1,000
○ Of those, accounts where Auto ARIMA was the winning strategy: 14 / 26
Sixth Experiment: How to improve processing time?
● How much forecasting strength did we lose?
○ Is Auto ARIMA winning by huge margins?
Sixth Experiment: How to improve processing time?
How does the time threshold help us?
● Gain a 10% speedup
● Maintain an acceptable loss of forecasting strength
Sixth Experiment: How to improve processing time?
rpy2 limitation: It is not feasible to time out the process when using rpy2
Solution: Create an individual R script for each R strategy and run it as a
subprocess from Python (a minimal sketch follows)
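A hedged sketch of that workaround: running an R strategy script as a Python subprocess so it can be killed at the 300-second threshold. The script path and command-line arguments are hypothetical, not the presenters' code.

```python
import subprocess

def run_r_strategy(script_path, series_csv, timeout_secs=300):
    """Run one R strategy script as a subprocess so it can be timed out."""
    try:
        result = subprocess.run(
            ["Rscript", script_path, series_csv],
            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
            timeout=timeout_secs, check=True,
        )
        return result.stdout.decode()
    except subprocess.TimeoutExpired:
        return None  # strategy skipped for this series; other strategies still compete
```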
Add BSTS (Bayesian Structural Time Series) strategy
[Pipeline diagram — BSTS added to the strategies alongside Auto ARIMA, TBATS, and Prophet]
Concluding Pipeline
[Final pipeline diagram — Languages: Python 3; Strategies: Auto ARIMA, TBATS, Prophet, BSTS; Transformations: Default, Log, BoxCox; Metrics (scale dependent): MFE, RMSE; Metrics (scale independent): SMAPE, MASE; Hardware: Amazon EC2 C4]
Scale + Cost Optimization
● Portable environments
● Immutable image
● Fast and easy to deploy across a cluster
● Same code in production as in local development
Architecture
The Basics:
● AWS elastic infrastructure
● Redshift data warehouse
● Luigi task orchestration
● Celery queue to distribute work
● Docker on c4.8xlarge(s)
● Amazon Linux OS
● Anaconda distro
○ Python 3.6
○ R 3.4
● Apache Parquet on S3
[Architecture diagram: EC2, ECS, Celery, Redshift, S3, Parquet]
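A minimal sketch of how the Celery fan-out could look; the module name, broker URL, and task body are hypothetical illustrations, not the presenters' actual code.

```python
# tasks.py -- hypothetical module and broker URL, illustrating the Celery fan-out
from celery import Celery

app = Celery("forecasts", broker="redis://localhost:6379/0")

@app.task
def forecast_series(series_id: str) -> dict:
    # Load the series (e.g. from Parquet on S3), run every strategy/transformation
    # combination, score with MFE/RMSE/SMAPE/MASE, and persist the winner.
    ...
    return {"series_id": series_id, "status": "done"}

# An orchestrator (e.g. a Luigi task) would enqueue one Celery task per time series:
# for sid in series_ids:
#     forecast_series.delay(sid)
```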
Concurrency Implementation
Cluster Stats
● 28 Instances (c4.8xlarge)
● 1000 workers
● ~2.6 Million forecasts
● ~5.5 hours
                On Demand    Spot
Cost per hour   $1.59        $0.60
Cost per run    $267.00      $100.80
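Those per-run figures are roughly consistent with 28 instances each billed for 6 instance-hours (5.5 hours rounded up to per-hour EC2 billing); that billing assumption is mine, not stated in the deck.

```python
import math

instances, runtime_hours = 28, 5.5
billed_hours = math.ceil(runtime_hours)  # assumes per-instance-hour billing, rounded up

for label, rate in [("On Demand", 1.59), ("Spot", 0.60)]:
    print(f"{label}: ${instances * billed_hours * rate:,.2f} per run")
# On Demand: $267.12 (slide shows $267.00); Spot: $100.80
```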
Conclusion
The new forecasting pipeline resulted in a reduction in forecast error
Future Improvements
● Re-run the experiments whenever any of the libraries is upgraded
● Periodically look for new time series strategies
● Add regression models
● Investigate AWS Batch or Kubernetes
References
https://c2fo.com/
https://www.otexts.org/fpp
https://arxiv.org/pdf/1302.6613.pdf
https://sites.google.com/site/stevethebayesian/googlepageforstevenlscott/course-and-seminar-materials/bsts-bayesian-structural-time-series
Questions?
@pranav_bahl
