$\begingroup$

I have solar production time series for a set of houses in a given area, and I need to simulate the production-vs-time curve of a new house in the same area. Total production should be similar across houses, since they are in similar locations and year-long factors such as season and irradiance will be the same. However, each time series also has its own characteristics, such as peaks at different hours (depending on the orientation of the house and panels) and shading effects at certain hours. So one house may peak in the morning while another peaks in the afternoon. Because of this heterogeneity, I can't simply take the mean across all houses for each X-minute interval, for example, as I'd likely end up with an unrealistically flattened production curve. I also know nothing about the local factors of the new house (orientation, etc.) beyond its location, so I can't use that information.

I thought of removing seasonality for each house (I have enough data for each house to fit a separate model), averaging over all of the deseasonalized time series, and then simulating a set of time series by applying different seasonality parameters to this average. I could then take a randomly selected curve for each day and stitch the days together to get the complete time series. My questions are whether this approach is sound, and how to select the seasonality parameters used to create each curve.

Another option I thought of was to simply take a randomly selected real production curve, as opposed to removing the seasonality from all of them and computing an average, but maybe this would increase the estimation error?

I am open to suggestions. I have a background in statistics though I've never really worked that much with time-series. Thank you!

$\endgroup$

1 Answer

$\begingroup$

You may be able to use a hierarchical generalized additive model (HGAM) for this. There is a very useful introductory tutorial on how to set these models up in R using the {mgcv} package. The idea is to fit a model that estimates a shared seasonal shape across all time series, together with nonlinear terms that let each series' seasonal pattern deviate from that shared shape. In doing so, you can perhaps get an estimate of what the population-average seasonal shape may be, and you can then draw on that term when predicting for a new series. Of course, if you know something about the likely features of the new house, you could take weighted draws from the other seasonal shapes to give a better informed, post-stratified prediction. But this will depend on what information you have available.
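In symbols, the model fitted in the code below can be written roughly as follows. The notation is mine and is only meant as a sketch; the Poisson family and log link are there because the simulated data are counts, not because solar production has to be modelled that way:

$$
\begin{aligned}
y_{i,t} &\sim \text{Poisson}(\mu_{i,t}) \\
\log(\mu_{i,t}) &= \beta_0 + \alpha_i + f(\text{season}_t) + f_i(\text{season}_t) + g(\text{year}_t) + g_i(\text{year}_t) \\
\alpha_i &\sim \text{Normal}(0, \sigma_\alpha^2)
\end{aligned}
$$

Here $i$ indexes series and $t$ indexes time, $\alpha_i$ are the hierarchical intercepts, $f$ and $g$ are the shared seasonal and trend smooths, and $f_i$ and $g_i$ are the series-level deviation smooths.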

Below I demonstrate how such a model could be set up and interrogated using a few packages designed for working with these models. For the simulation I use a function from my own package {mvgam} to make sure I end up with a set of time series that have varying seasonal patterns, but you could come up with your own simulation if you want to; that part isn't crucial:

# Load libraries
library(mvgam)
library(mgcv)
library(gratia)
library(marginaleffects)
library(ggplot2); theme_set(theme_bw())

# Simulate some time series with slightly different seasonalities
set.seed(999)
simdat <- sim_mvgam(T = 120, 
                    n_series = 6, 
                    mu = 2.5,
                    trend_model = 'GP',
                    use_lv = TRUE,
                    n_lv = 2,
                    seasonality = 'hierarchical',
                    prop_train = 1)$data_train

# Plot the series
ggplot(simdat, aes(x = time, y = y, col = series)) +
  geom_point() + 
  geom_line() +
  facet_wrap(~series) +
  theme(legend.position = 'none')

# Add a new factor level for the missing series we'd like to predict
levels(simdat$series)
#> [1] "series_1" "series_2" "series_3" "series_4" "series_5" "series_6"
levels(simdat$series) <- c(levels(simdat$series), 'series_7')

# Fit a hierarchical GAM using the bam() function for faster estimation
mod <- bam(y ~ 
             # hierarchical intercepts for each series; this is needed
             # so we can predict the intercept for the missing series
             s(series, bs = 're') +
             
             # shared cyclic seasonal term
             s(season, bs = 'cc', k = 12) +
             
             # deviation seasonal terms
             s(season, by = series, bs = 'cc', k = 12) +
             
             # shared long-term trend
             s(year, bs = 'tp', k = 10) +
             
             # deviation trends
             s(year, by = series, bs = 'tp', k = 10),
           
           # knot placement to ensure seasonal smooths join correctly
           # at the boundaries
           knots = list(season = c(0.5, 12.5)),
           
           # ensure all factor levels can be predicted
           drop.unused.levels = FALSE,
           family = poisson(),
           data = simdat,
           discrete = TRUE,
           nthreads = 2)

# Draw the estimated seasonal smooth terms
gratia::draw(mod, select = 'season', partial_match = TRUE)

# Each seasonal shape is a combination of the shared smooth and 
# the series' deviation smooth; plot seasonal shapes for each series
plot_predictions(mod, condition = c('season', 'series', 'series'),
                 
                 # include observations to see how the shapes have 
                 # smoothed through the data
                 points = 0.5)  +
  theme(legend.position = 'none')

# Plot all series' seasonal shapes again, and predict for the new series
plot_predictions(mod, by = c('season', 'series', 'series'),
                 newdata = datagrid(season = seq(1, 12, by = 0.25),
                                    series = unique(levels(simdat$series)))) +
  theme(legend.position = 'none')

Created on 2024-05-29 with reprex v2.0.2

Above you can see what our predictions would be for the new series (series_7), which are drawn from the series-level hierarchical intercepts and the shared seasonal term. Hope that helps a bit.
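If you need the numbers rather than a plot, one option is to predict with the series-specific terms switched off, which leaves only the shared components. Below is a minimal, self-contained sketch of that idea using only {mgcv}. The toy data, the object names (`toy`, `fit`, `shared_shape`) and the choice of `k = 8` are all made up for illustration, and I use `gam()` rather than `bam()` purely to keep it small. It relies on the `exclude` argument of `predict.gam()`, which sets the named smooth terms to zero:

```r
library(mgcv)

# Toy data (hypothetical): 4 series of monthly counts over 10 years,
# each with a seasonal peak shifted by a different phase
set.seed(1)
toy <- expand.grid(season = 1:12,
                   year = 1:10,
                   series = factor(paste0("series_", 1:4)))
phase <- c(0, 0.5, 1, 1.5)[as.integer(toy$series)]
toy$y <- rpois(nrow(toy),
               lambda = exp(2 + sin(2 * pi * toy$season / 12 + phase)))

# Same structure as the seasonal part of the model above:
# random intercepts + shared cyclic smooth + per-series deviations
fit <- gam(y ~ s(series, bs = "re") +
             s(season, bs = "cc", k = 8) +
             s(season, by = series, bs = "cc", k = 8),
           knots = list(season = c(0.5, 12.5)),
           family = poisson(),
           data = toy,
           method = "REML")

# Labels of every smooth that involves 'series'
# (the random intercepts and the deviation smooths)
labs <- vapply(fit$smooth, function(s) s$label, character(1))
series_terms <- labs[grepl("series", labs, fixed = TRUE)]

# Predict over one seasonal cycle with those terms set to zero.
# 'series' must still be supplied, but its value no longer matters.
newdat <- data.frame(season = seq(1, 12, by = 0.25),
                     series = factor("series_1",
                                     levels = levels(toy$series)))
shared_shape <- as.numeric(predict(fit, newdata = newdat,
                                   type = "response",
                                   exclude = series_terms))

head(data.frame(season = newdat$season, shared_shape))
```

Because the excluded terms are zeroed, this is the expected curve for a series sitting at the centre of the population. It does not carry the between-series spread, so for simulation you would probably still want to add that variability back in.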

$\endgroup$
  • $\begingroup$ Thank you for the link to that paper. Very useful! $\endgroup$
    – Lynchian
    Commented Jun 19 at 8:18
