I have solar production time series for a set of houses in a given area, and I need to simulate the production-vs-time curve of a new house in the same area. Total production should be similar across houses, since they are in similar locations and year-long factors such as season and irradiance are effectively the same. However, each time series also has its own characteristics, such as peaks at different hours (depending on the orientation of the house and panels) and shading effects at certain hours. One house may peak in the morning while another peaks in the afternoon. Because of this heterogeneity, I can't simply take the mean of all houses over each X-minute interval, as I'd likely end up with an unrealistically flattened production curve. I also know nothing about the local factors of the new house (orientation, etc.) apart from its location, so I can't use that information.
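To make the flattening concern concrete, here is a minimal sketch with synthetic data. The bell-shaped curves, the peak hours (10:00 and 15:00), the width, and the helper name `daily_curve` are all assumptions for illustration, not real measurements.

```python
import numpy as np

def daily_curve(peak_hour, width=2.0, step_minutes=15):
    """Synthetic bell-shaped production curve for one day (hypothetical shape)."""
    hours = np.arange(0, 24, step_minutes / 60)
    curve = np.exp(-0.5 * ((hours - peak_hour) / width) ** 2)
    return hours, curve / curve.sum()  # normalise so the daily total is 1

hours, east = daily_curve(peak_hour=10)   # assumed morning-peaking house
_, west = daily_curve(peak_hour=15)       # assumed afternoon-peaking house
mean_curve = (east + west) / 2            # naive per-interval mean

# The mean keeps the daily total but has a lower, broader peak than either house.
print("peaks:", east.max(), west.max(), mean_curve.max())
print("daily total of mean:", mean_curve.sum())
```

The daily total is preserved, but the averaged curve peaks well below either individual house, which is the unrealistic shape I want to avoid.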
I thought of removing the seasonality from each house's series (I have enough data per house to fit a separate model for each), averaging the deseasonalized series, and then simulating a set of time series by applying different seasonality parameters to this average. I could then pick one of the simulated curves at random for each day and stitch the days together to obtain the complete time series. My questions are: is this approach sound, and how should I select the seasonality parameters used to create each curve?
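One possible reading of "removing seasonality" is sketched below, under strong assumptions: the seasonal component is taken to be each day's total energy, and what remains is a unit-sum intraday shape. This is only one of several possible decompositions. The toy data and the name `split_level_and_shape` are hypothetical.

```python
import numpy as np

def split_level_and_shape(production):
    """Split a (houses, days, steps) array into daily energy and unit-sum daily shapes."""
    daily_energy = production.sum(axis=2)                   # (houses, days)
    safe = np.where(daily_energy > 0, daily_energy, 1.0)    # avoid dividing by zero on dead days
    shapes = production / safe[:, :, None]                  # each day sums to 1 (or stays all zero)
    return daily_energy, shapes

# Hypothetical toy data: 3 houses, 365 days, 96 quarter-hour steps.
rng = np.random.default_rng(0)
n_houses, n_days, n_steps = 3, 365, 96
hours = np.arange(n_steps) / 4
season = 1.0 + 0.5 * np.cos(2 * np.pi * (np.arange(n_days) - 172) / 365)   # assumed summer peak
peak_hours = np.array([10.0, 12.5, 15.0])                                   # one orientation per house
bell = np.exp(-0.5 * ((hours[None, :] - peak_hours[:, None]) / 2.0) ** 2)   # (houses, steps)
production = season[None, :, None] * bell[:, None, :]
production = production * rng.uniform(0.8, 1.0, size=(n_houses, n_days, 1))  # day-to-day noise

daily_energy, shapes = split_level_and_shape(production)
mean_level = daily_energy.mean(axis=0)   # shared seasonal envelope, one value per day
```

Under this reading, `mean_level` carries the seasonal information that should be common to the area, while `shapes` keeps the house-specific intraday profile that averaging would blur.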
Another option I thought of was simply to take a randomly selected real production curve, instead of removing the seasonality from all of them and computing an average. But maybe this would increase the estimation error?
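A rough sketch of this second option, assuming the data is arranged as a `(houses, days, steps)` array and that "randomly selected" means drawing one donor house per calendar day. The function name `simulate_by_donor_days` and the random toy data are hypothetical.

```python
import numpy as np

def simulate_by_donor_days(production, rng):
    """Build one synthetic year by copying, for each day, that day's real curve from a random house."""
    n_houses, n_days, _ = production.shape
    donors = rng.integers(0, n_houses, size=n_days)     # one donor house index per day
    days = production[donors, np.arange(n_days)]        # (days, steps), real curves only
    return days.reshape(-1), donors                     # stitched series and the donors used

# Hypothetical toy data: 4 houses, 365 days, 96 quarter-hour steps.
rng = np.random.default_rng(1)
production = rng.uniform(0.0, 1.0, size=(4, 365, 96))
simulated, donors = simulate_by_donor_days(production, rng)
```

Because each day is copied from the same calendar day of a real house, the seasonal level comes along for free. The trade-off is that the donor, and so the implied orientation, can change from one day to the next; drawing a single donor for the whole year would be the other extreme.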
I am open to suggestions. I have a background in statistics though I've never really worked that much with time-series. Thank you!