Using NN’s for Long-Range Forecasts: Electricity Data

In this example we’ll show how to handle complex series. The domain here is electricity-usage data (using a dataset from the UCI Machine Learning Repository), and it’s a great example of a challenge for traditional forecasting applications. In these traditional approaches, we divide our model into siloed processes that each capture a separate behavior of the time-series. For example:

  • Hour-in-day effects

  • Day-in-week effects

  • Season-in-year effects

  • Weather effects

However, with electricity data, it’s limiting to model these separately, because these effects all interact: the impact of hour-in-day depends on the day-of-week, the impact of the day-of-week depends on the season of the year, etc.

[2]:
BASE_DIR = 'electricity'
SPLIT_DT = np.datetime64('2013-06-01')
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Data-Prep

Our dataset consists of hourly kW readings for multiple buildings:

[4]:
df_elec.head()
[4]:
group time kW dataset
0 MT_001 2012-01-01 00:00:00 3.172589 train
1 MT_001 2012-01-01 01:00:00 4.124365 train
2 MT_001 2012-01-01 02:00:00 4.758883 train
3 MT_001 2012-01-01 03:00:00 4.441624 train
4 MT_001 2012-01-01 04:00:00 4.758883 train
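The loading code for df_elec is omitted here; as a rough sketch (assuming the raw UCI file LD2011_2014.txt: semicolon-separated, comma decimals, 15-minute kW readings), it could be built along these lines:

# rough loading sketch only; the notebook's actual loading code is omitted above:
df_raw = pd.read_csv('LD2011_2014.txt', sep=';', decimal=',', index_col=0, parse_dates=True)
df_elec = (df_raw
           .resample('h').mean()  # aggregate 15-minute readings to hourly averages
           .reset_index(names='time')
           .melt(id_vars='time', var_name='group', value_name='kW')
           .assign(dataset=lambda df: np.where(df['time'] < SPLIT_DT, 'train', 'val')))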

Let’s pick an example group to focus on, for demonstrative purposes:

[5]:
example_group = 'MT_358'
[6]:
df_elec.query(f"group=='{example_group}'").plot('time', 'kW', figsize=(20, 5))
[6]:
<Axes: xlabel='time'>
../_images/examples_electricity_9_1.png
[7]:
# we'll apply a sqrt transformation and center before training, and invert these for plotting/eval:
group_means = (df_elec
               .assign(kW_sqrt = lambda df: np.sqrt(df['kW']))
               .query("dataset=='train'")
               .groupby('group')
               ['kW_sqrt'].mean()
               .to_dict())

def add_transformed(df: pd.DataFrame) -> pd.DataFrame:
    # sqrt-transform, then center each group on its training-set mean:
    return df.assign(kW_sqrt_c = lambda df: np.sqrt(df['kW']) - df['group'].map(group_means))

def add_inv_transformed(df: pd.DataFrame) -> pd.DataFrame:
    # invert the transform: add back each group's mean, then square (clipping at zero):
    df = df.copy()
    cols = [c for c in ['lower', 'upper', 'mean', 'actual'] if c in df.columns]
    df[cols] += df['group'].map(group_means).to_numpy()[:, None]
    df[cols] = df[cols].clip(lower=0) ** 2
    if 'measure' in df.columns:
        df['measure'] = df['measure'].str.replace('_sqrt_c', '')
    return df
[8]:
df_elec = add_transformed(df_elec)

A Standard Forecasting Approach

First, let’s try a standard exponential-smoothing algorithm on one of the series.

[9]:
from torchcast.process import LocalTrend, Season

es = ExpSmoother(
    measures=['kW_sqrt_c'],
    processes=[
        # seasonal processes:
        Season(id='day_in_week', period=24 * 7, dt_unit='h', K=3, fixed=True),
        Season(id='day_in_year', period=24 * 365.25, dt_unit='h', K=8, fixed=True),
        Season(id='hour_in_day', period=24, dt_unit='h', K=8, fixed=True),
        # long-running trend:
        LocalTrend(id='trend'),
    ]
)
[10]:
ds_example_building = TimeSeriesDataset.from_dataframe(
    df_elec.query("group == @example_group"),
    group_colname='group',
    time_colname='time',
    dt_unit='h',
    measure_colnames=['kW_sqrt_c'],
)
ds_example_train, _ = ds_example_building.train_val_split(dt=SPLIT_DT)
ds_example_train
[10]:
TimeSeriesDataset(sizes=[torch.Size([1, 12408, 1])], measures=(('kW_sqrt_c',),))
[11]:
es.fit(
    ds_example_train.tensors[0],
    start_offsets=ds_example_train.start_datetimes,
)
Initializing trend.position to -3.935213044314878e-08
[11]:
ExpSmoother(processes=[Season(id='day_in_week'), Season(id='day_in_year'), Season(id='hour_in_day'), LocalTrend(id='trend')], measures=['kW_sqrt_c'])

Visualizing the results is… not that illuminating:

[16]:
es_predictions = es(
    ds_example_train.tensors[0],
    start_offsets=ds_example_train.start_datetimes,
    # note we're *not* getting `num_timesteps` from `ds_example_train`, since we want to forecast beyond the training data:
    out_timesteps=ds_example_building.num_timesteps
)
es_predictions.plot(es_predictions.to_dataframe(ds_example_train), split_dt=SPLIT_DT)
../_images/examples_electricity_18_1.png

With hourly data, it’s just really hard to see the data when we are this zoomed out!

We can try zooming in:

[17]:
es_predictions.plot(
    es_predictions
    .to_dataframe(ds_example_building).query("time.between('2013-05-15', '2013-06-15')")
    .pipe(add_inv_transformed),
    split_dt=SPLIT_DT
)
../_images/examples_electricity_20_0.png

This is better for actually seeing the data, but ideally we’d still like to get a view of the long range.

Let’s instead try splitting it into weekdays vs. weekends and daytimes vs. nighttimes:

[18]:
def plot_2x2(df: pd.DataFrame,
             pred_colname: str = 'mean',
             actual_colname: str = 'actual',
             group_colname: str = 'group',
             time_colname: str = 'time',
             **kwargs):
    """
    Plot predicted vs. actual for a single group, splitting into 2x2 facets of weekday/end * day/night.
    """
    df_plot = df.assign(
        time_of_day=None,
        weekend=lambda df: np.where(df[time_colname].dt.weekday.isin([5, 6]), 'weekend', 'weekday'),
        date=lambda df: df[time_colname].dt.normalize()
    )
    df_plot.loc[df_plot[time_colname].dt.hour.isin([14, 15, 16, 17, 18]), 'time_of_day'] = 'day'
    df_plot.loc[df_plot[time_colname].dt.hour.isin([2, 3, 4, 5, 6]), 'time_of_day'] = 'night'
    df_plot[time_colname] = df_plot.pop('date')
    _agg = [pred_colname, actual_colname]
    if 'upper' in df_plot.columns:
        _agg.extend(['upper', 'lower'])
    df_plot = df_plot.groupby([group_colname, time_colname, 'weekend', 'time_of_day'])[_agg].mean().reset_index()
    df_plot['measure'] = df_plot['weekend'] + ' ' + df_plot['time_of_day']
    df_plot['mean'] = df_plot.pop(pred_colname)
    df_plot['actual'] = df_plot.pop(actual_colname)
    if 'upper' not in df_plot.columns:
        df_plot['lower'] = df_plot['upper'] = float('nan')
    return Predictions.plot(df_plot, time_colname=time_colname, group_colname=group_colname, **kwargs) + facet_wrap('measure')
[19]:
plot_2x2(es_predictions
         .to_dataframe(ds_example_building)
         .pipe(add_inv_transformed)
         ,split_dt=SPLIT_DT)
../_images/examples_electricity_23_0.png

Viewing the forecasts in this way helps us see a serious issue: the annual seasonal pattern is very different for daytimes and nighttimes, but the model isn’t capturing that at all. (In fact, even the lower amplitude in annual seasonality for nighttime forecasts is just an artifact of our sqrt-transform, not a product of the model.)

The limitation is inherent to the model: it only allows for a single seasonal pattern, rather than a separate one for different times of the day and days of the week.

Incorporating a Neural Network

We saw that our model wasn’t allowing for interactions among the components of our time-series. A natural solution is to incorporate a neural network – learning arbitrary/complex interactions is exactly what neural networks are for.

Of course, this requires scaling up our dataset: we want to learn across multiple series, so that our network can build representations of patterns that are shared across multiple buildings.

Encoding Seasonality as Fourier Terms

We’re going to need to encode the seasonal cycles in our data into features that a neural-network can use as inputs.

There are multiple ways we can do this. For example, one option would be to extract each component of our datetimes and one-hot encode it:

[20]:
(df_elec
 .loc[df_elec['group']==example_group,['time']]
 .assign(hod=lambda df: df['time'].dt.hour, dow=lambda df: df['time'].dt.dayofweek)
 .pipe(pd.get_dummies, columns=['hod', 'dow'], dtype='float')
 .head())
[20]:
time hod_0 hod_1 hod_2 hod_3 hod_4 hod_5 hod_6 hod_7 hod_8 ... hod_21 hod_22 hod_23 dow_0 dow_1 dow_2 dow_3 dow_4 dow_5 dow_6
9232450 2012-01-01 00:00:00 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
9232451 2012-01-01 01:00:00 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
9232452 2012-01-01 02:00:00 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
9232453 2012-01-01 03:00:00 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
9232454 2012-01-01 04:00:00 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0

5 rows × 32 columns

Instead, we’ll use a different approach based on Fourier series. Basically, we encode our times as sine/cosine waves. The number of waves is a hyper-parameter that can be tuned, controlling how ‘wiggly’ we allow the seasons to be. For more reading on using Fourier series to model seasonality, see here and here.
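To make the encoding concrete: for a cycle with period P (in hours) and harmonics k = 1, …, K, each time t (in hours since some fixed origin) maps to the pair sin(2πkt/P) and cos(2πkt/P). A minimal sketch of this (a hypothetical fourier_terms helper, not torchcast’s implementation):

def fourier_terms(times: pd.Series, period_hours: float, K: int) -> pd.DataFrame:
    # hours since a fixed origin; the phase offset gets absorbed by learned
    # coefficients, so any origin works as long as it's held constant:
    hours = (times - pd.Timestamp('1970-01-01')) / pd.Timedelta(hours=1)
    out = {}
    for k in range(1, K + 1):
        out[f'sin{k}'] = np.sin(2 * np.pi * k * hours / period_hours)
        out[f'cos{k}'] = np.cos(2 * np.pi * k * hours / period_hours)
    return pd.DataFrame(out)

# e.g. the daily cycle with K=8 harmonics -> 16 columns:
# fourier_terms(df_elec['time'], period_hours=24, K=8)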

For visualization (and shortly, modeling), we’ll use torchcast’s add_season_features function:

[21]:
from torchcast.utils import add_season_features

df_example = (df_elec[df_elec['group'] == example_group]
              .reset_index(drop=True)
              .pipe(add_season_features, K=8, period='yearly')
              .pipe(add_season_features, K=3, period='weekly')
              .pipe(add_season_features, K=8, period='daily'))

yearly_season_feats = df_example.columns[df_example.columns.str.startswith('yearly_')].tolist()
weekly_season_feats = df_example.columns[df_example.columns.str.startswith('weekly_')].tolist()
daily_season_feats = df_example.columns[df_example.columns.str.startswith('daily_')].tolist()
season_feats = yearly_season_feats + weekly_season_feats + daily_season_feats

Let’s visualize these waves:

[22]:
(df_example[yearly_season_feats + ['time']]
 .query("time.dt.year == 2013")
 .plot(x='time', figsize=(8,4), legend=False, title='Yearly Fourier Terms'))

(df_example[weekly_season_feats + ['time']]
 .query("(time.dt.year == 2013) & (time.dt.month == 6) & (time.dt.day < 14)")
 .plot(x='time', figsize=(8,4), legend=False, title='Weekly Fourier Terms'))

(df_example[daily_season_feats + ['time']]
 .query("(time.dt.year == 2013) & (time.dt.month == 6) & (time.dt.day == 1)")
 .plot(x='time', figsize=(8,4), legend=False, title='Daily Fourier Terms'))
[22]:
<Axes: title={'center': 'Daily Fourier Terms'}, xlabel='time'>
../_images/examples_electricity_31_1.png
../_images/examples_electricity_31_2.png
../_images/examples_electricity_31_3.png

Step 1: Pre-Training a Seasonal-Embeddings NN

For this task, we’ll switch from exponential smoothing (with ExpSmoother) to a full state-space model (with KalmanFilter). This is because only the latter can learn relationships specific to each time-series.

The LinearModel process provides a catch-all for passing arbitrary inputs to our model. For example, if we had weather data – temperature, humidity, wind-speed – we might add a process like:

LinearModel(id='weather', predictors=['temp','rhumidity','windspeed'])

And our time-series model would learn how each of these impacts our series(es).

Here we’re using LinearModel a little differently: rather than taking predictors as its input, it will take the output of a neural network, which itself takes as input the predictors (the calendar features we just defined).

What we’d like to do is train a neural network to embed the seasonality (as represented by our Fourier terms) into a lower-dimensional space; we’ll then pass that lower-dimensional representation on to our KalmanFilter/LinearModel, which will learn how each time-series behaves in relation to that space. Basically, we are reducing the dozens of calendar features (and their hundreds of interactions) to an efficient low-dimensional representation.
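Schematically, the data flow looks like this (a sketch with stand-in names; tensors follow torchcast’s (groups, timesteps, features) layout):

# schematic only (stand-in names):
#   fourier_X: (groups, timesteps, num_fourier_terms)  -- dozens of calendar features
#   embed_X = season_nn(fourier_X)                     -- (groups, timesteps, SEASON_EMBED_NDIM)
#   kf(y, X=embed_X)                                   -- LinearModel learns per-series weights on embed_X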

For this purpose, torchcast provides the SeasonalEmbeddingsTrainer class:

[23]:
from torchcast.utils.training import SeasonalEmbeddingsTrainer

SEASON_EMBED_NDIM = 20

season_embedder = torch.nn.Sequential(
    torch.nn.LazyLinear(out_features=48),
    torch.nn.Tanh(),
    torch.nn.Linear(48, 48),
    torch.nn.Tanh(),
    torch.nn.Linear(48, SEASON_EMBED_NDIM)
)

season_trainer = SeasonalEmbeddingsTrainer(
    module=season_embedder,
    yearly=8,
    weekly=3,
    daily=8
)

Again, the number of embedding dimensions is a hyper-parameter, where we trade off computational efficiency for precision.

Since we’ll be training on the whole dataset instead of just one example building, we’ll switch from a TimeSeriesDataset to a TimeSeriesDataLoader, which lets us iterate over the whole dataset in a training loop:

[24]:
season_dl = TimeSeriesDataLoader.from_dataframe(
    df_elec.query("dataset=='train'"),
    group_colname='group',
    time_colname='time',
    dt_unit='h',
    measure_colnames=['kW_sqrt_c'],
    batch_size=45 # fairly even batch sizes
)

Let’s use our trainer to train season_embedder:

[25]:
try:
    _path = os.path.join(BASE_DIR, f"season_trainer{SEASON_EMBED_NDIM}.pt")
    with open(_path, 'rb') as f:
        season_trainer = pickle.load(f).to(DEVICE)
    season_embedder = season_trainer.module
    plt.plot(season_trainer.loss_history)
    plt.ylim(None, max(season_trainer.loss_history[5:]))
    plt.ylabel('MSE')
    plt.show()

except FileNotFoundError as e:
    from IPython.display import clear_output

    season_trainer.loss_history = []
    for loss in season_trainer(season_dl):
        season_trainer.loss_history.append(loss)

        # plot:
        if len(season_trainer.loss_history) > 5:
            clear_output(wait=True)
            plt.plot(season_trainer.loss_history)
            plt.ylim(None, max(season_trainer.loss_history[5:]))
            plt.ylabel('MSE')
            plt.show()

        if len(season_trainer.loss_history) > 500:
            break

    with open(_path, 'wb') as f:
        pickle.dump(season_trainer, f)

season_trainer.to(torch.device('cpu'))
../_images/examples_electricity_39_0.png
[25]:
<torchcast.utils.training.SeasonalEmbeddingsTrainer at 0x17e1dd8e0>

Let’s visualize the output of this neural network, with each color being a separate output:

[26]:
with torch.no_grad():
    times = ds_example_building.times().squeeze()
    pred = season_trainer.module(season_trainer.times_to_model_mat(times).to(torch.float))
    _df_pred = pd.DataFrame(pred.numpy()).assign(time=times)

    (_df_pred
         .query("(time.dt.year == 2013) & (time.dt.month == 6) & (time.dt.day < 7)")
         .plot(x='time', figsize=(10,5), legend=False, title=f'Season NN (dim={SEASON_EMBED_NDIM}) Output', alpha=0.5))

    (_df_pred
         .query("(time.dt.year == 2013) & (time.dt.month < 6)")
         .plot(x='time', figsize=(10,5), legend=False, title=f'Season NN (dim={SEASON_EMBED_NDIM}) Output', alpha=0.25))
../_images/examples_electricity_41_0.png
../_images/examples_electricity_41_1.png

You can think of each colored line as equivalent to the sine/cosine waves above; but now, instead of dozens of them, we have far fewer, which should hopefully be able to capture interacting seasonal effects.

Let’s verify this. The season-trainer has a predict method that lets us visualize how these embeddings can be used to predict the actual series values:

[27]:
with torch.no_grad():
    display(ds_example_building
             .to_dataframe()
             .assign(pred=season_trainer.predict(ds_example_building).squeeze())
             .pipe(plot_2x2, actual_colname='kW_sqrt_c', pred_colname='pred', split_dt=SPLIT_DT))
../_images/examples_electricity_43_0.png

Success! We now have different seasonal patterns depending on the time of day.

Step 2: Incorporate our Seasonal-Embeddings into our Time-Series Model

How should we incorporate our season_embedder neural-network into a state-space model? First, we create our time-series model:

[28]:
from torchcast.kalman_filter import KalmanFilter
from torchcast.process import LinearModel, LocalLevel

kf_nn = KalmanFilter(
    measures=['kW_sqrt_c'],
    processes=[
        LinearModel(id='nn_output', predictors=[f'nn{i}' for i in range(SEASON_EMBED_NDIM)]),
        LocalLevel(id='level'),
    ],
).to(DEVICE)

Then, we have two options:

  1. The first option is to create our Fourier features on the dataframe, and pass these as features into a dataloader.

[31]:
dataloader_kf_nn = TimeSeriesDataLoader.from_dataframe(
    df_elec
        .query("dataset=='train'")
        .pipe(add_season_features, K=8, period='yearly')
        .pipe(add_season_features, K=3, period='weekly')
        .pipe(add_season_features, K=8, period='daily')
    ,
    group_colname='group',
    time_colname='time',
    dt_unit='h',
    y_colnames=['kW_sqrt_c'],
    X_colnames=season_feats,
    batch_size=50
)

…then we’d train our model with a tool like PyTorch Lightning. Torchcast also includes a simple tool for this, the StateSpaceTrainer:

[32]:
from torchcast.utils.training import StateSpaceTrainer

ss_trainer = StateSpaceTrainer(
    module=kf_nn,
    dataset_to_kwargs=lambda batch: {'X' : season_trainer.module(batch.tensors[1])},
)

## commented out since we're going with option 2 below
# for loss in ss_trainer(dataloader_kf_nn):
#     print(loss)
#     # etc...

  2. An even simpler (though less general) option is to leverage the util methods in the SeasonalEmbeddingsTrainer, which handle converting a TimeSeriesDataset into a tensor of Fourier terms:

[33]:
def dataset_to_kwargs(batch: TimeSeriesDataset) -> dict:
    seasonX = season_trainer.times_to_model_mat(batch.times()).to(dtype=torch.float, device=DEVICE)
    return {'X' : season_trainer.module(seasonX)}

ss_trainer = StateSpaceTrainer(
    module=kf_nn,
    dataset_to_kwargs=dataset_to_kwargs,
    optimizer=torch.optim.Adam(kf_nn.parameters(), lr=.05)
)

Then we don’t need to use add_season_features when creating our data-loader, since season_trainer.times_to_model_mat will create the features per-batch as needed (which is much easier on our GPU’s memory):

[34]:
dataloader_kf_nn = TimeSeriesDataLoader.from_dataframe(
    df_elec.query("dataset=='train'"),
    group_colname='group',
    time_colname='time',
    dt_unit='h',
    measure_colnames=['kW_sqrt_c'],
    batch_size=40
)

Training End-to-End

Above, we never actually registered season_trainer.module as an attribute of our KalmanFilter (i.e. we didn’t do kf_nn.season_nn = season_trainer.module). This means we won’t continue training the embeddings as we train our KalmanFilter. Why not? For that matter, why did we pre-train in the first place? Couldn’t we have just registered an untrained embeddings network and trained the whole thing end-to-end?

In practice, neural networks have many more parameters, and take many more epochs to train, than our state-space models (and conversely, our state-space models are much slower to train per epoch). So it’s much more efficient to pre-train the network first. Then it’s up to us whether we want to continue training the network, or freeze its weights (i.e. exclude them from the optimizer) and train only the state-space model’s parameters. Here we’re freezing them by not assigning the network as an attribute, so that its parameters aren’t included when we run torch.optim.Adam(kf_nn.parameters(), lr=.05).
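For reference, the end-to-end alternative would look something like this (a sketch only; the learning rate is an illustrative choice, not a tested value):

# sketch of the end-to-end alternative (not what we do here):
kf_nn.season_nn = season_trainer.module  # registering the network as a submodule means...
optimizer = torch.optim.Adam(kf_nn.parameters(), lr=.005)  # ...its params are included here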

[35]:
try:
    _path = os.path.join(BASE_DIR, f"ss_trainer{SEASON_EMBED_NDIM}.pt")
    with open(_path, 'rb') as f:
        ss_trainer = pickle.load(f).to(DEVICE)
    kf_nn = ss_trainer.module
    plt.plot(ss_trainer.loss_history)
    plt.show()
except FileNotFoundError as e:
    torch.cuda.empty_cache()
    ss_trainer.loss_history = []
    for loss in ss_trainer(dataloader_kf_nn, forward_kwargs={'n_step' : 14*7*24, 'every_step' : False}):
        ss_trainer.loss_history.append(loss)
        print(loss)

        with open(_path, 'wb') as f:
            pickle.dump(ss_trainer, f)

        if len(ss_trainer.loss_history) > 30:
            break
../_images/examples_electricity_57_0.png

Now we’ll create forecasts for all the groups, and back-transform them, for plotting and evaluation.

[36]:
with torch.inference_mode():
    dataloader_all = TimeSeriesDataLoader.from_dataframe(
        # importantly, we'll set our target to `nan` when it's outside the training dates;
        # this lets us generate seasonal features for the forecast period without data-leakage
        df_elec.assign(kW_sqrt_c=lambda df: df['kW_sqrt_c'].where(df['dataset']=='train')),
        group_colname='group',
        time_colname='time',
        dt_unit='h',
        measure_colnames=['kW_sqrt_c'],
        batch_size=50
    )

    df_all_preds = []
    for batch in tqdm(dataloader_all):
        batch = batch.to(DEVICE)
        seasonX = season_trainer.times_to_model_mat(batch.times()).to(dtype=torch.float, device=DEVICE)
        pred = kf_nn(batch.tensors[0], X=season_trainer.module(seasonX), start_offsets=batch.start_offsets)
        df_all_preds.append(pred.to_dataframe(batch).drop(columns=['actual']))
df_all_preds = pd.concat(df_all_preds).reset_index(drop=True)
# back-transform:
df_all_preds = (df_all_preds
                 .merge(df_elec[['group','time','kW','dataset']])
                 .pipe(add_inv_transformed))
df_all_preds
[36]:
group time measure mean lower upper kW dataset
0 MT_001 2012-01-01 00:00:00 kW 1.082254 0.000000 133.029539 3.172589 train
1 MT_001 2012-01-01 01:00:00 kW 0.310771 0.000000 36.109307 4.124365 train
2 MT_001 2012-01-01 02:00:00 kW 1.107223 0.000000 33.909440 4.758883 train
3 MT_001 2012-01-01 03:00:00 kW 2.929986 0.000000 38.848475 4.441624 train
4 MT_001 2012-01-01 04:00:00 kW 3.193181 0.000000 37.979428 4.758883 train
... ... ... ... ... ... ... ... ...
9548093 MT_369 2014-12-31 19:00:00 kW 837.039001 476.854617 1297.890767 692.631965 val
9548094 MT_369 2014-12-31 20:00:00 kW 827.793849 469.884445 1286.369644 688.416422 val
9548095 MT_369 2014-12-31 21:00:00 kW 825.011579 467.799304 1282.883324 662.023460 val
9548096 MT_369 2014-12-31 22:00:00 kW 836.415516 476.405873 1297.078379 679.252199 val
9548097 MT_369 2014-12-31 23:00:00 kW 834.971269 475.314866 1295.281680 659.274194 val

9548098 rows × 8 columns

[37]:
plot_2x2(df_all_preds.query("group==@example_group"), actual_colname='kW', split_dt=SPLIT_DT)
../_images/examples_electricity_60_0.png

Success! If our example group is representative, our forecasting model was able to use the embeddings to capture complex seasonal structure.

Evaluation

A Simple Baseline

We’ve seen that, for this dataset, generating forecasts that are sane is already an achievement.

But of course, ideally we’d actually have some kind of a quantitative measure of how good our forecasts are.

For this, it’s helpful to compare to a baseline. torchcast provides a simple baseline which just forecasts the value from N timesteps ago, with options for first smoothing the historical data. In this case, we forecast based on the same timepoint 365 days ago, after first applying a rolling average with a 4-hour window.
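The idea is simple enough to sketch in plain pandas (illustration only; unlike make_baseline’s is_train argument, this naive version would peek at validation-period actuals once the forecast horizon exceeds the lag):

# rough pandas sketch of the smooth-then-lag baseline (illustration only):
df_baseline_sketch = df_elec.sort_values(['group', 'time']).copy()
df_baseline_sketch['baseline'] = (df_baseline_sketch
                                  .groupby('group')['kW']
                                  .transform(lambda s: s.rolling(4, min_periods=1).mean()
                                                        .shift(365 * 24)))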

[38]:
from torchcast.utils import make_baseline

df_baseline365 = make_baseline(df_elec,
                                group_colnames=['group'],
                                time_colname='time',
                                value_colname='kW',
                                is_train=lambda df: df['dataset']=='train',
                                lag=int(365*24),
                                smooth=4)
plot_2x2(
    df_baseline365.query("group==@example_group").merge(df_elec),
    pred_colname='baseline',
    actual_colname='kW',
    split_dt=SPLIT_DT
)
../_images/examples_electricity_64_0.png

We can see this baseline provides sensible enough forecasts on our example building.

Comparison of Performance

Now that we have something to compare our torchcast model to, we can evaluate its performance.

[39]:
df_compare = (df_all_preds[['group', 'mean', 'time', 'kW', 'dataset']]
             .rename(columns={'mean' : 'torchcast'})
             .merge(df_baseline365, how='left'))

df_compare_long = df_compare.melt(
    id_vars=['group', 'time', 'kW', 'dataset'],
    value_vars=['torchcast', 'baseline'],
    var_name='model',
    value_name='forecast'
)
df_compare_long['error'] = np.abs(df_compare_long['forecast'] - df_compare_long['kW'])
df_compare_long
[39]:
group time kW dataset model forecast error
0 MT_001 2012-01-01 00:00:00 3.172589 train torchcast 1.082254 2.090335
1 MT_001 2012-01-01 01:00:00 4.124365 train torchcast 0.310771 3.813595
2 MT_001 2012-01-01 02:00:00 4.758883 train torchcast 1.107223 3.651660
3 MT_001 2012-01-01 03:00:00 4.441624 train torchcast 2.929986 1.511639
4 MT_001 2012-01-01 04:00:00 4.758883 train torchcast 3.193181 1.565702
... ... ... ... ... ... ... ...
19096191 MT_369 2014-12-31 19:00:00 692.631965 val baseline 712.793255 20.161290
19096192 MT_369 2014-12-31 20:00:00 688.416422 val baseline 702.025293 13.608871
19096193 MT_369 2014-12-31 21:00:00 662.023460 val baseline 708.486070 46.462610
19096194 MT_369 2014-12-31 22:00:00 679.252199 val baseline 728.464076 49.211877
19096195 MT_369 2014-12-31 23:00:00 659.274194 val baseline 751.282991 92.008798

19096196 rows × 7 columns

Over Time

First let’s compare abs(error) over time, across all groups:

[40]:
df_compare_per_date = (df_compare_long
                       .query("dataset!='train'")
                       .assign(date=lambda df: df['time'].dt.normalize())
                       .groupby(['date', 'model'])
                       ['error'].mean()
                       .reset_index())
(df_compare_per_date
 .pivot(index='date', columns='model', values='error')
 .plot(ylim=(0, 300), ylabel="Abs(Error)", title="Torchcast vs. Baseline: Error over Time"))
[40]:
<Axes: title={'center': 'Torchcast vs. Baseline: Error over Time'}, xlabel='date', ylabel='Abs(Error)'>
../_images/examples_electricity_69_1.png

We can see that the torchcast model beats the baseline on almost every day. The clear exception is holidays (e.g. Thanksgiving and Christmas), which makes sense: we didn’t add any features for these to our torchcast model, while the baseline model essentially gets them for free. Future attempts could resolve this straightforwardly by adding dummy-features to our LinearModel, or by increasing the K (aka “wiggliness”) parameter (see “Encoding Seasonality as Fourier Terms” above).
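For example, a sketch of the dummy-feature approach (hypothetical; the holiday list and process id are ours, not part of the model above):

# hypothetical sketch of the holiday fix:
holiday_dates = pd.to_datetime(['2013-11-28', '2013-12-25', '2014-11-27', '2014-12-25'])
df_elec['is_holiday'] = df_elec['time'].dt.normalize().isin(holiday_dates).astype('float')
# ...which could then feed an extra process alongside the NN output:
# LinearModel(id='holidays', predictors=['is_holiday'])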

By Group

Now let’s compare over the whole validation period, focusing on percent error (relative to each group’s average) so that we can weight small and large groups equally.

[41]:
from matplotlib.ticker import PercentFormatter

df_compare_per_group = (df_compare_long
                         .groupby(['group', 'dataset', 'model'])
                         [['error', 'kW']].mean()
                         .reset_index()
                         .assign(prop_error=lambda df: df['error'] / df['kW']))

compare_per_group_agg = (df_compare_per_group
                         .groupby(['dataset', 'model'])
                         .agg(prop_error=('prop_error', 'mean'), sem=('prop_error', 'sem')))

(compare_per_group_agg.loc['val']['prop_error']
 .plot(kind='bar',
       yerr=compare_per_group_agg.loc['val']['sem'],
       ylabel='Percent Error',
       title="Torchcast vs. Baseline: % Error in Validation Set")
 .yaxis.set_major_formatter(PercentFormatter(xmax=1)))
../_images/examples_electricity_72_0.png

We see that the torchcast forecasts have significantly lower % error on average.