I am training some neural networks in PyTorch to use as an embedded surrogate model. Since I am testing various architectures, I want to compare the accuracy of each one, but I am also interested in measuring the computational time of a single forward pass as accurately as possible. Below is the structure I am currently using, but I wonder if it can be done better:

import torch
from time import perf_counter

x = ...      # input tensor of shape (n_samples, n_input_features)
model = ...  # trained PyTorch model

n_samples = x.shape[0]  # number of single-sample forward passes to time
times = []              # per-sample evaluation times

# Warm up pytorch:
_ = model(x)

# Timing run: one forward pass per sample
for i in range(n_samples):
    start = perf_counter()
    with torch.no_grad():
        y_hat = model(x[i])
    end = perf_counter()
    times.append(end - start)

avg_time = sum(times) / n_samples  # average time per forward pass

The reason I evaluate each sample individually in a loop is that the embedded surrogate model will receive a single set of inputs at a time, so this approach is more representative of my use case and avoids batched parallel computation on CUDA or MPS over the whole set of samples in x.
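
For reference, a variant of the loop I have been experimenting with synchronizes the device before each timer read, in case the MPS or CUDA backend queues the forward pass asynchronously. I am assuming torch.mps.synchronize behaves like torch.cuda.synchronize here, and the helper name time_single_forward is just for illustration, so treat this as a sketch rather than a vetted benchmark:

import torch
from time import perf_counter

def time_single_forward(model, x, device):
    # Illustrative helper: time one forward pass per sample,
    # synchronizing the device so queued work is included.
    def sync():
        # Block until all queued work on the device has finished,
        # so perf_counter measures the full forward pass.
        if device.type == "cuda":
            torch.cuda.synchronize()
        elif device.type == "mps":
            torch.mps.synchronize()

    times = []
    with torch.no_grad():
        _ = model(x)  # warm-up pass
        sync()
        for i in range(x.shape[0]):
            sync()
            start = perf_counter()
            _ = model(x[i])
            sync()
            times.append(perf_counter() - start)
    return sum(times) / len(times)

I would call it as avg_time = time_single_forward(model, x, next(model.parameters()).device), but I am not sure whether the extra synchronization is necessary or whether it distorts the per-sample readings.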

I have a few questions regarding this:

  1. Can the current structure of my code be improved to maximize the accuracy of the timings?

  2. If I have the device of the model and tensors set to MPS, is there a benefit to setting it to CPU when evaluating computation time?

  3. Wouldn't it make sense to confine the evaluation of the model to a single CPU thread (or even pin it to a specific core) to maximize consistency in my readings? Is that even possible? I sketched what I have in mind after this list.

  4. Any other thoughts or suggestions you may have on this?
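
Regarding question 3, this is roughly what I have in mind, assuming torch.set_num_threads is the right knob for limiting intra-op parallelism and that core pinning via os.sched_setaffinity is Linux-only (I am on macOS with MPS, so that part may simply not apply). It also reuses the model and x placeholders from above and moves them to CPU, which touches on question 2 as well:

import os
import torch
from time import perf_counter

# Restrict intra-op parallelism so each forward pass runs on a single
# CPU thread, which should reduce run-to-run variance.
torch.set_num_threads(1)

# Optionally pin the process to one core (Linux only; macOS has no
# os.sched_setaffinity, so this block is skipped there).
if hasattr(os, "sched_setaffinity"):
    os.sched_setaffinity(0, {0})  # pin this process to core 0

device = torch.device("cpu")     # time on CPU, as on the embedded target
model = model.to(device).eval()  # assumes `model` is already defined
x = x.to(device)                 # assumes `x` is already defined

times = []
with torch.no_grad():
    _ = model(x)  # warm-up
    for i in range(x.shape[0]):
        start = perf_counter()
        _ = model(x[i])
        times.append(perf_counter() - start)

avg_time = sum(times) / len(times)

I have not verified that this actually makes the timings more consistent, which is part of what I am asking.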

Thanks in advance for the help!
