Ungraded Assignments (BERT and T5) Model Decoding

When running the ungraded assignments on BERT loss and T5, why do they take so long to decode if the models are already pre-trained and loaded? Isn't the time-intensive part of BERT the training, which requires the GPU?

Hi @pneff

As you can see in the Instructions, the environment does not have a GPU.

A GPU accelerates not only training but inference too, because inference also involves a lot of matrix multiplications, which run much faster on a GPU.
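To make "a lot of matrix multiplications" concrete, here is a back-of-the-envelope FLOP count for the matmuls in a single transformer layer while generating one token. All the sizes (`d_model`, `d_ff`, `seq_len`) are illustrative assumptions, not the actual model configuration:

```python
# Rough FLOP estimate for the matrix multiplications in one transformer
# layer during a single decoding step. Sizes are assumptions for illustration.
d_model = 1024      # hidden size (assumed)
d_ff = 4 * d_model  # feed-forward inner size (common convention)
seq_len = 64        # number of tokens attended over (assumed)

# Q/K/V/output projections: 4 matmuls of (1 x d_model) @ (d_model x d_model);
# each matmul costs ~2*m*n FLOPs (multiply + add per entry)
proj_flops = 4 * 2 * d_model * d_model
# Attention scores plus the weighted sum over seq_len keys/values
attn_flops = 2 * 2 * seq_len * d_model
# Feed-forward network: two matmuls (d_model -> d_ff -> d_model)
ffn_flops = 2 * 2 * d_model * d_ff

layer_flops = proj_flops + attn_flops + ffn_flops
print(f"~{layer_flops / 1e6:.1f} MFLOPs per layer per generated token")
```

Multiply that by 48 layers and by every token you generate, and it is easy to see why a CPU struggles even at inference time.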

And also, the model is not exactly small… :slight_smile: In fact, it is very big (24 encoder layers and 24 decoder layers).
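To give a feel for how big 24 + 24 layers is, here is a rough parameter estimate for just the layer weights of such an encoder-decoder transformer. The hidden sizes below are assumptions for illustration, not the model's real config, and embeddings, biases, and layer norms are ignored:

```python
# Back-of-the-envelope parameter count for a 24-encoder / 24-decoder
# transformer. d_model and d_ff are assumed values, not the real config.
d_model, d_ff = 1024, 4096
n_enc, n_dec = 24, 24

# Per layer: self-attention Q/K/V/O projection matrices + feed-forward weights
attn_params = 4 * d_model * d_model
ffn_params = 2 * d_model * d_ff
enc_layer = attn_params + ffn_params
# Decoder layers additionally carry a cross-attention block
dec_layer = 2 * attn_params + ffn_params

total = n_enc * enc_layer + n_dec * dec_layer
print(f"~{total / 1e6:.0f}M parameters in the layer weights alone")
```

Even under these conservative assumptions the layer weights alone are in the hundreds of millions of parameters, so every decoding step has to push a token through all of that on CPU.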

And yes, training this model on a CPU would take ages.

Thanks. I understand the need for a GPU in training vs. a CPU. But if someone is building something like a chatbot (next week) or text prediction in a production environment, is the only way to speed things up to use more GPUs?

Hi @pneff

If the only thing you can change is hardware and your system is not optimal/balanced (the bottleneck is the GPU), then I don't want to be "captain obvious" here :slight_smile: but adding more GPUs or using faster GPUs would definitely help. That said, adding more GPUs does not always speed up inference.

If you can change other things (not only hardware), then of course there are many other ways to speed up inference, for example model optimization (model architecture, parameter pruning, etc.) or result caching.

There is an excellent free book online that has a chapter on computational performance (Chapter 13); in particular, you might be interested in the sections on Hardware, Multiple GPUs, or Parameter Servers.