The model trained for two epochs reports a loss of 0.48 and an accuracy of 0.86. I thought loss was 1 - accuracy. How can the loss and the accuracy both be fairly large?
In the expected output section it talks about training the model on CPU and then training on GPU. Is the code running off my machine's CPU or GPU, or on Coursera's servers in the cloud? Is there some way to control whether the model runs on CPU or GPU (whether on my machine or on Coursera's servers) by specifying it in code, or does that require a much more sophisticated understanding of hardware? See below:
Based on the summary output, the model we run has ~23M parameters? That seems quite large considering how expensive compute is and the fact that we're using either our computer's CPU or GPU or Coursera's cloud resources. I guess a 23M-parameter model does require a lot of compute to train then? I'm trying to develop an intuition for how expensive compute might be based on parameter count and dataset size.
Loss is not equal to 1 - accuracy. The loss (or cost) is the value of the mathematical formula that gets minimized during training, for example the average cross-entropy between the predicted probabilities and the true labels. Accuracy is computed from a completely different formula, the fraction of examples the model classifies correctly, and is used to evaluate how well your model performs.
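As a rough illustration (a sketch with made-up numbers, not the assignment's code), here is how cross-entropy loss and accuracy can be computed from the same predictions and come out as unrelated numbers:

```python
import numpy as np

# Hypothetical predicted probabilities for 4 examples and 2 classes (made-up values).
probs = np.array([[0.7, 0.3],
                  [0.4, 0.6],
                  [0.9, 0.1],
                  [0.2, 0.8]])
labels = np.array([0, 1, 0, 0])  # true class indices

# Cross-entropy loss: average negative log-probability assigned to the true class.
loss = -np.mean(np.log(probs[np.arange(len(labels)), labels]))

# Accuracy: fraction of examples where the highest-probability class matches the label.
accuracy = np.mean(np.argmax(probs, axis=1) == labels)

print(f"loss = {loss:.3f}, accuracy = {accuracy:.2f}")  # loss = 0.646, accuracy = 0.75
```

Only the last example is misclassified, so accuracy is 0.75, while the loss is driven by how confident the predicted probabilities are, not just by whether the top prediction is right.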
If you haven't downloaded the assignment files to run them offline, then the assignment is running on Coursera's servers. I believe that most assignments run on CPUs rather than GPUs.
There are indeed some things to note when training on a GPU, and it is important to know the specs of the GPU hardware. One of the main concerns is whether the model (and a batch of data) will fit in GPU memory. In the code, you need to "put" both your model and input data onto the GPU, and the way to do this differs between frameworks.
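For example, in PyTorch (assuming that framework; the model and tensor below are placeholders, not the assignment's), moving things to the GPU looks roughly like this:

```python
import torch
import torch.nn as nn

# Use the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A small placeholder model, not the assignment's architecture.
model = nn.Linear(128, 10).to(device)   # moves the weights onto the device

# Input batches must live on the same device as the model.
batch = torch.randn(32, 128).to(device)
output = model(batch)

print(output.device)  # cuda:0 if a GPU was found, otherwise cpu
```

TensorFlow/Keras handles device placement largely automatically, which is part of why the "how" differs between implementations.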
Running inference on 23M parameters shouldn't take that many resources. Training is what can get expensive, so for the assignments we're usually just fine-tuning a pre-trained model (fine-tuning requires significantly fewer resources than training from scratch).
Many of the LLMs have billions upon billions of parameters, and those may be harder to run locally on your machine, but 23M isn't really that high and should be fine. Most modern computers can easily perform billions of arithmetic operations per second.
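For a back-of-envelope sense of scale (rough assumptions, not numbers measured from the assignment):

```python
# Rough memory estimate for storing the weights alone, assuming float32 parameters.
num_params = 23_000_000
bytes_per_param = 4  # float32

weight_memory_mb = num_params * bytes_per_param / 1e6
print(f"~{weight_memory_mb:.0f} MB just for the weights")  # ~92 MB

# Training needs additional memory for gradients, optimizer state (e.g. Adam keeps
# two extra buffers per parameter), and activations, so a small multiple of the
# weight memory is a common rule of thumb.
```

So a 23M-parameter model comfortably fits in ordinary RAM or a modest GPU, which is part of why fine-tuning it in the assignment environment is feasible.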