Mini Batch and CPU/GPU memory

Hi, in the lecture Prof Ng mentioned that, as a guideline, we should make sure the mini-batch fits in CPU/GPU memory. Can you explain what this means?

Thank you.


Your computer has a finite amount of memory, right? If it’s a relatively modern computer (even a laptop), you’ve typically got 8GB, 16GB or, if you’re really lucky, 32GB of physical memory to work with. Of course the operating system implements “virtual memory”, so you can actually write a program that is too big to fit in the physical memory you have available. It will run, but it may run incredibly slowly if the OS has to do “swapping”, which means saving part of your memory image to disk and then reading in a different part of it.

The key point to realize is that there is a very, very big difference between the speed of main memory and the speed of a disk. So if you want your code to run efficiently, you have to make sure that the “virtual size” of your program fits easily within the physical memory available to you. That includes both the code itself and all the data you need to access.

If your training dataset is very large, the entire dataset may not fit within main memory. That’s a case in which “minibatch” can be a big win: just make sure that you select a minibatch size small enough that it’s not close to the limits of your physical memory. Of course there’s also a lot of advice that minibatch sizes in general should be relatively small. I think the famous quote from Yann LeCun is “Friends don’t let friends use minibatch sizes > 32”. That may not be an exact quote, but that’s the gist of what he was saying.
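If it helps to see that in numbers, here’s a rough back-of-the-envelope sketch in Python. All the figures (32 examples per batch, 224x224x3 inputs, 100,000 training examples, float32 storage) are made-up illustrative values, not anything from the lecture:

```python
# Illustrative numbers only -- plug in your own dataset's dimensions.
batch_size = 32                        # examples per mini-batch
features_per_example = 224 * 224 * 3   # e.g. one 224x224 RGB image
bytes_per_value = 4                    # float32

# Memory just to hold the input mini-batch X.
# Activations, gradients and parameters add to this, often several times over.
minibatch_bytes = batch_size * features_per_example * bytes_per_value
print(f"Mini-batch inputs alone: {minibatch_bytes / 1e6:.1f} MB")

# Compare with holding the entire (hypothetical) 100,000-example training set:
dataset_bytes = 100_000 * features_per_example * bytes_per_value
print(f"Full training set: {dataset_bytes / 1e9:.1f} GB")
```

With those made-up numbers, the full dataset would be roughly 60 GB (well beyond typical laptop RAM), while a single 32-example mini-batch is only about 19 MB and fits easily.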


Great explanation. Thank you so much.

Where is virtual memory? Is it memory on the disk? Physical memory refers to the RAM size, right?

Physical memory is the RAM size. Virtual memory is just a software concept: the “virtual” model of memory that an application sees when it runs. The OS creates that by combining physical RAM with memory address mapping hardware, plus disk space for the overflow when the virtual memory required by your program no longer fits in physical RAM. Note that everything running on the computer (including the OS itself) competes for the physical RAM.

But the serious point here is that you really, really don’t want to be using the disk (also called “swap space”) as part of your virtual memory. The reason is that it is typically more than a thousand times (literally) slower than RAM and caches. The response times of disks are measured in milliseconds. Of course if you have a solid state “disk” (flash memory), it may not be quite that slow. The response time of main memory and caches is in the range of tens of nanoseconds to microseconds.

Once your program gets big enough that it overflows physical RAM, your performance will go off a cliff. We’re not just talking 2x or 4x slower here: as explained above, it can be hundreds or thousands of times slower. That is why Prof Ng is making this point about being sure your minibatch size fits in memory along with everything else you need (e.g. code) to run the training.
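To make the “fits in physical memory” check concrete, here’s a minimal sketch, assuming the third-party `psutil` package is installed. The mini-batch footprint and the 50% headroom factor are arbitrary illustrative choices, not a rule from the course:

```python
import psutil  # third-party package: pip install psutil

# How much physical RAM is actually free right now?
mem = psutil.virtual_memory()
print(f"Total RAM:     {mem.total / 1e9:.1f} GB")
print(f"Available RAM: {mem.available / 1e9:.1f} GB")

# Hypothetical mini-batch footprint from the earlier sketch (32 x 224x224x3 float32).
minibatch_bytes = 32 * 224 * 224 * 3 * 4

# A very rough sanity check: leave plenty of headroom for code,
# parameters, activations, gradients and the OS itself.
if minibatch_bytes > 0.5 * mem.available:
    print("Mini-batch is uncomfortably large; consider a smaller batch size.")
else:
    print("Mini-batch comfortably fits in available RAM.")
```

On a GPU the same idea applies, except the hard limit is the GPU’s own memory and there is no swap space to fall back on: exceed it and you simply get an out-of-memory error.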

Thank you for the explanation, it’s brilliant!