Free GPU for training

Hello amazing people,
Just wanted to ask if someone could guide me on how I can get a free GPU for training. I can’t use Kaggle (editing .py files there is very tough) and I can’t use Colab (free sessions get terminated). Any other option? One I had in mind was the Google Cloud free $300 credits, but they didn’t give me a GPU quota with those. Any way out? I’m a final-year student.

Heavy GPU use is not free.

If you need a relatively minimal amount of resources, Colab or Kaggle are the usual choices.

Processing power costs money, and a lot of it, and somebody ultimately has to pay for it. Even for the small amount you get for free, don’t think there aren’t any costs involved.

yea. that’s true and sad

Someone has to fund all of the energy it takes to run a GPU farm.

For real.
Maybe in the future, with advances in tech, we’ll see a drastic drop in GPU costs.

I’ve read that Colab’s time limit for GPU sessions is dynamic. How long do your sessions last with a GPU and with a TPU?

Can you explain why the Google Cloud free tier, or any other provider, does not suit your needs? In my case, I got access to an Azure for Students account through the GitHub Student Developer Pack. The difference with their free tier is that they give you $100 for a whole year (instead of $200 for 30 days) and it’s renewable.

How much compute capacity and time do you think you’ll need?

Regarding Google, do you know about the TPU Research Cloud? It’s a programme for students and researchers with limited financial resources. Take a look at this post:

Thank you so much @vgrz for writing such a detailed answer. I actually have to train a 5M-parameter model for around 10 days. Yeah, I knew about MS Azure. Around the same time I learned about Google Cloud, so I went with that, but found that they don’t give GPU access.

So, how was your experience with MS Azure? And would I have full control, like code editing etc.?

That’s certainly going to require some sort of paid service.

I created the account to prepare for the AZ-900 Azure Fundamentals exam and passed it in November. So far, I’ve been exploring their services and have only used Azure Machine Learning superficially. It’s a notebook-oriented environment. All instances with a GPU are paid for, like in other providers, so you’ll have to use the credits they give you.

10 days with what GPU? On Azure, an NC64as T4 v3 instance with 4x Nvidia Tesla T4 (4 x 16 GB = 64 GB) costs $4.352/h and an ND96amsr A100 v4 instance with 8x Nvidia A100 (80 GB each) costs $32.77/h. I don’t see an option to choose spot instances for these.
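
To put those hourly prices in perspective, here is a rough back-of-the-envelope sketch of what 10 days of uninterrupted training would cost. The prices are the ones quoted above and may be out of date or differ by region; the dictionary keys and the helper `total_cost` are just illustrative names.

```python
# Rough cost estimate for 10 days of continuous training.
# Prices are the hourly figures quoted in this thread (USD); treat them as
# assumptions, since cloud pricing changes and varies by region.
HOURS = 10 * 24

PRICE_PER_HOUR = {
    "NC64as T4 v3": 4.352,
    "ND96amsr A100 v4": 32.77,
}


def total_cost(price_per_hour, hours=HOURS):
    """Return the cost in USD of running one instance for `hours` hours."""
    return round(price_per_hour * hours, 2)


costs = {name: total_cost(price) for name, price in PRICE_PER_HOUR.items()}

for name, cost in costs.items():
    print(f"{name}: ${cost:,.2f} for {HOURS} hours")
```

Even the cheaper of the two comes to roughly ten times the $100 student credit, so credits alone would not cover a run of that length on these instance types.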

Keep in mind that AWS doesn’t give you credits automatically, but you can request them ($300).

I’d like to know if the following idea is feasible:

  • Partition the dataset, so you can train one part in one provider and the rest in another.
  • Create checkpoints to save model states.
  • Merge them together.
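
As a minimal sketch of what those three steps could look like, assuming "merge" means a sample-weighted average of the parameters (the averaging step used in federated averaging): the JSON checkpoint layout, the function names and the toy numbers below are all hypothetical, and a real model would use its framework's own checkpoint format. Whether averaging independently trained models gives a usable result is not guaranteed; federated averaging normally repeats many rounds of local training and averaging from a shared starting point.

```python
import json
import os
import tempfile


def save_checkpoint(path, params, num_samples):
    """Save parameters together with the size of the partition they saw."""
    with open(path, "w") as f:
        json.dump({"params": params, "num_samples": num_samples}, f)


def load_checkpoint(path):
    """Load a checkpoint written by save_checkpoint."""
    with open(path) as f:
        return json.load(f)


def merge_checkpoints(checkpoints):
    """Average each parameter, weighting every checkpoint by its sample count."""
    total = sum(c["num_samples"] for c in checkpoints)
    weights = [c["num_samples"] / total for c in checkpoints]
    merged = {}
    for name in checkpoints[0]["params"]:
        # Group the i-th value of this parameter across all checkpoints.
        columns = zip(*(c["params"][name] for c in checkpoints))
        merged[name] = [sum(w * v for w, v in zip(weights, col)) for col in columns]
    return merged


# Toy example: two "providers", each trained on a different partition.
workdir = tempfile.mkdtemp()
path_a = os.path.join(workdir, "provider_a.json")
path_b = os.path.join(workdir, "provider_b.json")
save_checkpoint(path_a, {"w": [1.0, 2.0]}, num_samples=100)
save_checkpoint(path_b, {"w": [3.0, 4.0]}, num_samples=300)

merged = merge_checkpoints([load_checkpoint(path_a), load_checkpoint(path_b)])
print(merged)
```

Weighting by sample count means the provider that trained on the larger partition has proportionally more influence on the merged parameters.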

Edit. I’ve just discovered that in my Azure for Students account I cannot request a quota for an instance with a GPU in the West Europe region. :sob:

Actually, I need around 1 T4 for 10 days.
And, sadly, I’m from a developing country, Pakistan. So when I have to pay in dollars, the amount is too much for me in Rs.
So, I was about to spend time today setting up Azure. Don’t they give you a GPU with the $100 credits?

I understand. Have you thought about the idea I proposed? Based on what I’ve read, it can be thought of as a federated learning approach where you use the credits of 2 providers to train the model.

Azure for Students is a managed account: although I have an owner role, there are certain things that I cannot do. The Azure admin can be an IT specialist from the university or an external IT consulting firm; in our case it is the latter. The free tier does not have these limitations.

I’m currently exploring why I can’t select an N-series instance with GPU acceleration in ML Studio, while in the VM service I can change regions and submit a quota request for such an instance.

On Microsoft Learn they’ve stated that the request is subject to approval. As I’ve never done this before, I don’t know what will happen.

Edit. In https://ml.azure.com/quota, I see that regardless of what region I choose I can’t select an N-series instance (‘0 cores available’).

On the CLI, I’ve run

# Get a list of all locations
locations=$(az account list-locations --query "[].name" -o tsv)

# Iterate through all locations
for location in $locations; do
    echo "Checking $location:"
    az vm list-usage --location "$location" -o table | grep -E "Standard NC|NDS"
    echo ""
done

and got these results for all cases:

Standard NC Family vCPUs                  0               6
Standard NDS Family vCPUs                 0               0
Standard NCSv2 Family vCPUs               0               0
Standard NCSv3 Family vCPUs               0               0
Standard NC Promo Family vCPUs            0               6
Standard NDSv3 Family vCPUs               0               0
Standard NDSv2 Family vCPUs               0               0
Standard NCASv3_T4 Family vCPUs           0               0
Standard NCADS_A100_v4 Family vCPUs       0               0
Standard NCADSA10v4 Family vCPUs          0               0
Standard NDSH100v5 Family vCPUs           0               0
Standard NCadsH100v5 Family vCPUs         0               0

The 2nd value is CurrentValue and the 3rd one is Limit. Although ‘Standard NC Promo Family vCPUs’ appears as an available option, it was discontinued in 2023. In essence, this means that no options are available.

To sum up, the Azure for Students account does not provide access to a GPU-accelerated instance (N-series). I can’t confirm whether the same applies to the free tier, but Microsoft states that:

Free subscriptions including Azure Free Account and Azure for Students aren’t eligible for limit or quota changes.

Thank you so much for such a detailed approach.
Yes, the methods you mentioned make sense.
However, I have come up with another idea. My code is now finalized: there are no bugs and nothing needs to be changed. So I just zipped it, uploaded it to Kaggle, and am doing the training over there. It’s working out for me so far, thank God.

Great news, what limitations does Kaggle impose?

Based on what I’ve learnt, your only option is Google Cloud; either by requesting access to the TPU Research Cloud or by converting a Free tier account into a pay-as-you-go one during the first 30 days. In that way, you’ll still be able to use the $300 credits.

6 hours per session, 30 hours per week in total.
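
For a rough sense of what those limits mean for the 10-day run mentioned earlier, here is a small sketch. It uses the session and weekly limits as reported in this thread and assumes, optimistically, that the Kaggle GPU is about as fast as the single T4 the estimate was based on and that no time is lost restarting; the constant names are just illustrative.

```python
import math

# "1 T4 for 10 days", as stated earlier in the thread.
REQUIRED_GPU_HOURS = 10 * 24

# Kaggle limits as reported in this thread (assumptions, may change).
HOURS_PER_SESSION = 6
HOURS_PER_WEEK = 30

# Each session must end with a saved checkpoint that the next one resumes from.
sessions_needed = math.ceil(REQUIRED_GPU_HOURS / HOURS_PER_SESSION)
sessions_per_week = HOURS_PER_WEEK // HOURS_PER_SESSION
weeks_needed = math.ceil(REQUIRED_GPU_HOURS / HOURS_PER_WEEK)

print(f"GPU hours required: {REQUIRED_GPU_HOURS}")
print(f"Sessions needed:    {sessions_needed}")
print(f"Sessions per week:  {sessions_per_week}")
print(f"Weeks needed:       {weeks_needed}")
```

Under these assumptions the run is spread over many separate sessions, so saving a checkpoint before each session ends and resuming from it in the next one is essential.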
