MLEP C4W3: Bug | Does not have minimum availability | Impossible to complete lab

Matt_Weber · April 28, 2023, 12:22pm

The lab: Implementing Canary Releases of TensorFlow Model Deployments with Kubernetes and Anthos Service Mesh

When you get to Task 6 there is the error:

Does not have minimum availability

I’ve tried setting the --num_nodes value to 4 and also used the default of 3 (currently set in the lab).

This error happens every time no matter how carefully I do it. At this point I am not able to finish the course due to this bug in the lab.

Isaak_Kamau · April 28, 2023, 2:00pm

Hello @Matt_Weber
Review the logs and error messages from your deployment to see if they provide any additional insight into the problem. You can view the logs with the kubectl logs command and check resource utilization with the kubectl top command.

Matt_Weber · April 28, 2023, 5:49pm

Thank you for the reply. I retried, and eventually, it worked with 4 nodes. Maybe there were some temporary resource issues?

pq53ui · May 1, 2023, 1:59pm

Hi. I am getting similar issues at Task 6. When inspecting the pod with kubectl describe pods I get the warning message:

Tolerations:        node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                    node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason             Age    From                Message
  ----     ------             ----   ----                -------
  Warning  FailedScheduling   3m39s  default-scheduler   0/3 nodes are available: 3 Insufficient cpu. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod.
  Normal   NotTriggerScaleUp  3m37s  cluster-autoscaler  pod didn't trigger scale-up:

I keep getting signed out of the account as well. I have tried completing the lab twice now, and both times this occurred at step 6.2.
I successfully create the deployment, but it never becomes ready because of the insufficient CPU.

Should I just wait and try again later or am I doing something wrong here?

Isaak_Kamau · May 2, 2023, 6:36am

Hello @pq53ui
Try lowering the CPU request of the pod: you can update the pod spec to request fewer CPU resources. Alternatively, use nodes with more CPU or add more nodes to the cluster. Please refer to this documentation page for more info: Assign CPU Resources to Containers and Pods | Kubernetes
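
To see why "3 Insufficient cpu" appears, here is a rough Python sketch of the check the scheduler is effectively making: a pod fits on a node only if its CPU request is no larger than the node's allocatable CPU minus the CPU already requested by other pods. The node names and numbers below are made up for illustration and are not taken from the lab; the real values appear under "Allocatable" and "Allocated resources" in the output of kubectl describe nodes. The function names are hypothetical as well.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "500m" or "1.5" to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def nodes_that_fit(pod_request: str, nodes: dict) -> list:
    """Return the nodes whose unreserved CPU covers the pod's CPU request.

    `nodes` maps a node name to (allocatable CPU, CPU already requested).
    """
    need = cpu_to_millicores(pod_request)
    fitting = []
    for name, (allocatable, requested) in nodes.items():
        free = cpu_to_millicores(allocatable) - cpu_to_millicores(requested)
        if free >= need:
            fitting.append(name)
    return fitting


# Hypothetical three-node cluster (values are assumptions, not lab data).
nodes = {
    "node-a": ("940m", "700m"),
    "node-b": ("940m", "650m"),
    "node-c": ("940m", "800m"),
}

print(nodes_that_fit("500m", nodes))  # [] -> pod stays Pending on every node
print(nodes_that_fit("200m", nodes))  # ['node-a', 'node-b']
```

With these assumed numbers, a 500m request fits nowhere, which corresponds to the FailedScheduling event, while a smaller request or an extra node with free CPU would let the pod schedule.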
