Hi,
I am having trouble progressing in this lab exercise.
Under "Creating a Cloud Storage bucket", I used the code below (copied from the shell terminal):
student_00_219a375a9c6d@cloudshell:~ (qwiklabs-gcp-02-3dfc0de86241)$ export TFJOB_BUCKET=${qwiklabs-gcp-02-3dfc0de86241}-bucket
gsutil mb gs://${TFJOB_BUCKET}
Creating gs://gcp-02-3dfc0de86241-bucket/…
student_00_219a375a9c6d@cloudshell:~ (qwiklabs-gcp-02-3dfc0de86241)$ gsutil ls
gs://gcp-02-3dfc0de86241-bucket/
But when I clicked on “Check my progress”, my progress was not verified (no green tick), and it indicated “Please create a bucket named ‘qwiklabs-gcp-02-3dfc0de86241-bucket’.”
May I know what I am doing incorrectly?
Unless you’ve defined a variable named qwiklabs-gcp-02-3dfc0de86241, you’re going to fail the grader. See the output of gsutil ls: the bucket name does not have the project name as its prefix.
This goes back to variable substitution.
I recommend using ${DEVSHELL_PROJECT_ID}, since it should be defined in the Cloud Shell.
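For anyone hitting the same thing, here is a small sketch of what is probably going on. The bucket name in the gsutil ls output is consistent with bash's ${parameter-default} expansion: when a project ID is wrapped in ${...}, bash reads qwiklabs as the variable name and everything after the first hyphen as a default value. The PROJECT_ID fallback below is only an assumption so the snippet also runs outside Cloud Shell; in the lab, DEVSHELL_PROJECT_ID should already be set.

```shell
# Assumed stand-in for the lab's project ID (only used if DEVSHELL_PROJECT_ID is unset).
PROJECT_ID="qwiklabs-gcp-02-3dfc0de86241"

# Pitfall: ${qwiklabs-gcp-02-3dfc0de86241} means "value of variable qwiklabs,
# or the text gcp-02-3dfc0de86241 if qwiklabs is unset".
unset qwiklabs
WRONG_BUCKET="${qwiklabs-gcp-02-3dfc0de86241}-bucket"
echo "$WRONG_BUCKET"    # gcp-02-3dfc0de86241-bucket

# Suggested form: build the name from the project ID variable instead.
export TFJOB_BUCKET="${DEVSHELL_PROJECT_ID:-$PROJECT_ID}-bucket"
echo "$TFJOB_BUCKET"

# In Cloud Shell, the bucket would then be created with:
#   gsutil mb "gs://${TFJOB_BUCKET}"
```

Running echo on the variable before calling gsutil mb is a cheap way to confirm the name matches what the grader expects.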
Thank you for your reply.
I wanted to try out your suggestion, but it says that I have exceeded my quota for this lab.
May I know how I should go about continuing this lab?
Thank you
Please try to start the assignment from the course lab page. Contact Qwiklabs if you don’t have the option to start the lab.
Hello,
I am stuck at Task 5: Submitting the TFJob.
Here is my YAML file:
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: multi-worker
spec:
  cleanPodPolicy: None
  tfReplicaSpecs:
    Worker:
      replicas: 3
      template:
        spec:
          containers:
            - name: tensorflow
              image: gcr.io/qwiklabs-gcp-00-22097e7b67c7/mnist-train
              args:
                - --epochs=5
                - --steps_per_epoch=100
                - --per_worker_batch=64
                - --saved_model_path=gs://qwiklabs-gcp-00-22097e7b67c7-bucket/saved_model_dir
                - --checkpoint_path=gs://qwiklabs-gcp-00-22097e7b67c7-bucket/checkpoints
After applying changes, I have the following error:
$ kubectl describe tfjob multi-worker
Name:         multi-worker
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  kubeflow.org/v1
Kind:         TFJob
Metadata:
  Creation Timestamp:  2023-06-24T21:50:21Z
  Generation:          4
  Resource Version:    26337
  UID:                 57051154-c72f-4861-a591-b67ac0317bbe
Spec:
  Clean Pod Policy:  None
  Tf Replica Specs:
    Worker:
      Replicas:  3
      Template:
        Spec:
          Containers:
            Args:
              --epochs=5
              --steps_per_epoch=100
              --per_worker_batch=64
              --saved_model_path=gs://qwiklabs-gcp-00-22097e7b67c7-bucket/saved_model_dir
              --checkpoint_path=gs://qwiklabs-gcp-00-22097e7b67c7-bucket/checkpoints
            Image:  gcr.io/qwiklabs-gcp-00-22097e7b67c7/mnist-train
            Name:   tensorflow
Status:
  Conditions:
    Last Transition Time:  2023-06-24T21:50:22Z
    Last Update Time:      2023-06-24T21:50:22Z
    Message:               Failed to marshal the object to TFJob; the spec is invalid: failed to marshal the object to TFJob
    Reason:                InvalidTFJobSpec
    Status:                True
    Type:                  Failed
  Replica Statuses:        <nil>
Events:
  Type     Reason            Age  From         Message
  ----     ------            ---- ----         -------
  Warning  InvalidTFJobSpec  28m  tf-operator  Failed to marshal the object to TFJob; the spec is invalid: failed to marshal the object to TFJob
Can someone help me, please?
I just tried the lab with your yaml file. The lab works as expected.
It’s possible that you entered a special character while typing the YAML file. Unfortunately, the log doesn’t provide any detail beyond stating that the job spec is invalid.
Please try again, and contact Qwiklabs help (click the question mark symbol at the top right of the lab page) if you still can’t deploy the job.
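One way to look for stray special characters, as a rough sketch: the function below lists every line of a file that contains a byte outside printable ASCII, which covers tabs (YAML does not allow tabs for indentation), non-breaking spaces, and curly quotes. The file name tfjob.yaml and the function name find_odd_chars are made up for illustration; point it at whatever manifest you actually applied. It cannot say whether the spec is otherwise valid for the TFJob operator.

```shell
# Hypothetical file name; replace with the manifest you applied.
MANIFEST="${MANIFEST:-tfjob.yaml}"

# Print each line that contains a byte outside the printable ASCII range
# (space through tilde), prefixed with its line number.
# LC_ALL=C makes grep compare raw bytes; -a forces text mode.
find_odd_chars() {
  LC_ALL=C grep -an '[^ -~]' "$1"
}

if [ -f "$MANIFEST" ]; then
  find_odd_chars "$MANIFEST" || echo "No unusual characters found in $MANIFEST"
fi
```

If a line shows up, retyping it by hand (using spaces only) and re-applying the manifest is usually quicker than trying to spot the offending character by eye.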