C3W3 Lab error in Task4

Getting this error

student_01_c6b5de8980d8@cloudshell:~/lab-files (qwiklabs-gcp-01-3b94eac212a1)$ strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
task_type = strategy.cluster_resolver.task_type
task_id = strategy.cluster_resolver.task_id
global_batch_size = per_worker_batch * strategy.num_replicas_in_sync
-bash: syntax error near unexpected token `('
-bash: task_type: command not found
-bash: task_id: command not found
-bash: global_batch_size: command not found

You typed Python statements at the command line instead of putting them in the appropriate file. The error is expected, since the shell does not understand Python. The -bash: prefix on each error line shows that bash, not Python, rejected the input.
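To illustrate the point, here is a minimal sketch that hands the same four lines to Python's own parser instead of bash. It only checks syntax with the standard-library ast module; it does not import TensorFlow or run the training code, so it says nothing about whether the lab itself is set up correctly.

```python
import ast

# The four lines from the question, exactly as they were typed at the shell prompt.
snippet = """\
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
task_type = strategy.cluster_resolver.task_type
task_id = strategy.cluster_resolver.task_id
global_batch_size = per_worker_batch * strategy.num_replicas_in_sync
"""

# ast.parse only checks the syntax; nothing is imported or executed.
tree = ast.parse(snippet)

# Every line is an ordinary Python assignment statement.
assigned_names = [node.targets[0].id for node in tree.body]
print(assigned_names)
# ['strategy', 'task_type', 'task_id', 'global_batch_size']
```

Python accepts all four lines as assignments, whereas bash treats the first word of each line as a command name, which is why it reports "command not found".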

Should I type the file name in the parentheses?

Assuming that you’ve cloned the repo in the console environment, edit lab-files/mnist/main.py.

Unfortunately, my quota for this lab has been exceeded!
How can it be extended?

Please contact Qwiklabs help.

Thank you for your quick response.
Since I am a non-IT professional picking things up, could you kindly explain in some detail where and how I went wrong?
Also, exactly what corrections should I make to the code pasted in the first message?
Thanks in advance.

cd
SRC_REPO=https://github.com/GoogleCloudPlatform/mlops-on-gcp
kpt pkg get $SRC_REPO/workshops/mlep-qwiklabs/distributed-training-gke lab- files/mnist/main.py .
cd lab- files/mnist/main.py .

Then
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()
task_type = strategy.cluster_resolver.task_type
task_id = strategy.cluster_resolver.task_id
global_batch_size = per_worker_batch * strategy.num_replicas_in_sync

Is this right?

The first few steps to get into lab-files are given clearly in the writeup for task 4.

You don’t have to edit lab-files/mnist/main.py. The file already contains the calls for performing distributed training.

Just build the Docker image from inside the lab-files directory and deploy.

How exactly do I do this?
I am new to Docker.

Please follow the instructions under: Packaging training code in a docker image

Yes, I started again.
I created the bucket and got: A Cloud Storage bucket named ‘gcp-03-f1413ab82833-bucket’ already exists.
However, the assessment gives an error with the following message:
Please create a bucket named ‘qwiklabs-gcp-03-f1413ab82833-bucket’.
I tried to create the bucket with that name, but it does not let me, saying that a bucket named ‘gcp-03-f1413ab82833-bucket’ already exists.
When I proceeded anyway, I was told the mnist file does not exist.
Do I restart the lab and do the same again?

I will clarify again.
1. I realised yesterday that I had not created the proper project.
2. As per my earlier message, I created the project ID gcp-03-f1413ab82833, but the assessment gave me an error and asked me to create project: qwiklabs-gcp-03-f1413ab82833
3. When I tried to create the project again, it said that the project with credentials gcp-03-f1413ab82833-bucket already existed.
4. While loading the training code with ‘gcp-03-f1413ab82833-bucket’, it gave an error message: mnist-train does not exist.
5. Also, should we specify the project id with colons or plain?

Thanking you in advance

As shown in the instructions, PROJECT_ID=$(gcloud config get-value project) is the easiest way to capture the project id.

Run echo $PROJECT_ID to print the project id.

There should be no colons and you should not create a project id from scratch. For the lab, use the project id associated with your account when you sign in.
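As a small illustration of the naming rule the lab uses ([YOUR_PROJECT_ID]-bucket), here is a sketch that compares the bucket from the earlier message with the name the assessment asked for. The helper expected_bucket_name is a hypothetical name made up for this example; the two values are copied from the messages above.

```python
def expected_bucket_name(project_id: str) -> str:
    """Return the bucket name the lab expects: <project id>-bucket."""
    return f"{project_id}-bucket"


# Project id implied by the assessment message quoted in the thread.
project_id = "qwiklabs-gcp-03-f1413ab82833"
# Bucket that was actually created (note the missing "qwiklabs-" prefix).
created_bucket = "gcp-03-f1413ab82833-bucket"

expected = expected_bucket_name(project_id)
print(expected)                    # qwiklabs-gcp-03-f1413ab82833-bucket
print(created_bucket == expected)  # False
```

The two names differ only by the qwiklabs- prefix, which is why capturing the project id from gcloud rather than typing it by hand avoids the mismatch.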

I am in the assignment, at the last stage, and waiting.
student_01_c6b5de8980d8@cloudshell:~/lab-files (qwiklabs-gcp-03-86524c6e33fc)$ kubectl get pods
NAME                    READY   STATUS             RESTARTS   AGE
multi-worker-worker-0   0/1     ImagePullBackOff   0          57s
multi-worker-worker-1   0/1     ErrImagePull       0          57s
multi-worker-worker-2   0/1     ImagePullBackOff   0          57s
student_01_c6b5de8980d8@cloudshell:~/lab-files (qwiklabs-gcp-03-86524c6e33fc)$ kubectl logs --follow ${JOB_NAME}-worker-0
Error from server (BadRequest): container "tensorflow" in pod "multi-worker-worker-0" is waiting to start: trying and failing to pull image
student_01_c6b5de8980d8@cloudshell:~/lab-files (qwiklabs-gcp-03-86524c6e33fc)$ kubectl get pods
NAME                    READY   STATUS             RESTARTS   AGE
multi-worker-worker-0   0/1     ImagePullBackOff   0          4m35s
multi-worker-worker-1   0/1     ImagePullBackOff   0          4m35s
multi-worker-worker-2   0/1     ImagePullBackOff   0          4m35s
student_01_c6b5de8980d8@cloudshell:~/lab-files (qwiklabs-gcp-03-86524c6e33fc)$ kubectl get pods
NAME                    READY   STATUS             RESTARTS   AGE
multi-worker-worker-0   0/1     ImagePullBackOff   0          6m32s
multi-worker-worker-1   0/1     ImagePullBackOff   0          6m32s
multi-worker-worker-2   0/1     ImagePullBackOff   0          6m32s
student_01_c6b5de8980d8@cloudshell:~/lab-files (qwiklabs-gcp-03-86524c6e33fc)$

How will I know that the workers have started running?
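For reference, a pod whose container has started shows Running in the STATUS column of kubectl get pods; ImagePullBackOff and ErrImagePull mean Kubernetes could not pull the container image and is retrying. The sketch below reads a pod listing as text and checks that column. The function name all_workers_running is made up for this example, and the second listing is a hypothetical healthy output, not something captured from the lab.

```python
def all_workers_running(listing: str) -> bool:
    """Return True when every pod row in a `kubectl get pods` listing is Running."""
    rows = [line.split() for line in listing.strip().splitlines()]
    header, pods = rows[0], rows[1:]
    status_col = header.index("STATUS")
    return bool(pods) and all(row[status_col] == "Running" for row in pods)


# Listing copied from the thread: the image cannot be pulled.
stuck = """\
NAME READY STATUS RESTARTS AGE
multi-worker-worker-0 0/1 ImagePullBackOff 0 4m35s
multi-worker-worker-1 0/1 ImagePullBackOff 0 4m35s
multi-worker-worker-2 0/1 ImagePullBackOff 0 4m35s
"""

# Hypothetical listing, showing what started workers would look like.
healthy = """\
NAME READY STATUS RESTARTS AGE
multi-worker-worker-0 1/1 Running 0 2m
multi-worker-worker-1 1/1 Running 0 2m
multi-worker-worker-2 1/1 Running 0 2m
"""

print(all_workers_running(stuck))    # False
print(all_workers_running(healthy))  # True
```

Until the status changes to Running, kubectl logs has nothing to show, which matches the "waiting to start" error in the output above.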

Getting this error

student_01_c6b5de8980d8@cloudshell:~/lab-files (qwiklabs-gcp-03-86524c6e33fc)$ kubectl logs --follow ${JOB_NAME}-worker-0
Error from server (BadRequest): container “tensorflow” in pod “multi-worker-worker-0” is waiting to start: trying and failing to pull image

Sorry for the continuous trouble.

Where do we carry out this instruction: "update the --saved_model_path and --checkpoint_path arguments by replacing the bucket token with the name of your Cloud storage bucket. Recall that your bucket name is [YOUR_PROJECT_ID]-bucket."?
Please tell me which step in the command list this is.

Please see the snippet below this text on the labs page: "The updated manifest should look similar to the one below:"

Your lab-files/tfjob.yaml file should contain similar content. qwiklabs-gcp-01-93af833e6576 should be replaced with your project id.
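To make "replacing the bucket token" concrete, here is a rough sketch of the substitution done on a piece of text. Everything in the fragment except the two flag names is an assumption: the placeholder <BUCKET>, the gs:// paths, and the directory names are invented for illustration, and the real lab-files/tfjob.yaml may look different. In the lab you would make the same kind of change by hand in an editor.

```python
# HYPOTHETICAL fragment: placeholder token and paths are made up for this sketch.
# Only the flag names --saved_model_path and --checkpoint_path come from the lab text.
manifest_fragment = """\
- --saved_model_path=gs://<BUCKET>/saved_model_dir
- --checkpoint_path=gs://<BUCKET>/checkpoints
"""

# Example project id from the reply above; use your own project id instead.
project_id = "qwiklabs-gcp-01-93af833e6576"
bucket_name = f"{project_id}-bucket"

# Swap every occurrence of the placeholder for the real bucket name.
updated = manifest_fragment.replace("<BUCKET>", bucket_name)
print(updated)
```

Both arguments end up pointing at the same bucket, named after the project id with -bucket appended.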

Thank you for your quick reply.
My queries are:
1. Where do I locate this .yaml file to make this change?
2. I checked the image with: gcloud container images list, and it is in the format gcr.io/<YOUR_PROJECT_ID>/mnist-train.
3. But I am stuck at:
Next, update the --saved_model_path and --checkpoint_path arguments by replacing the bucket token with the name of your Cloud storage bucket. Recall that your bucket name is [YOUR_PROJECT_ID]-bucket.
4. How do I generate the updated manifest with the saved model path and checkpoint path?
5. Please be patient with my queries, since I am new to the CLI.
6. Do I make changes in
IMAGE_NAME=mnist-train
docker build -t gcr.io/${PROJECT_ID}/${IMAGE_NAME} .
docker push gcr.io/${PROJECT_ID}/${IMAGE_NAME}

with my project id?
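On the last point, assuming the {PROJECT_ID} and {IMAGE_NAME} placeholders in those commands stand for shell variables, the shell fills them in by itself once the variables are set, so nothing needs to be edited by hand. The sketch below imitates that substitution with Python's string.Template, which uses the same ${NAME} notation; the project id is the one shown in the Cloud Shell prompt earlier in the thread.

```python
from string import Template

# Same ${NAME} placeholder style as the shell commands in the lab.
image_template = Template("gcr.io/${PROJECT_ID}/${IMAGE_NAME}")

values = {
    "PROJECT_ID": "qwiklabs-gcp-03-86524c6e33fc",  # taken from the prompt in the thread
    "IMAGE_NAME": "mnist-train",                   # set by IMAGE_NAME=mnist-train
}

# substitute() replaces each placeholder with its value.
image_uri = image_template.substitute(values)
print(image_uri)  # gcr.io/qwiklabs-gcp-03-86524c6e33fc/mnist-train
```

The result has the same gcr.io/<YOUR_PROJECT_ID>/mnist-train shape that gcloud container images list reports.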

Do you see it under lab-files/tfjob.yaml?