C4W3_Lab1_KubeflowPipelines: kubectl apply -k fails

Hello, I am having trouble installing Kubeflow Pipelines on top of my running Kubernetes cluster in kind.

Cluster was created using:
kind create cluster --image kindest/node:v1.21.2

Cluster is successfully up and running, but when I try to run the following sequence of commands:

export PIPELINE_VERSION=1.7.0
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic-pns?ref=$PIPELINE_VERSION"

I get a timeout error:

I printed the terminal output of kustomize version, kind version, kubectl version, and kubectl get nodes in case there are any version issues to be spotted. Any input would be great as to what the problem might be and how to fix it.

Thanks,
Evan

After launching dockerd, please create your kind cluster like this and then run rest of the steps:
kind create cluster --image=kindest/node:v1.21.2

Hello balaji,

Thanks for your response, but as you can see in my post above, that is the initial way I created my kind cluster, with the image kindest/node:v1.21.2 that you mentioned. The kubectl apply -k command for the Kubeflow Pipelines manifest was still failing.

Evan

Thanks for the follow up.

Your screenshot isn’t aligned with the commands you shared in the original post. Note the --timeout=60 flag: it isn’t in
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
but your screenshot has it.

Please remove that flag and try again.
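To be explicit about where the flag belongs: --timeout is an option of kubectl wait (how long to block for the CRD to become established), not of the kubectl apply -k step. The original sequence already has this right:

```shell
# No --timeout here; the apply just submits the manifests
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
# --timeout only belongs on the wait, bounding how long it blocks
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
```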

Still not working

If I try to go to the GitHub URL in the browser, I keep getting a 404 "webpage not found" error. Has the link to the .yaml files changed?

This is the path for browsing:

Could you please try with the following?
kfp==1.8.13 (this means that the export PIPELINE_VERSION should also change)

$ pip install -U kfp

My versions:
kind == 0.14.0
kustomize == 4.5.4
kubectl == 1.24.2

Running in an existing Python virtual environment; kfp version 1.8.13 installed successfully:
%pip install -U kfp

Then, following up with the kubectl apply -k command, I am still getting the same error:

The URL you used points to master.

Please run the following commands:

kind delete cluster
kind create cluster --image=kindest/node:v1.21.2
export PIPELINE_VERSION=1.8.13
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic-pns?ref=$PIPELINE_VERSION"
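To make the ref mechanics explicit: the path you browsed has no ?ref= query, so it resolves against master, while the commands above pin both kustomize targets to the released version. A local sketch of how those targets are assembled (plain string interpolation, nothing cluster-side):

```shell
# Build the two kustomize remote targets from PIPELINE_VERSION.
# Without the ?ref= query, the same path would resolve against master.
export PIPELINE_VERSION=1.8.13
BASE="github.com/kubeflow/pipelines/manifests/kustomize"
CLUSTER_SCOPED="${BASE}/cluster-scoped-resources?ref=${PIPELINE_VERSION}"
PLATFORM_AGNOSTIC="${BASE}/env/platform-agnostic-pns?ref=${PIPELINE_VERSION}"
echo "$CLUSTER_SCOPED"
echo "$PLATFORM_AGNOSTIC"
```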

Following the commands you sent in the last post I get the output:

But the path that you sent over earlier in the hyperlink above points to this link:

The path in the notebook points to this, which receives a 404 error:
github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources

Both of them are failing. Which link is it, 1 or 2?

Let’s stay with the last set of commands I gave you.

Can you try modifying one command based on this?

kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION&timeout=300"
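For context (my understanding of kustomize remote targets, worth checking against the kustomize docs): the URL accepts extra query parameters, and timeout raises the git fetch timeout in seconds, which helps on slow connections. The modified target is just the original with one more parameter:

```shell
# Same cluster-scoped target as before, with the fetch timeout raised to 300s
export PIPELINE_VERSION=1.8.13
TARGET="github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=${PIPELINE_VERSION}&timeout=300"
echo "$TARGET"
```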

Thank you for your quick help; I appreciate it! I had been stuck on this for a bit.

You’re welcome. I’ll ask the staff to add this to the notebook as well.

Ok great, I’m sure it will help some others out.
Another thing: I am not sure whether ‘pip install -U kfp’ was necessary to make it work, but I did not see it mentioned anywhere in the notebook.

Thanks again!

After applying the previous commands, I waited all day for the pods to all be up and running, but after 6+ hours they never reached Running status:

I’m not sure what the problem was.

What is your hardware configuration in terms of RAM and number of CPU cores?

I’m on Apple M1 Silicon (2020):
8-core CPU
16 GB RAM

Could we try the following?

$ kubectl get pods -n kubeflow

For each pod that’s not ready, see the logs like this (let’s start with mysql):

$ kubectl logs mysql-f7b9b7dd4-2478k -n kubeflow
# I've manually trimmed to only the last few lines
2022-08-04T06:14:04.628127Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
2022-08-04T06:14:04.628157Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
2022-08-04T06:14:04.629407Z 0 [Note] InnoDB: 5.7.33 started; log sequence number 12664279
2022-08-04T06:14:04.630284Z 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2022-08-04T06:14:04.630656Z 0 [Note] Plugin 'FEDERATED' is disabled.
2022-08-04T06:14:04.636129Z 0 [Note] InnoDB: Buffer pool(s) load completed at 220804  6:14:04
2022-08-04T06:14:04.640858Z 0 [Note] Found ca.pem, server-cert.pem and server-key.pem in data directory. Trying to enable SSL support using them.
2022-08-04T06:14:04.640882Z 0 [Note] Skipping generation of SSL certificates as certificate files are present in data directory.
2022-08-04T06:14:04.641970Z 0 [Warning] CA certificate ca.pem is self signed.
2022-08-04T06:14:04.642032Z 0 [Note] Skipping generation of RSA key pair as key files are present in data directory.
2022-08-04T06:14:04.642805Z 0 [Note] Server hostname (bind-address): '*'; port: 3306
2022-08-04T06:14:04.642860Z 0 [Note] IPv6 is available.
2022-08-04T06:14:04.642883Z 0 [Note]   - '::' resolves to '::';
2022-08-04T06:14:04.642912Z 0 [Note] Server socket created on IP: '::'.
2022-08-04T06:14:04.646930Z 0 [Warning] Insecure configuration for --pid-file: Location '/var/run/mysqld' in the path is accessible to all OS users. Consider choosing a different directory.
2022-08-04T06:14:04.663204Z 0 [Note] Event Scheduler: Loaded 0 events
2022-08-04T06:14:04.663530Z 0 [Note] mysqld: ready for connections.
Version: '5.7.33'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  MySQL Community Server (GPL)
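Side note: the mysql log above actually ends in a healthy "ready for connections", so for the other pods it may be quicker to sweep everything that isn't Running in one pass. A sketch using read-only kubectl queries (standard flags, but treat it as a starting point, not the official lab procedure):

```shell
# List every pod in the kubeflow namespace that is not in the Running
# phase, then dump the tail of its logs. "|| true" tolerates pods whose
# containers never started and therefore have no logs at all.
for p in $(kubectl get pods -n kubeflow \
    --field-selector=status.phase!=Running -o name); do
  echo "== $p =="
  kubectl logs "$p" -n kubeflow --tail=10 || true
done
```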

Hello balaji,

5 troublesome pods with large numbers of restarts,
3 that were currently not running,
and 2 pods that never ran at all, stuck in the Pending state (mysql / workflow-controller).

I ran the cluster again and applied the troubleshooting method you recommended, observing the logs. This is the output:

Why does mysql pod not have any logs?
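A likely explanation, given that mysql was stuck in Pending earlier in the thread: a Pending pod has never had a container started, so there is nothing for kubectl logs to return. The scheduling events (for example "Insufficient memory" on a resource-constrained kind node) live on the pod object instead:

```shell
# A Pending pod has no container logs; its scheduling events show up
# under "Events" in the describe output. The pod name is the one from
# the earlier logs command.
kubectl describe pod mysql-f7b9b7dd4-2478k -n kubeflow
# For pods that crashed and restarted, the previous container's logs
# are still retrievable:
# kubectl logs <pod> -n kubeflow --previous
```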