Error in Week 3 Notebook "TFX on Google Cloud AI Platform Pipelines"

2021-09-11 15:23:41.137046: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2021-09-11 15:23:41.137191: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
CLI
Creating pipeline
Detected Kubeflow.
Use --engine flag if you intend to use a different orchestrator.
Failed to load kube config.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connection.py", line 170, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/opt/conda/lib/python3.7/site-packages/urllib3/util/connection.py", line 96, in create_connection
    raise err
  File "/opt/conda/lib/python3.7/site-packages/urllib3/util/connection.py", line 86, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 706, in urlopen
    chunked=chunked,
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 394, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connection.py", line 234, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/opt/conda/lib/python3.7/http/client.py", line 1277, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/conda/lib/python3.7/http/client.py", line 1323, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/conda/lib/python3.7/http/client.py", line 1272, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/conda/lib/python3.7/http/client.py", line 1032, in _send_output
    self.send(msg)
  File "/opt/conda/lib/python3.7/http/client.py", line 972, in send
    self.connect()
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connection.py", line 200, in connect
    conn = self._new_conn()
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connection.py", line 182, in _new_conn
    self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7ff2c5001190>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jupyter/.local/bin/tfx", line 8, in <module>
    sys.exit(cli_group())
  File "/home/jupyter/.local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/jupyter/.local/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/jupyter/.local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/jupyter/.local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/jupyter/.local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/jupyter/.local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/jupyter/.local/lib/python3.7/site-packages/click/decorators.py", line 73, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
  File "/home/jupyter/.local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/jupyter/.local/lib/python3.7/site-packages/tfx/tools/cli/commands/pipeline.py", line 117, in create_pipeline
    handler_factory.create_handler(ctx.flags_dict).create_pipeline()
  File "/home/jupyter/.local/lib/python3.7/site-packages/tfx/tools/cli/handler/handler_factory.py", line 107, in create_handler
    return detect_handler(flags_dict)
  File "/home/jupyter/.local/lib/python3.7/site-packages/tfx/tools/cli/handler/handler_factory.py", line 63, in detect_handler
    return kubeflow_handler.KubeflowHandler(flags_dict)
  File "/home/jupyter/.local/lib/python3.7/site-packages/tfx/tools/cli/handler/kubeflow_handler.py", line 57, in __init__
    namespace=self.flags_dict[labels.NAMESPACE])
  File "/home/jupyter/.local/lib/python3.7/site-packages/kfp/_client.py", line 148, in __init__
    if not self._context_setting['namespace'] and self.get_kfp_healthz().multi_user is True:
  File "/home/jupyter/.local/lib/python3.7/site-packages/kfp/_client.py", line 312, in get_kfp_healthz
    response = self._healthz_api.get_healthz()
  File "/home/jupyter/.local/lib/python3.7/site-packages/kfp_server_api/api/healthz_service_api.py", line 63, in get_healthz
    return self.get_healthz_with_http_info(**kwargs)  # noqa: E501
  File "/home/jupyter/.local/lib/python3.7/site-packages/kfp_server_api/api/healthz_service_api.py", line 148, in get_healthz_with_http_info
    collection_formats=collection_formats)
  File "/home/jupyter/.local/lib/python3.7/site-packages/kfp_server_api/api_client.py", line 369, in call_api
    _preload_content, _request_timeout, _host)
  File "/home/jupyter/.local/lib/python3.7/site-packages/kfp_server_api/api_client.py", line 185, in __call_api
    _request_timeout=_request_timeout)
  File "/home/jupyter/.local/lib/python3.7/site-packages/kfp_server_api/api_client.py", line 393, in request
    headers=headers)
  File "/home/jupyter/.local/lib/python3.7/site-packages/kfp_server_api/rest.py", line 234, in GET
    query_params=query_params)
  File "/home/jupyter/.local/lib/python3.7/site-packages/kfp_server_api/rest.py", line 212, in request
    headers=headers)
  File "/opt/conda/lib/python3.7/site-packages/urllib3/request.py", line 75, in request
    method, url, fields=fields, headers=headers, **urlopen_kw
  File "/opt/conda/lib/python3.7/site-packages/urllib3/request.py", line 96, in request_encode_url
    return self.urlopen(method, url, **extra_kw)
  File "/opt/conda/lib/python3.7/site-packages/urllib3/poolmanager.py", line 375, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 796, in urlopen
    **response_kw
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 796, in urlopen
    **response_kw
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 796, in urlopen
    **response_kw
  File "/opt/conda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 756, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/opt/conda/lib/python3.7/site-packages/urllib3/util/retry.py", line 574, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=80): Max retries exceeded with url: /apis/v1beta1/healthz (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff2c5001190>: Failed to establish a new connection: [Errno 111] Connection refused'))

Hi Nikolay! It seems some packages are missing. Have you run these commands as described in the instructions?

cd training-data-analyst/self-paced-labs/tfx/tfx-ai-platform
./install.sh
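Before (or after) rerunning install.sh, one quick way to see whether the packages the CLI relies on are importable from the notebook kernel is a small check like the one below. This is a minimal sketch, not part of the lab: the helper name `check_packages` is hypothetical, and it assumes the relevant packages are `tfx` and `kfp` (the two that appear in the traceback). Note that the `tfx` CLI in the traceback runs from `~/.local`, so the kernel's environment may differ from the CLI's.

```python
import importlib
import importlib.util


def check_packages(names=("tfx", "kfp")):
    """Map each package name to its __version__, or None if it cannot be found.

    Hypothetical helper for illustration; 'unknown' means the package imports
    but does not expose a __version__ attribute.
    """
    report = {}
    for name in names:
        # find_spec returns None when the top-level package is not installed.
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        module = importlib.import_module(name)
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for pkg, version in check_packages().items():
        print(f"{pkg}: {version if version is not None else 'NOT INSTALLED'}")
```

A `None` entry points at a missing install; if both packages report versions, as the file paths in the traceback suggest they would, the problem is more likely the connection settings than the installation.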

Also, just to check: when you created the cluster during the earlier setup, did you check the “Allow access to the following Cloud APIs” box?

I have the same error. Regarding the “Allow access to the following Cloud APIs” option, we don’t have that checkbox, because cluster1 has already been created when we start the lab.

Hi Takashi! Can you follow the instructions in the README before running the notebook? Ideally you shouldn’t have to, but I think there’s a bug in the automatic creation of the cluster. Please give it a shot. Hope it works!

Hi, thanks for your reply. It was my mistake: I had set ENDPOINT incorrectly in the notebook.
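That diagnosis appears consistent with the traceback: after "Failed to load kube config.", the KFP client ended up calling `http://localhost:80/apis/v1beta1/healthz` instead of the Pipelines instance. A minimal pre-flight check on ENDPOINT before running `tfx pipeline create` could look like the sketch below. It assumes ENDPOINT is the hostname (or URL) of the AI Platform Pipelines instance, for example something of the form `<id>-dot-<region>.pipelines.googleusercontent.com`; the helpers `normalize_endpoint`, `healthz_url` and `is_reachable` are hypothetical and are not part of TFX or KFP.

```python
import urllib.error
import urllib.request
from urllib.parse import urlparse


def normalize_endpoint(endpoint):
    """Return ENDPOINT as a URL without a trailing slash, or raise ValueError.

    Hypothetical helper: assumes ENDPOINT is the Pipelines host, with or
    without an https:// prefix.
    """
    endpoint = (endpoint or "").strip()
    if not endpoint:
        raise ValueError("ENDPOINT is empty")
    if "://" not in endpoint:
        endpoint = "https://" + endpoint
    host = urlparse(endpoint).hostname or ""
    if host in ("localhost", "127.0.0.1"):
        # The failing request in the traceback went to localhost:80.
        raise ValueError(f"ENDPOINT points at {host}, not the Pipelines instance")
    return endpoint.rstrip("/")


def healthz_url(endpoint):
    # Same API path as the failing request in the traceback.
    return normalize_endpoint(endpoint) + "/apis/v1beta1/healthz"


def is_reachable(endpoint, timeout=10.0):
    """Best-effort probe: True if the host answers at all.

    An HTTPError (e.g. 401/403 without credentials) still means a server
    responded; a URLError or socket error means it could not be reached.
    """
    try:
        with urllib.request.urlopen(healthz_url(endpoint), timeout=timeout):
            return True
    except urllib.error.HTTPError:  # must come before URLError (subclass)
        return True
    except (urllib.error.URLError, OSError):
        return False
```

If `normalize_endpoint(ENDPOINT)` raises or `is_reachable(ENDPOINT)` returns False, fix ENDPOINT (for example by copying the host shown for the Pipelines instance in the Cloud Console) before passing it to the CLI via `--endpoint`.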