C4_W2_Lab_4_Apache_Beam_and_Tensorflow preprocessing fails

Hi,

When running the scripts from C4_W2_Lab_4_Apache_Beam_and_Tensorflow, the preprocess step fails.

python ./molecules/preprocess.py --work-dir=molecules
2022-03-02 19:31:56.809690: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1932] Ignoring visible gpu device (device: 1, name: NVIDIA GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1) with core count: 5. The minimum required count is 8. You can adjust this requirement with the env var TF_MIN_GPU_MULTIPROCESSOR_COUNT.
2022-03-02 19:31:56.809948: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-02 19:31:57.289857: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10406 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.8 interpreter.
2022-03-02 19:31:58.633935: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
Traceback (most recent call last):
  File "/home/user/venv/ai/lib/python3.8/site-packages/tensorflow_transform/coders/example_proto_coder.py", line 286, in encode
    feature_handler.encode_value(value)
  File "/home/user/venv/ai/lib/python3.8/site-packages/tensorflow_transform/coders/example_proto_coder.py", line 154, in encode_value
    self._value.append(self._cast_fn(values))
  File "/home/user/venv/ai/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1057, in __index__
    return self._numpy().__index__()
TypeError: only integer scalar arrays can be converted to a scalar index

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1198, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 537, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "/home/user/venv/ai/lib/python3.8/site-packages/apache_beam/transforms/core.py", line 1638, in <lambda>
    wrapper = lambda x: [fn(x)]
  File "/home/user/venv/ai/lib/python3.8/site-packages/tensorflow_transform/coders/example_proto_coder.py", line 288, in encode
    raise TypeError('%s while encoding feature "%s"' %
TypeError: only integer scalar arrays can be converted to a scalar index while encoding feature "TotalC"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./molecules/preprocess.py", line 220, in <module>
    preprocess_data = run(
  File "./molecules/preprocess.py", line 195, in run
    _ = (
  File "/home/user/venv/ai/lib/python3.8/site-packages/apache_beam/pipeline.py", line 596, in __exit__
    self.result = self.run()
  File "/home/user/venv/ai/lib/python3.8/site-packages/apache_beam/pipeline.py", line 573, in run
    return self.runner.run_pipeline(self, self._options)
  File "/home/user/venv/ai/lib/python3.8/site-packages/apache_beam/runners/direct/direct_runner.py", line 131, in run_pipeline
    return runner.run_pipeline(pipeline, options)
  File "/home/user/venv/ai/lib/python3.8/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 199, in run_pipeline
    self._latest_run_result = self.run_via_runner_api(
  File "/home/user/venv/ai/lib/python3.8/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 210, in run_via_runner_api
    return self.run_stages(stage_context, stages)
  File "/home/user/venv/ai/lib/python3.8/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 392, in run_stages
    stage_results = self._run_stage(
  File "/home/user/venv/ai/lib/python3.8/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 657, in _run_stage
    self._run_bundle(
  File "/home/user/venv/ai/lib/python3.8/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 780, in _run_bundle
    result, splits = bundle_manager.process_bundle(
  File "/home/user/venv/ai/lib/python3.8/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 1091, in process_bundle
    result_future = self._worker_handler.control_conn.push(process_bundle_req)
  File "/home/user/venv/ai/lib/python3.8/site-packages/apache_beam/runners/portability/fn_api_runner/worker_handlers.py", line 378, in push
    response = self.worker.do_instruction(request)
  File "/home/user/venv/ai/lib/python3.8/site-packages/apache_beam/runners/worker/sdk_worker.py", line 580, in do_instruction
    return getattr(self, request_type)(
  File "/home/user/venv/ai/lib/python3.8/site-packages/apache_beam/runners/worker/sdk_worker.py", line 618, in process_bundle
    bundle_processor.process_bundle(instruction_id))
  File "/home/user/venv/ai/lib/python3.8/site-packages/apache_beam/runners/worker/bundle_processor.py", line 995, in process_bundle
    input_op_by_transform_id[element.transform_id].process_encoded(
  File "/home/user/venv/ai/lib/python3.8/site-packages/apache_beam/runners/worker/bundle_processor.py", line 221, in process_encoded
    self.output(decoded_value)
  File "apache_beam/runners/worker/operations.py", line 346, in apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 348, in apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 215, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 707, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 708, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1200, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1265, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1198, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 536, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1361, in apache_beam.runners.common._OutputProcessor.process_outputs
  File "apache_beam/runners/worker/operations.py", line 215, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 707, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 708, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1200, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1265, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1198, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 536, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1361, in apache_beam.runners.common._OutputProcessor.process_outputs
  File "apache_beam/runners/worker/operations.py", line 215, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 707, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 708, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1200, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1265, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1198, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 536, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1361, in apache_beam.runners.common._OutputProcessor.process_outputs
  File "apache_beam/runners/worker/operations.py", line 215, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 707, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 708, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1200, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1265, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1198, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 536, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1361, in apache_beam.runners.common._OutputProcessor.process_outputs
  File "apache_beam/runners/worker/operations.py", line 215, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 707, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 708, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1200, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1265, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1198, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 536, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1361, in apache_beam.runners.common._OutputProcessor.process_outputs
  File "apache_beam/runners/worker/operations.py", line 215, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 707, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 708, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1200, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1265, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1198, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 536, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1361, in apache_beam.runners.common._OutputProcessor.process_outputs
  File "apache_beam/runners/worker/operations.py", line 215, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 707, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 708, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1200, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1265, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1198, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 536, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1361, in apache_beam.runners.common._OutputProcessor.process_outputs
  File "apache_beam/runners/worker/operations.py", line 152, in apache_beam.runners.worker.operations.ConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 707, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 708, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1200, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1281, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1198, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 537, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "/home/user/venv/ai/lib/python3.8/site-packages/apache_beam/transforms/core.py", line 1638, in <lambda>
    wrapper = lambda x: [fn(x)]
  File "/home/user/venv/ai/lib/python3.8/site-packages/tensorflow_transform/coders/example_proto_coder.py", line 288, in encode
    raise TypeError('%s while encoding feature "%s"' %
TypeError: only integer scalar arrays can be converted to a scalar index while encoding feature "TotalC" [while running 'Feature scaling/AnalyzeDataset/InstanceDictToRecordBatch/EncodeInstanceDictsAsTfExample']
python ./molecules/preprocess.py --work-dir=molecules  5.22s user 1.51s system 129% cpu 5.178 total

I’m running this with tensorflow==2.7.1 and apache-beam 2.36.0 on Python 3.8. How can I fix this issue?

Please click my name and message me your notebook as an attachment.

This is the same notebook as shown here.

Attached is my run of the lab.
Copy_of_C4_W2_Lab_4_Apache_Beam_and_Tensorflow.ipynb (71.5 KB)

Can’t seem to reproduce your issue. Would you mind doing a factory reset runtime and trying again?

Be sure to save a copy of the notebook to your Drive before running any cells.

Hi, this would not work, as I do not have a Google account.

I have debugged this a little more and found that I had some different package versions installed. I have created a new venv with these requirements:

apache-beam==2.32
tensorflow-transform==1.3.0
tensorflow==2.6.0
tensorflow-serving-api==2.6.0

However, now I get the following error when executing the preprocessing:

python ./molecules/preprocess.py --work-dir=results
2022-03-03 18:42:03.018629: E tensorflow/core/lib/monitoring/collection_registry.cc:77] Cannot register 2 metrics with the same name: /tensorflow/api/keras/optimizers
Traceback (most recent call last):
  File "./molecules/preprocess.py", line 25, in <module>
    import pubchem
  File "/home/user/notebooks/coursera/molecules/pubchem/__init__.py", line 14, in <module>
    from .pipeline import *
  File "/home/user/notebooks/coursera/molecules/pubchem/pipeline.py", line 22, in <module>
    import tensorflow_transform as tft
  File "/home/user/venv/coursera/lib/python3.8/site-packages/tensorflow_transform/__init__.py", line 21, in <module>
    from tensorflow_transform.inspect_preprocessing_fn import *
  File "/home/user/venv/coursera/lib/python3.8/site-packages/tensorflow_transform/inspect_preprocessing_fn.py", line 19, in <module>
    from tensorflow_transform import impl_helper
  File "/home/user/venv/coursera/lib/python3.8/site-packages/tensorflow_transform/impl_helper.py", line 31, in <module>
    from tensorflow_transform.output_wrapper import TFTransformOutput
  File "/home/user/venv/coursera/lib/python3.8/site-packages/tensorflow_transform/output_wrapper.py", line 60, in <module>
    class TFTransformOutput:
  File "/home/user/venv/coursera/lib/python3.8/site-packages/tensorflow_transform/output_wrapper.py", line 231, in TFTransformOutput
    def transform_features_layer(self) -> tf.keras.Model:
  File "/home/user/venv/coursera/lib/python3.8/site-packages/tensorflow/python/util/lazy_loader.py", line 62, in __getattr__
    module = self._load()
  File "/home/user/venv/coursera/lib/python3.8/site-packages/tensorflow/python/util/lazy_loader.py", line 45, in _load
    module = importlib.import_module(self.__name__)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/user/venv/coursera/lib/python3.8/site-packages/keras/__init__.py", line 25, in <module>
    from keras import models
  File "/home/user/venv/coursera/lib/python3.8/site-packages/keras/models.py", line 20, in <module>
    from keras import metrics as metrics_module
  File "/home/user/venv/coursera/lib/python3.8/site-packages/keras/metrics.py", line 24, in <module>
    from keras import activations
  File "/home/user/venv/coursera/lib/python3.8/site-packages/keras/activations.py", line 20, in <module>
    from keras.layers import advanced_activations
  File "/home/user/venv/coursera/lib/python3.8/site-packages/keras/layers/__init__.py", line 23, in <module>
    from keras.engine.input_layer import Input
  File "/home/user/venv/coursera/lib/python3.8/site-packages/keras/engine/input_layer.py", line 21, in <module>
    from keras.engine import base_layer
  File "/home/user/venv/coursera/lib/python3.8/site-packages/keras/engine/base_layer.py", line 43, in <module>
    from keras.mixed_precision import loss_scale_optimizer
  File "/home/user/venv/coursera/lib/python3.8/site-packages/keras/mixed_precision/loss_scale_optimizer.py", line 18, in <module>
    from keras import optimizers
  File "/home/user/venv/coursera/lib/python3.8/site-packages/keras/optimizers.py", line 31, in <module>
    from keras.optimizer_v2 import adadelta as adadelta_v2
  File "/home/user/venv/coursera/lib/python3.8/site-packages/keras/optimizer_v2/adadelta.py", line 22, in <module>
    from keras.optimizer_v2 import optimizer_v2
  File "/home/user/venv/coursera/lib/python3.8/site-packages/keras/optimizer_v2/optimizer_v2.py", line 36, in <module>
    keras_optimizers_gauge = tf.__internal__.monitoring.BoolGauge(
  File "/home/user/venv/coursera/lib/python3.8/site-packages/tensorflow/python/eager/monitoring.py", line 360, in __init__
    super(BoolGauge, self).__init__('BoolGauge', _bool_gauge_methods,
  File "/home/user/venv/coursera/lib/python3.8/site-packages/tensorflow/python/eager/monitoring.py", line 135, in __init__
    self._metric = self._metric_methods[self._label_length].create(*args)
tensorflow.python.framework.errors_impl.AlreadyExistsError: Another metric with the same name already exists.

You can sign up using a non-google account.

How is that possible? Can you point me to docs/readme/etc?

Here you go: Create a Google Account - Google Account Help

lol, that would mean creating a Google account, which is not what I want :smiley:

I’d like to get back on topic. How can I fix the underlying error in the preprocessing script?

//edit - steps to reproduce:

  1. Create a venv:
python3.8 -m venv venv
source venv/bin/activate
  2. With requirements.txt defined as:
apache-beam==2.32
tensorflow-transform==1.3.0
tensorflow==2.6.0
tensorflow-serving-api==2.6.0

call

pip install -r requirements.txt
  3. Execute the notebook steps:
wget https://github.com/https-deeplearning-ai/machine-learning-engineering-for-production-public/raw/main/course4/week2-ungraded-labs/C4_W2_Lab_4_ETL_Beam/data/molecules.tar.gz
tar -xvzf molecules.tar.gz
python ./molecules/data-extractor.py --max-data-files 1 --work-dir=results
python ./molecules/preprocess.py --work-dir=results

Then the error occurs

Go ahead and run the notebook on Google Colab, then capture the package versions in that environment.
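Once a package list from a working environment is available, it can be compared against the local venv. The following is only a minimal sketch of that comparison: `parse_pip_list` and `diff_versions` are hypothetical helper names, and the input is assumed to be the plain-text table that `pip list` prints (a `Package Version` header, a dashed rule, then one package per line).

```python
def parse_pip_list(text):
    """Parse the plain-text table printed by `pip list` into {name: version}."""
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # blank or malformed line
        name, version = parts[0], parts[1]
        # Skip the "Package Version" header and the dashed rule below it.
        if name == "Package" or set(name) == {"-"}:
            continue
        versions[name.lower()] = version
    return versions


def diff_versions(reference, local):
    """Return {name: (reference_version, local_version)} for packages that
    are installed in both environments but at different versions."""
    shared = reference.keys() & local.keys()
    return {
        name: (reference[name], local[name])
        for name in sorted(shared)
        if reference[name] != local[name]
    }
```

Usage would be along the lines of reading two files produced by `pip list > installed.txt`, one per environment, and passing both parsed results to `diff_versions`. Packages that exist in only one environment are deliberately ignored here, since a hosted environment ships many packages unrelated to the lab.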

As I have stated, I do not have, nor do I want, a Google account. I cannot run Google Colab notebooks, so this method of debugging is unavailable to me.

Any updates?

@Johnny56 I’ve upvoted the post for you. Could you do the same?

Hi! This seems to be a problem with a separately installed Keras package. Can you also downgrade that to version 2.6.0? In Colab, we have TensorFlow version 2.6.3 and Keras version 2.6.0. Hope this helps!
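A quick way to confirm whether the venv has this mismatch is to read the installed versions without importing TensorFlow, since the import itself is what fails. This is a minimal sketch using the standard library's `importlib.metadata` (available from Python 3.8). The helper names are hypothetical, and the rule that the two packages should share the same major.minor version is an assumption based on the Colab versions quoted above, not a documented guarantee.

```python
from importlib import metadata


def major_minor(version):
    """Return the (major, minor) part of a version string such as '2.6.3'."""
    parts = version.split(".")
    return tuple(int(p) for p in parts[:2])


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def same_release_line(tf_version, keras_version):
    """Assumed rule: TensorFlow and Keras should agree on major.minor."""
    return major_minor(tf_version) == major_minor(keras_version)


if __name__ == "__main__":
    tf_version = installed_version("tensorflow")
    keras_version = installed_version("keras")
    print("tensorflow:", tf_version, "keras:", keras_version)
    if tf_version and keras_version:
        print("same release line:", same_release_line(tf_version, keras_version))
```

If the check reports different release lines, adding `keras==2.6.0` to requirements.txt next to `tensorflow==2.6.0` and reinstalling would be the change suggested above.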

Here’s the output of pip list > installed.txt from Colab.

Package                         Version
------------------------------- ---------------------
absl-py                         0.12.0
alabaster                       0.7.12
albumentations                  0.1.12
altair                          4.2.0
apache-beam                     2.32.0
appdirs                         1.4.4
argon2-cffi                     21.3.0
argon2-cffi-bindings            21.2.0
arviz                           0.11.4
astor                           0.8.1
astropy                         4.3.1
astunparse                      1.6.3
atari-py                        0.2.9
atomicwrites                    1.4.0
attrs                           21.4.0
audioread                       2.1.9
autograd                        1.3
avro-python3                    1.9.2.1
Babel                           2.9.1
backcall                        0.2.0
beautifulsoup4                  4.6.3
bleach                          4.1.0
blis                            0.4.1
bokeh                           2.3.3
Bottleneck                      1.3.4
branca                          0.4.2
bs4                             0.0.1
CacheControl                    0.12.10
cached-property                 1.5.2
cachetools                      4.2.4
catalogue                       1.0.0
certifi                         2021.10.8
cffi                            1.15.0
cftime                          1.6.0
chardet                         3.0.4
charset-normalizer              2.0.12
clang                           5.0
click                           7.1.2
cloudpickle                     1.3.0
cmake                           3.12.0
cmdstanpy                       0.9.5
colorcet                        3.0.0
colorlover                      0.3.0
community                       1.0.0b1
contextlib2                     0.5.5
convertdate                     2.4.0
coverage                        3.7.1
coveralls                       0.5
crcmod                          1.7
cufflinks                       0.17.3
cvxopt                          1.2.7
cvxpy                           1.0.31
cycler                          0.11.0
cymem                           2.0.6
Cython                          0.29.28
daft                            0.0.4
dask                            2.12.0
datascience                     0.10.6
debugpy                         1.0.0
decorator                       4.4.2
defusedxml                      0.7.1
descartes                       1.1.0
dill                            0.3.1.1
distributed                     1.25.3
dlib                            19.18.0
dm-tree                         0.1.6
docopt                          0.6.2
docutils                        0.17.1
dopamine-rl                     1.0.5
earthengine-api                 0.1.301
easydict                        1.9
ecos                            2.0.10
editdistance                    0.5.3
en-core-web-sm                  2.2.5
entrypoints                     0.4
ephem                           4.1.3
et-xmlfile                      1.1.0
fa2                             0.3.5
fastai                          1.0.61
fastavro                        1.4.10
fastdtw                         0.3.4
fasteners                       0.17.3
fastprogress                    1.0.2
fastrlock                       0.8
fbprophet                       0.7.1
feather-format                  0.4.1
filelock                        3.6.0
firebase-admin                  4.4.0
fix-yahoo-finance               0.0.22
Flask                           1.1.4
flatbuffers                     1.12
folium                          0.8.3
future                          0.18.2
gast                            0.4.0
GDAL                            2.2.2
gdown                           4.2.2
gensim                          3.6.0
geographiclib                   1.52
geopy                           1.17.0
gin-config                      0.5.0
glob2                           0.7
google                          2.0.3
google-api-core                 1.26.3
google-api-python-client        1.12.10
google-apitools                 0.5.31
google-auth                     1.35.0
google-auth-httplib2            0.0.4
google-auth-oauthlib            0.4.6
google-cloud-bigquery           1.21.0
google-cloud-bigquery-storage   1.1.0
google-cloud-bigtable           1.7.0
google-cloud-core               1.7.2
google-cloud-datastore          1.8.0
google-cloud-dlp                1.0.0
google-cloud-firestore          1.7.0
google-cloud-language           1.3.0
google-cloud-pubsub             1.7.0
google-cloud-recommendations-ai 0.2.0
google-cloud-spanner            1.19.1
google-cloud-storage            1.18.1
google-cloud-translate          1.5.0
google-cloud-videointelligence  1.16.1
google-cloud-vision             1.0.0
google-colab                    1.0.0
google-pasta                    0.2.0
google-resumable-media          0.4.1
googleapis-common-protos        1.55.0
googledrivedownloader           0.4
graphviz                        0.10.1
greenlet                        1.1.2
grpc-google-iam-v1              0.12.3
grpcio                          1.44.0
grpcio-gcp                      0.2.2
gspread                         3.4.2
gspread-dataframe               3.0.8
gym                             0.17.3
h5py                            3.1.0
hdfs                            2.6.0
HeapDict                        1.0.1
hijri-converter                 2.2.3
holidays                        0.10.5.2
holoviews                       1.14.8
html5lib                        1.0.1
httpimport                      0.5.18
httplib2                        0.17.4
httplib2shim                    0.0.3
humanize                        0.5.1
hyperopt                        0.1.2
ideep4py                        2.0.0.post3
idna                            2.10
imageio                         2.4.1
imagesize                       1.3.0
imbalanced-learn                0.8.1
imblearn                        0.0
imgaug                          0.2.9
importlib-metadata              4.11.2
importlib-resources             5.4.0
imutils                         0.5.4
inflect                         2.1.0
iniconfig                       1.1.1
intel-openmp                    2022.0.2
intervaltree                    2.1.0
ipykernel                       4.10.1
ipython                         5.5.0
ipython-genutils                0.2.0
ipython-sql                     0.3.9
ipywidgets                      7.6.5
itsdangerous                    1.1.0
jax                             0.3.1
jaxlib                          0.3.0+cuda11.cudnn805
jedi                            0.18.1
jieba                           0.42.1
Jinja2                          2.11.3
joblib                          1.1.0
jpeg4py                         0.1.4
jsonschema                      4.3.3
jupyter                         1.0.0
jupyter-client                  5.3.5
jupyter-console                 5.2.0
jupyter-core                    4.9.2
jupyterlab-pygments             0.1.2
jupyterlab-widgets              1.0.2
kaggle                          1.5.12
kapre                           0.3.7
keras                           2.6.0
Keras-Preprocessing             1.1.2
keras-vis                       0.4.1
kiwisolver                      1.3.2
korean-lunar-calendar           0.2.1
libclang                        13.0.0
librosa                         0.8.1
lightgbm                        2.2.3
llvmlite                        0.34.0
lmdb                            0.99
LunarCalendar                   0.0.9
lxml                            4.2.6
Markdown                        3.3.6
MarkupSafe                      2.0.1
matplotlib                      3.2.2
matplotlib-inline               0.1.3
matplotlib-venn                 0.11.6
missingno                       0.5.1
mistune                         0.8.4
mizani                          0.6.0
mkl                             2019.0
mlxtend                         0.14.0
more-itertools                  8.12.0
moviepy                         0.2.3.5
mpmath                          1.2.1
msgpack                         1.0.3
multiprocess                    0.70.12.2
multitasking                    0.0.10
murmurhash                      1.0.6
music21                         5.5.0
natsort                         5.5.0
nbclient                        0.5.12
nbconvert                       5.6.1
nbformat                        5.1.3
nest-asyncio                    1.5.4
netCDF4                         1.5.8
networkx                        2.6.3
nibabel                         3.0.2
nltk                            3.2.5
notebook                        5.3.1
numba                           0.51.2
numexpr                         2.8.1
numpy                           1.19.5
nvidia-ml-py3                   7.352.0
oauth2client                    4.1.3
oauthlib                        3.2.0
okgrade                         0.4.3
opencv-contrib-python           4.1.2.30
opencv-python                   4.1.2.30
openpyxl                        3.0.9
opt-einsum                      3.3.0
orjson                          3.6.7
osqp                            0.6.2.post0
packaging                       21.3
palettable                      3.3.0
pandas                          1.3.5
pandas-datareader               0.9.0
pandas-gbq                      0.13.3
pandas-profiling                1.4.1
pandocfilters                   1.5.0
panel                           0.12.1
param                           1.12.0
parso                           0.8.3
pathlib                         1.0.1
patsy                           0.5.2
pep517                          0.12.0
pexpect                         4.8.0
pickleshare                     0.7.5
Pillow                          7.1.2
pip                             21.1.3
pip-tools                       6.2.0
plac                            1.1.3
plotly                          5.5.0
plotnine                        0.6.0
pluggy                          0.7.1
pooch                           1.6.0
portpicker                      1.3.9
prefetch-generator              1.0.1
preshed                         3.0.6
prettytable                     3.2.0
progressbar2                    3.38.0
prometheus-client               0.13.1
promise                         2.3
prompt-toolkit                  1.0.18
proto-plus                      1.20.3
protobuf                        3.19.4
psutil                          5.4.8
psycopg2                        2.7.6.1
ptyprocess                      0.7.0
py                              1.11.0
pyarrow                         2.0.0
pyasn1                          0.4.8
pyasn1-modules                  0.2.8
pycocotools                     2.0.4
pycparser                       2.21
pyct                            0.4.8
pydata-google-auth              1.3.0
pydot                           1.3.0
pydot-ng                        2.0.0
pydotplus                       2.0.2
PyDrive                         1.3.1
pyemd                           0.5.1
pyerfa                          2.0.0.1
pyglet                          1.5.0
Pygments                        2.6.1
pygobject                       3.26.1
pymc3                           3.11.4
PyMeeus                         0.5.11
pymongo                         3.12.3
pymystem3                       0.2.0
PyOpenGL                        3.1.6
pyparsing                       3.0.7
pyrsistent                      0.18.1
pysndfile                       1.3.8
PySocks                         1.7.1
pystan                          2.19.1.1
pytest                          3.6.4
python-apt                      0.0.0
python-chess                    0.23.11
python-dateutil                 2.8.2
python-louvain                  0.16
python-slugify                  6.1.1
python-utils                    3.1.0
pytz                            2018.9
pyviz-comms                     2.1.0
PyWavelets                      1.2.0
PyYAML                          3.13
pyzmq                           22.3.0
qdldl                           0.1.5.post0
qtconsole                       5.2.2
QtPy                            2.0.1
regex                           2019.12.20
requests                        2.27.1
requests-oauthlib               1.3.1
resampy                         0.2.2
rpy2                            3.4.5
rsa                             4.8
scikit-image                    0.18.3
scikit-learn                    1.0.2
scipy                           1.4.1
screen-resolution-extra         0.0.0
scs                             3.2.0
seaborn                         0.11.2
semver                          2.13.0
Send2Trash                      1.8.0
setuptools                      57.4.0
setuptools-git                  1.2
Shapely                         1.8.1.post1
simplegeneric                   0.8.1
six                             1.15.0
sklearn                         0.0
sklearn-pandas                  1.8.0
smart-open                      5.2.1
snowballstemmer                 2.2.0
sortedcontainers                2.4.0
SoundFile                       0.10.3.post1
spacy                           2.2.4
Sphinx                          1.8.6
sphinxcontrib-serializinghtml   1.1.5
sphinxcontrib-websupport        1.2.4
SQLAlchemy                      1.4.32
sqlparse                        0.4.2
srsly                           1.0.5
statsmodels                     0.10.2
sympy                           1.7.1
tables                          3.7.0
tabulate                        0.8.9
tblib                           1.7.0
tenacity                        8.0.1
tensorboard                     2.6.0
tensorboard-data-server         0.6.1
tensorboard-plugin-wit          1.8.1
tensorflow                      2.6.3
tensorflow-datasets             4.0.1
tensorflow-estimator            2.6.0
tensorflow-gcs-config           2.8.0
tensorflow-hub                  0.12.0
tensorflow-io-gcs-filesystem    0.24.0
tensorflow-metadata             1.2.0
tensorflow-probability          0.16.0
tensorflow-serving-api          2.6.3
tensorflow-transform            1.3.0
termcolor                       1.1.0
terminado                       0.13.3
testpath                        0.6.0
text-unidecode                  1.3
textblob                        0.15.3
tfx-bsl                         1.3.0
Theano-PyMC                     1.1.2
thinc                           7.4.0
threadpoolctl                   3.1.0
tifffile                        2021.11.2
tomli                           2.0.1
toolz                           0.11.2
torch                           1.10.0+cu111
torchaudio                      0.10.0+cu111
torchsummary                    1.5.1
torchtext                       0.11.0
torchvision                     0.11.1+cu111
tornado                         5.1.1
tqdm                            4.63.0
traitlets                       5.1.1
tweepy                          3.10.0
typeguard                       2.7.1
typing-extensions               3.7.4.3
tzlocal                         1.5.1
uritemplate                     3.0.1
urllib3                         1.24.3
vega-datasets                   0.9.0
wasabi                          0.9.0
wcwidth                         0.2.5
webencodings                    0.5.1
Werkzeug                        1.0.1
wheel                           0.37.1
widgetsnbextension              3.5.2
wordcloud                       1.5.0
wrapt                           1.12.1
xarray                          0.18.2
xgboost                         0.90
xkit                            0.0.0
xlrd                            1.1.0
xlwt                            1.3.0
yellowbrick                     1.4
zict                            2.1.0
zipp                            3.7.0

Thanks for the tip about keras; that got rid of this error, but another one appeared:

% python ./molecules/preprocess.py --work-dir=results
2022-03-15 19:24:03.968411: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1917] Ignoring visible gpu device (device: 1, name: NVIDIA GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1) with core count: 5. The minimum required count is 8. You can adjust this requirement with the env var TF_MIN_GPU_MULTIPROCESSOR_COUNT.
2022-03-15 19:24:03.968712: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-15 19:24:04.557332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10414 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1
2022-03-15 19:24:05.006045: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.8 interpreter.
2022-03-15 19:24:06.051394: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
Traceback (most recent call last):
  File "/home/user/venv/coursera/lib/python3.8/site-packages/tensorflow_transform/coders/example_proto_coder.py", line 275, in encode
    feature_handler.encode_value(value)
  File "/home/user/venv/coursera/lib/python3.8/site-packages/tensorflow_transform/coders/example_proto_coder.py", line 154, in encode_value
    self._value.append(self._cast_fn(values))
  File "/home/user/venv/coursera/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1004, in __index__
    return self._numpy().__index__()
TypeError: only integer scalar arrays can be converted to a scalar index

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1232, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 572, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "/home/user/venv/coursera/lib/python3.8/site-packages/apache_beam/transforms/core.py", line 1562, in <lambda>
    wrapper = lambda x: [fn(x)]
  File "/home/user/venv/coursera/lib/python3.8/site-packages/tensorflow_transform/coders/example_proto_coder.py", line 277, in encode
    raise TypeError('%s while encoding feature "%s"' %
TypeError: only integer scalar arrays can be converted to a scalar index while encoding feature "TotalC"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./molecules/preprocess.py", line 220, in <module>
    preprocess_data = run(
  File "./molecules/preprocess.py", line 195, in run
    _ = (
  File "/home/user/venv/coursera/lib/python3.8/site-packages/apache_beam/pipeline.py", line 585, in __exit__
    self.result = self.run()
  File "/home/user/venv/coursera/lib/python3.8/site-packages/apache_beam/pipeline.py", line 564, in run
    return self.runner.run_pipeline(self, self._options)
  File "/home/user/venv/coursera/lib/python3.8/site-packages/apache_beam/runners/direct/direct_runner.py", line 131, in run_pipeline
    return runner.run_pipeline(pipeline, options)
  File "/home/user/venv/coursera/lib/python3.8/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 195, in run_pipeline
    self._latest_run_result = self.run_via_runner_api(
  File "/home/user/venv/coursera/lib/python3.8/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 206, in run_via_runner_api
    return self.run_stages(stage_context, stages)
  File "/home/user/venv/coursera/lib/python3.8/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 384, in run_stages
    stage_results = self._run_stage(
  File "/home/user/venv/coursera/lib/python3.8/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 646, in _run_stage
    self._run_bundle(
  File "/home/user/venv/coursera/lib/python3.8/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 769, in _run_bundle
    result, splits = bundle_manager.process_bundle(
  File "/home/user/venv/coursera/lib/python3.8/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py", line 1080, in process_bundle
    result_future = self._worker_handler.control_conn.push(process_bundle_req)
  File "/home/user/venv/coursera/lib/python3.8/site-packages/apache_beam/runners/portability/fn_api_runner/worker_handlers.py", line 378, in push
    response = self.worker.do_instruction(request)
  File "/home/user/venv/coursera/lib/python3.8/site-packages/apache_beam/runners/worker/sdk_worker.py", line 601, in do_instruction
    return getattr(self, request_type)(
  File "/home/user/venv/coursera/lib/python3.8/site-packages/apache_beam/runners/worker/sdk_worker.py", line 639, in process_bundle
    bundle_processor.process_bundle(instruction_id))
  File "/home/user/venv/coursera/lib/python3.8/site-packages/apache_beam/runners/worker/bundle_processor.py", line 996, in process_bundle
    input_op_by_transform_id[element.transform_id].process_encoded(
  File "/home/user/venv/coursera/lib/python3.8/site-packages/apache_beam/runners/worker/bundle_processor.py", line 222, in process_encoded
    self.output(decoded_value)
  File "apache_beam/runners/worker/operations.py", line 351, in apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 353, in apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 215, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 712, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 713, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1234, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1299, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1232, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 571, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1395, in apache_beam.runners.common._OutputProcessor.process_outputs
  File "apache_beam/runners/worker/operations.py", line 215, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 712, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 713, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1234, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1299, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1232, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 571, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1395, in apache_beam.runners.common._OutputProcessor.process_outputs
  File "apache_beam/runners/worker/operations.py", line 215, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 712, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 713, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1234, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1299, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1232, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 571, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1395, in apache_beam.runners.common._OutputProcessor.process_outputs
  File "apache_beam/runners/worker/operations.py", line 215, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 712, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 713, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1234, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1299, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1232, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 571, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1395, in apache_beam.runners.common._OutputProcessor.process_outputs
  File "apache_beam/runners/worker/operations.py", line 215, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 712, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 713, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1234, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1299, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1232, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 571, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1395, in apache_beam.runners.common._OutputProcessor.process_outputs
  File "apache_beam/runners/worker/operations.py", line 215, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 712, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 713, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1234, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1299, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1232, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 571, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1395, in apache_beam.runners.common._OutputProcessor.process_outputs
  File "apache_beam/runners/worker/operations.py", line 215, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 712, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 713, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1234, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1299, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1232, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 571, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1395, in apache_beam.runners.common._OutputProcessor.process_outputs
  File "apache_beam/runners/worker/operations.py", line 152, in apache_beam.runners.worker.operations.ConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 712, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 713, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1234, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1315, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1232, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 572, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "/home/user/venv/coursera/lib/python3.8/site-packages/apache_beam/transforms/core.py", line 1562, in <lambda>
    wrapper = lambda x: [fn(x)]
  File "/home/user/venv/coursera/lib/python3.8/site-packages/tensorflow_transform/coders/example_proto_coder.py", line 277, in encode
    raise TypeError('%s while encoding feature "%s"' %
TypeError: only integer scalar arrays can be converted to a scalar index while encoding feature "TotalC" [while running 'Feature scaling/AnalyzeDataset/InstanceDictToRecordBatch/EncodeInstanceDictsAsTfExample']

The requirements I have used are:

apache-beam==2.32.0
tensorflow-transform==1.3.0
tensorflow==2.6.3
tensorflow-serving-api==2.6.3
keras==2.6.0

Edit: updated the requirements to exact versions.

Hi! I recommend using the tensorflow and numpy versions in the pip list generated by Balaji above. Hope it works!

I have used the exact versions listed in Balaji's pip output. However, the same error occurs.

File "/home/user/venv/coursera/lib/python3.8/site-packages/tensorflow_transform/coders/example_proto_coder.py", line 277, in encode
    raise TypeError('%s while encoding feature "%s"' %
TypeError: only integer scalar arrays can be converted to a scalar index while encoding feature "TotalC" [while running 'Feature scaling/AnalyzeDataset/InstanceDictToRecordBatch/EncodeInstanceDictsAsTfExample']

Hmmm… Since this seems to be a tensorflow transform issue, can you also check that your versions of tensorflow-metadata, pyarrow, and tfx-bsl match the ones in the list above? I'm also not sure whether you included the numpy version.

All versions are exactly as noted above:

tensorflow-metadata==1.2.0
pyarrow==2.0.0
tfx-bsl==1.3.0
numpy==1.19.5
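
For anyone comparing their environment against the pins quoted in this thread, the check can be scripted instead of read off `pip list` by eye. This is only a minimal sketch: the `PINNED` dict simply copies the versions posted above, and the helper names `installed_version` and `find_mismatches` are made up for illustration. It uses `importlib.metadata` from the standard library (Python 3.8+), run inside the same virtualenv as `preprocess.py`.

```python
from importlib import metadata

# Pins copied from the posts in this thread (requirements plus the follow-up list).
PINNED = {
    "apache-beam": "2.32.0",
    "tensorflow": "2.6.3",
    "tensorflow-transform": "1.3.0",
    "tensorflow-serving-api": "2.6.3",
    "keras": "2.6.0",
    "tensorflow-metadata": "1.2.0",
    "pyarrow": "2.0.0",
    "tfx-bsl": "1.3.0",
    "numpy": "1.19.5",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that does not match."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    # Prints nothing when every pinned package matches the active environment.
    for name, (expected, found) in sorted(find_mismatches(PINNED).items()):
        print(f"{name}: expected {expected}, found {found}")
```

An empty output only confirms that the environment matches the pins; it does not by itself explain the `TotalC` encoding error.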