Week 4 Transformer Network: package versions

Hello All,

I just finished the courses. Having finished, however, I have lost access to the material.
I downloaded the exercises earlier, but I am running into issues when I
try to run C5_W4_A1_Transformer_Subclass_v1.ipynb locally on my Mac now.
These seem to be related to the versions of the installed packages, especially TensorFlow and Keras, though there could be others.

Could someone tell me which versions are being used?

Anyone can get the list of package versions by adding the two marked lines at the top of the import cell:

import sys  # <-- added
!{sys.executable} -m pip list  # <-- added

import tensorflow as tf
import time
import numpy as np
import matplotlib.pyplot as plt

If someone currently in the course can provide the list, that would be very helpful for running the material locally.

Thanks!

N

Here’s what I see when I run that in C5 W4 A1:

Package                 Version
----------------------- -------------------
absl-py                 0.15.0
alembic                 1.4.2
astor                   0.8.1
astunparse              1.6.3
async-generator         1.10
attrs                   19.3.0
backcall                0.1.0
beautifulsoup4          4.9.0
bleach                  3.1.4
blinker                 1.4
bokeh                   2.0.1
Bottleneck              1.3.2
brotlipy                0.7.0
cachetools              4.1.0
certifi                 2020.4.5.1
certipy                 0.1.3
cffi                    1.14.0
chardet                 3.0.4
click                   7.1.2
cloudpickle             1.4.1
conda                   4.8.2
conda-package-handling  1.6.0
cryptography            2.9.2
cycler                  0.10.0
Cython                  0.29.17
cytoolz                 0.10.1
dask                    2.15.0
decorator               4.4.2
defusedxml              0.6.0
dill                    0.3.1.1
distributed             2.15.2
entrypoints             0.3
fastcache               1.1.0
filelock                3.5.1
flatbuffers             1.12
fsspec                  0.7.3
gast                    0.3.3
gmpy2                   2.1.0b1
google-auth             1.14.1
google-auth-oauthlib    0.4.1
google-pasta            0.2.0
grpcio                  1.32.0
h5py                    2.10.0
HeapDict                1.0.1
idna                    2.9
imageio                 2.8.0
importlib-metadata      1.6.0
ipykernel               5.2.1
ipympl                  0.5.6
ipython                 7.14.0
ipython-genutils        0.2.0
ipywidgets              7.5.1
jedi                    0.17.0
Jinja2                  2.11.2
joblib                  0.14.1
json5                   0.9.0
jsonschema              3.2.0
jupyter-client          6.1.3
jupyter-core            4.6.3
jupyter-telemetry       0.0.5
jupyterhub              1.1.0
jupyterlab              2.1.1
jupyterlab-server       1.1.1
Keras-Applications      1.0.8
Keras-Preprocessing     1.1.2
kiwisolver              1.2.0
llvmlite                0.31.0
locket                  0.2.0
Mako                    1.1.0
Markdown                3.2.1
MarkupSafe              1.1.1
matplotlib              3.2.1
mistune                 0.8.4
mock                    4.0.2
mpmath                  1.1.0
msgpack                 1.0.0
nbconvert               5.6.1
nbformat                5.0.6
networkx                2.4
notebook                6.0.3
numba                   0.48.0
numexpr                 2.7.1
numpy                   1.19.2
oauthlib                3.0.1
olefile                 0.46
opt-einsum              3.3.0
packaging               20.1
pamela                  1.0.0
pandas                  1.0.3
pandocfilters           1.4.2
parso                   0.7.0
partd                   1.1.0
patsy                   0.5.1
pexpect                 4.8.0
pickleshare             0.7.5
Pillow                  7.1.2
pip                     22.0.3
prometheus-client       0.7.1
prompt-toolkit          3.0.5
protobuf                3.11.4
psutil                  5.7.0
ptyprocess              0.6.0
pyasn1                  0.4.8
pyasn1-modules          0.2.8
pycosat                 0.6.3
pycparser               2.20
pycurl                  7.43.0.5
Pygments                2.6.1
PyJWT                   1.7.1
pyOpenSSL               19.1.0
pyparsing               2.4.7
pyrsistent              0.16.0
PySocks                 1.7.1
python-dateutil         2.8.1
python-editor           1.0.4
python-json-logger      0.1.11
pytz                    2020.1
PyWavelets              1.1.1
PyYAML                  5.3.1
pyzmq                   19.0.0
regex                   2022.1.18
requests                2.23.0
requests-oauthlib       1.3.0
rsa                     4.0
ruamel.yaml             0.16.6
ruamel-yaml             0.15.80
ruamel.yaml.clib        0.2.0
sacremoses              0.0.47
scikit-image            0.16.2
scikit-learn            0.22.2.post1
scipy                   1.4.1
seaborn                 0.10.1
Send2Trash              1.5.0
setuptools              46.1.3.post20200325
six                     1.15.0
sortedcontainers        2.1.0
soupsieve               1.9.4
SQLAlchemy              1.3.16
statsmodels             0.11.1
sympy                   1.5.1
tables                  3.6.1
tblib                   1.6.0
tensorboard             2.8.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit  1.8.1
tensorflow              2.4.0
tensorflow-estimator    2.4.0
termcolor               1.1.0
terminado               0.8.3
testpath                0.4.4
tokenizers              0.10.3
toolz                   0.10.0
tornado                 6.0.4
tqdm                    4.45.0
traitlets               4.3.3
transformers            4.3.2
typing-extensions       3.7.4
urllib3                 1.25.9
vincent                 0.4.4
wcwidth                 0.1.9
webencodings            0.5.1
Werkzeug                1.0.1
wheel                   0.37.1
widgetsnbextension      3.5.1
wrapt                   1.12.1
xlrd                    1.2.0
zict                    2.0.0
zipp                    3.1.0
WARNING: You are using pip version 22.0.3; however, version 24.0 is available.
You should consider upgrading via the '/opt/conda/bin/python -m pip install --upgrade pip' command.
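
For a local setup, the full listing is more than is usually needed at first. As a rough sketch (not part of the assignment), the following reports only a handful of packages using the standard library's `importlib.metadata`, which assumes Python 3.8 or later. The list of "key" package names is my own guess at what this notebook depends on.

```python
from importlib import metadata  # standard library in Python 3.8+


def report_versions(names):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumption: these are the packages most relevant to this notebook.
KEY_PACKAGES = ["tensorflow", "numpy", "matplotlib", "transformers"]

for name, version in report_versions(KEY_PACKAGES).items():
    print(f"{name:<14} {version or 'not installed'}")
```

Running this both in the course environment and locally gives a short side-by-side comparison without scanning the whole table.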

Hope there are no execution exceptions when you run it! That was fast. I spent almost a week trying to debug this. Thanks so much, I appreciate it!

It may be a bit of a hassle to use the info in that form. I also did this:

!pip freeze > requirements-pip_freeze.txt

And the resulting file is attached (note that I had to add the fake “dot py” extension in order to get it to upload).
requirements-pip_freeze.txt.py (3.0 KB)
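
One way to use a file like this without rebuilding the whole environment right away is to check which pins differ from what is installed locally. The sketch below is only an illustration: `parse_pins` and `find_mismatches` are hypothetical helper names, it handles only plain `name==version` lines, and it assumes Python 3.8 or later for `importlib.metadata`.

```python
from importlib import metadata  # standard library in Python 3.8+


def parse_pins(lines):
    """Extract {name: version} from 'name==version' lines, skipping anything else."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # blank lines and entries in other formats are ignored
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed)} for packages whose versions differ."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # not installed locally at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Sample pins taken from the listing in this thread; in practice, read the
# lines from the downloaded requirements-pip_freeze.txt instead.
sample = ["tensorflow==2.4.0", "numpy==1.19.2", "", "# a comment"]
for name, (pinned, installed) in find_mismatches(parse_pins(sample)).items():
    print(f"{name}: course has {pinned}, local has {installed or 'nothing installed'}")
```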

Great, thanks! I don’t know exactly how, or whether, I can use it directly to create the environment (maybe with Conda), but I will figure it out. It seems most packages are older than the ones I have, which is an issue.

Actually now that I see the contents of the file, it is indeed a better format for fixing my env! Thanks!

Here’s a thread which covers a lot of those and related questions.

Thanks so much!

In case you prefer to use conda instead of pip, since conda will let you maintain multiple environments in parallel, here is the result of the other command shown on that thread that I linked above:

!conda list --export > requirements-conda_list.txt

That yields this file (renamed with the same fake dot py extension):
requirements-conda_list.txt.py (5.8 KB)
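
The conda export lists each package as `name=version=build`, with header lines starting with `#`. If you want pip-style pins from it, a conversion along these lines may help. This is a sketch under that format assumption; `conda_export_to_pins` is a hypothetical helper, and the build strings in the example are made up.

```python
def conda_export_to_pins(lines):
    """Convert 'name=version=build' lines into pip-style 'name==version' pins."""
    pins = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # header comments and blank lines
        parts = line.split("=")
        if len(parts) < 2:
            continue  # not in the expected format; skip rather than guess
        name, version = parts[0], parts[1]
        pins.append(f"{name}=={version}")
    return pins


# Illustrative input only; the build strings are invented.
example = [
    "# platform: linux-64",
    "numpy=1.19.2=py37h0000000_0",
    "tensorflow=2.4.0=pypi_0",
]
print("\n".join(conda_export_to_pins(example)))
```

Note that conda package names do not always match the names used on PyPI, so the converted pins may need some manual cleanup.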

Just a few more general things to say here:

Yes, the assignments use old versions of things. Most of the assignments here were last updated in a major way in April 2021, when they did the conversion from TF 1.x (no eager execution) to TF 2.x. So each assignment uses whatever versions were current at the time it was last published. This can be a problem, because things in this general space (python libraries and ML related packages like TF) evolve quickly and (unfortunately) do not always do so in backwards compatible ways.

The other thing to note here is that there is no guarantee that the versions shown above for C5 W4 A1 apply to all assignments in DLS. There are assignments in DLS C4 that use TF 2.3.0 and some that use TF 2.9.1. The one here uses TF 2.4.0. So this is why you might want to use conda, so that you can easily support multiple different sets of versions of everything.

Just realized this. I have not yet gotten it to work as it did when I submitted, with all tests passing. Currently I have it ‘working’ in the latest env: the notebook runs all the way through without syntax or runtime errors, but the tests output ‘fail’ because the numerical comparisons do not match. I am not sure whether this is because the later versions have different precision or slight variations in internal algorithms, or whether the failures are real. For now, I have left it in this state and hope to revisit it at some point. I didn’t realize it could take so much time (more than a week) to get to this stage.

But thanks for all the help!

If you are taking the approach of modifying things to run with the current versions of the packages, then you may well get different numeric results. Note that with TF it’s not really possible to get identical results even when you set the random seeds. The issue is that the training is parallelized and that process is fundamentally non-deterministic. They may have changed the behavior of the parallelization logic in the later versions. There is a flag you can set to get deterministic results, but it basically disables most of the parallelization, so it slows everything down. Here’s a post from mentor Raymond that explains this point.
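
The post does not name the flag, so the following is only my guess at what such a setup could look like. It seeds the usual random number generators, sets the `TF_DETERMINISTIC_OPS` environment variable (an assumption about what older TF 2.x releases honor), and calls `tf.config.experimental.enable_op_determinism()` when the installed TensorFlow provides it. `make_reproducible` is a hypothetical helper name, and NumPy and TensorFlow are treated as optional so the function still runs without them.

```python
import os
import random


def make_reproducible(seed=0):
    """Seed common RNGs and request deterministic TF ops where supported.

    Returns the list of components that were actually configured.
    """
    configured = []

    # Assumption: older TF 2.x releases read this environment variable.
    # It is set before TensorFlow is imported below.
    os.environ["TF_DETERMINISTIC_OPS"] = "1"

    random.seed(seed)
    configured.append("random")

    try:
        import numpy as np
        np.random.seed(seed)
        configured.append("numpy")
    except ImportError:
        pass  # NumPy is optional for this sketch

    try:
        import tensorflow as tf
    except ImportError:
        return configured  # nothing more to do without TensorFlow

    tf.random.set_seed(seed)
    configured.append("tensorflow seed")

    # Only newer TF releases expose this function, so look it up defensively.
    enable = getattr(tf.config.experimental, "enable_op_determinism", None)
    if enable is not None:
        enable()
        configured.append("tensorflow op determinism")
    return configured


print(make_reproducible(seed=42))
```

As noted above, forcing determinism tends to slow things down, so this is better suited to debugging a mismatch than to everyday training.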

Or they could have just changed things in other more direct ways that change the resolution of the outputs. Of course even if it’s more numerically accurate, that could still be “different”.
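
To get a feel for whether a failing comparison is a small numerical drift or a real difference, it can help to see which entries disagree at a given tolerance. Here is a minimal sketch using only the standard library; `compare_outputs` is a hypothetical helper, the numbers are made up, and the tolerances are arbitrary choices rather than anything the grader uses.

```python
import math


def compare_outputs(actual, expected, rel_tol=1e-5, abs_tol=1e-8):
    """Return the indices where two flat sequences differ beyond tolerance."""
    if len(actual) != len(expected):
        raise ValueError("sequences have different lengths")
    return [
        i
        for i, (a, e) in enumerate(zip(actual, expected))
        if not math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol)
    ]


# Made-up numbers for illustration only.
expected = [0.12345678, 1.0, -3.5]
actual = [0.12345679, 1.0, -3.4]

print(compare_outputs(actual, expected))                # default tolerances
print(compare_outputs(actual, expected, rel_tol=1e-1))  # much looser
```

If mismatches disappear only at a very loose tolerance, the difference is probably real rather than a precision effect. For arrays, NumPy's `np.allclose` plays a similar role.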


Sorry to interrupt, @paulinpaloalto.

In the assignment, you’ll notice 2 imports from the transformers library:

from transformers import DistilBertTokenizerFast #, TFDistilBertModel
from transformers import TFDistilBertForTokenClassification

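They are dead code and can be safely removed (you don’t need the transformers library to do the assignment). Since tensorflow depends on numpy, installing tensorflow also brings in numpy. Note that matplotlib, which the notebook imports as well, is not a tensorflow dependency and has to be installed separately.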
