Running notebooks locally - tests failing due to version diffs

mbjarland · November 24, 2023, 3:46pm

I have a question about running notebooks locally. This is specifically about the Convolutional Neural Networks W2A1 assignment, but it applies to a few of the other assignments as well.

I know how to download all the relevant files and set up a virtual env locally. My problem is that if I create a virtual env with the latest python version (3.12.0) and the latest tensorflow version (2.15.0), those versions produce slightly different results, which in turn fails the tests for some blocks. This happens even though the notebook works verbatim, without any changes, in the online environment (python 3.8.10, tensorflow 2.9.1) after uploading it from disk and overwriting the notebook file.

So my conclusion is that something changed with tensorflow or underlying packages in the versions between the two environments to cause a slight diff.

So how hard can that be (he thought), off to downgrade python and tensorflow. From what I can glean you need a python version no newer than 3.10 to run tensorflow 2.9.1. Installing an older python version into a virtual env is fairly trivial. However installing an older tensorflow version is where we run into issues:

(.venv) ╭─mbjarland@kirin ~/projects/deep-learning  ‹master*›
        ╰─➤ python --version
Python 3.10.13

(.venv) ╭─mbjarland@kirin ~/projects/deep-learning  ‹master*›
        ╰─➤ pip install 'tensorflow==2.9.1'
ERROR: Could not find a version that satisfies the requirement tensorflow==2.9.1 (from versions: 2.13.0rc0, 2.13.0rc1, 2.13.0rc2, 2.13.0, 2.13.1, 2.14.0rc0, 2.14.0rc1, 2.14.0, 2.14.1, 2.15.0rc0, 2.15.0rc1, 2.15.0)
ERROR: No matching distribution found for tensorflow==2.9.1

Umm…so I’m an experienced programmer but not that experienced with the python ecosystem of things. This is on a late model and fully updated macbook pro. Is this a case where tensorflow did not exist for my architecture (apple silicon) until 2.13 and I’m therefore hosed trying to match versions with the online notebooks?
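
A quick way to see what pip is matching against is to print the interpreter version and platform of the active environment. pip only offers releases that ship a wheel compatible with both, so a version list that starts at 2.13 would be consistent with earlier releases having no wheel for this platform. That reading is an assumption, not something confirmed in this thread. A minimal sketch using only the standard library (the function name `describe_interpreter` is made up for illustration):

```python
import platform
import sys
import sysconfig


def describe_interpreter() -> dict:
    """Collect the interpreter facts that decide which wheels are installable."""
    return {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "implementation": platform.python_implementation(),
        "machine": platform.machine(),         # e.g. "arm64" on Apple silicon, "x86_64" on Intel
        "platform": sysconfig.get_platform(),  # e.g. "macosx-11.0-arm64" or "linux-x86_64"
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key:15} {value}")
```

Running this inside the virtual env and inside the online notebook shows at a glance how far apart the two environments are before any package is installed.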

I can of course ignore local dev, but also as a part of taking this course I want to teach myself the tools and mechanics of running things…so if at all possible I would have liked to get local envs working.

Any help much appreciated.

saifkhanengr · November 24, 2023, 4:04pm

Following this interesting post to see what other mentors have to share.

paulinpaloalto · November 24, 2023, 4:29pm

Exactly! Things evolve very quickly in this whole space and the assignments all use versions that were current in April 2021. A lot can change in 2.5 years. Of course one would think that things could change in a backward compatible way, but that doesn’t always happen. Actually you are fortunate that the difference is just in the accuracy of the results as opposed to an exception getting thrown because some API has been obsoleted. Believe it or not, that happens also.

Because this phenomenon is a common problem, there are tools to deal with it, e.g. Anaconda. There are no official instructions, since there are just too many combinations to deal with, but here is a thread with some good links to get you started down that road.

Mubsi · November 24, 2023, 7:02pm

Hi @mbjarland,

To add on what Paul said, if you notice, the assignment uses “random seeds”. These are used throughout the assignments to create consistent results. What I mean by that is, every time you’d run the assignment, you’d get those same values. This makes it easier to grade the exercises.

I’m not that familiar with random seeds, but what I have seen is, especially when I was updating this very assignment from an older tensorflow version to 2.9.1 is, even if I use the exact same package versions and try to run the assignment locally, I’d get different, but consistent, results because of how those same random seeds would behave in my local environment.

In fact, that’s exactly how I updated the assignments. I first updated and ran everything on my local, then adjusted the test outputs to what I got in the coursera environment.

My point being, my suspicion is that, if your code is correct then the change in the output you are getting is because of how those random seeds behave differently in your environment, especially since your TF version is different.
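
To illustrate what a seed does and does not guarantee, here is a small sketch. The standard-library `random` module stands in for NumPy and TensorFlow, which is an assumption made to keep the example free of dependencies; the helper name `draw` is made up.

```python
import random


def draw(seed: int, n: int = 3) -> list:
    """Return n pseudo-random floats from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


first = draw(seed=1)
second = draw(seed=1)
other = draw(seed=2)

print(first == second)  # True: same seed, same sequence
print(first != other)   # True: a different seed gives a different sequence
```

A seed pins down the sequence produced by one generator implementation. It says nothing about the arithmetic done on those numbers afterwards, which is where a different library version or platform can still change the last digits.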

An easy way to verify your code is correct is to submit the assignment on coursera and see what the grader says. if you pass the grader then it means everything is fine and you can ignore the tests on your local machine.

Cheers,
Mubsi

paulinpaloalto · November 24, 2023, 11:53pm

Hi, Matias.

Sorry, I missed this subtlety in your message the first time around. Yes, I’m also running an M1 MacBook Pro and there are certain other compatibility issues that are problematic, e.g. VM software.

Given your programming experience, I’m sure it’s occurred to you already you have basically two options on this type of problem:

1) Duplicate the precise environment.
2) Debug and fix each incompatibility that you encounter.

Of course 2) is more work or at least work that there is no way to “scope” in advance: you just have to plow through it. It might take half an hour or it could take days. But also note that it is in some sense a less durable solution: TF may change again and break your code next week, so you incur (at least in principle) an ongoing cost with option 2) if your operational model is to stay current with TF and other packages.

But if option 1) is not available to you for some reason, as in the M1 Mac case, then you may not have the full suite of choices.

Well, there is a “hybrid” strategy: you could do option 2) just to get up to the point where things work with some version of TF that is available for your platform and then you could implement option 1) to preserve that set of versions. Of course that’s more costly in that you have to do the work to learn how to implement option 1), which is non-trivial. But then you should be trouble free from that point forward.

mbjarland · November 26, 2023, 3:16pm

As I’m not yet all that familiar with the python ecosystem of things, would anaconda actually solve this problem somehow?

Also would it be possible to get a requirements.txt file (pip freeze > requirements.txt) for the online notebooks so we at least have a reference for what the correct environment looks like?

Or perhaps an environment.yml file if conda is the preferred tool.
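
For reference, a listing in the same `name==version` shape as `pip freeze` can also be produced from inside a notebook cell, which helps when there is no shell access. This is a sketch using `importlib.metadata` (available from Python 3.8); the function name `installed_versions` is made up, and the output is not guaranteed to match `pip freeze` line for line.

```python
from importlib import metadata


def installed_versions() -> dict:
    """Map each installed distribution name to its version."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip installs whose metadata cannot be read
            versions[name] = dist.version
    return versions


if __name__ == "__main__":
    listing = installed_versions()
    for name in sorted(listing, key=str.lower):
        print(f"{name}=={listing[name]}")
```

Pasting the printed lines into a file gives a starting point for a requirements file, although pins taken from one platform may not all be installable on another.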

paulinpaloalto · November 26, 2023, 3:47pm

Anaconda gives you the ability to create multiple specific environments in parallel. If you have multiple complex python based packages to run, each one may require a different combination of versions, which is the raison d’etre of Anaconda: it solves a real problem.

It’s not a complete solution for you, because you literally can’t use the exact versions from April 2021, because they don’t run on your platform. So you have to do the work to debug why things don’t work with whatever versions you can actually run. Then you can use Anaconda to preserve that environment as the larger python world continues to evolve around you and you may find other python based systems you want to use.

There are no officially released requirements files, but the thread I linked earlier shows how to get started on all this and derive the information you need.

rmwkwok · November 27, 2023, 1:59am

Hi @mbjarland,

I have not tried what you are trying to do, but for your consideration,

These stackoverflow answers are about how to get a list of the packages that a notebook claims to use, or a list of all installed packages. They should give you somewhere to start.
If pip cannot find an older TF, then you may try to find a “wheel” for that version to install, or build it from the source (1-TF doc, 2-checkout the right branch). It would be good to first sort out a list of packages that pip can’t help, to see what monster you are facing, before going manual.

Raymond

ai_curious · November 27, 2023, 3:05am

In my experience getting deep learning class material to run on an older Mac OS back in 2017 then on an early M1 a couple of years ago, Anaconda was a rather blunt instrument. And a bloated one. It will bring along many hundreds of packages (plus their dependencies) that you don’t need for these exercises. I had better luck with something similar but with less ballast, conda and miniforge. When I started with the new M1 there was no official tensorflow distribution for it and it was painful to get one installed, but my understanding is that there is one now, though maybe a newer version than you prefer. What's new in TensorFlow 2.13 and Keras 2.13? — The TensorFlow Blog

As @paulinpaloalto has mentioned here and in other threads, people in this community and the broader interweb have traveled these paths before - no need to reinvent from first principles.

mbjarland · November 28, 2023, 12:51pm

I guess in my previous reply what I meant was “will anaconda add a solution to this problem that python -m venv <venv> (i.e. python virtual environments) does not already provide?”. I.e. will there be versions of packages on conda that are not available directly via pip?

Anyway, thank you for all your answers! It does not seem like there is an easy solution here. I will look into all the linked information (I think @rmwkwok’s suggestion to install by finding a wheel or potentially compiling from source looks promising) and see where I land.

If by chance I manage to solve this, perhaps it might be worth it to share a conda-pack asset somewhere so that others with apple silicon machines taking this course might have an easier path to a working environment.

Will post any progress here for posterity.

mbjarland · December 5, 2023, 11:16am

Ok…so a fairly deep rabbit hole later…

So I thought about this problem a bit and came to the conclusion that building a docker image where python, tensorflow, etc versions match the online deep learning notebook environment would be beneficial. It would be easy to share and once it works it works.

So we can print some of the package versions by modifying the code in one of the online notebooks, something like this:

import sys  # interpreter version
import numpy as np       # already imported in the notebook; repeated so the cell runs on its own
import tensorflow as tf  # same as above
import six
import h5py
import packaging
import opt_einsum
import keras
import matplotlib
import scipy
# import sklearn  # scikit-learn is imported as sklearn, not scikit_learn
import jupyter

# Print one "name  version" line per package
print("python       " + str(sys.version))
print("tensorflow   " + str(tf.__version__))
print("numpy        " + np.__version__)
print("six          " + six.__version__)
print("h5py         " + h5py.__version__)
print("packaging    " + packaging.__version__)
print("opt_einsum   " + opt_einsum.__version__)
print("keras        " + keras.__version__)
print("matplotlib   " + matplotlib.__version__)
print("scipy        " + scipy.__version__)
print("jupyter      " + jupyter.__version__)

Note I say some as this list is not the entire transitive dependency graph of the dependencies for the online notebook. I wish somebody with access to the environment hosting could provide us with the actual dependency tree, but more on this later.

This block run on the CNN W2A1 notebook gives me:

python       3.8.10 (default, Mar 15 2022, 12:22:08) 
[GCC 9.4.0]
tensorflow   2.9.1
numpy        1.22.3
six          1.15.0
h5py         3.6.0
packaging    20.9
opt_einsum   v3.3.0
keras        2.9.0
matplotlib   3.5.2
scipy        1.7.3
jupyter      1.0.0

ok, so that’s a start.
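
With a reference list like the one above, checking a local environment against it can be automated. The sketch below compares a few of the versions printed by the online notebook with whatever is installed locally. The names `EXPECTED`, `installed_version` and `find_mismatches` are made up for illustration, and the lookup goes by distribution name, which can differ from the import name.

```python
from importlib import metadata

# Versions reported by the online notebook in this thread (CNN W2A1).
EXPECTED = {
    "tensorflow": "2.9.1",
    "numpy": "1.22.3",
    "h5py": "3.6.0",
    "keras": "2.9.0",
    "matplotlib": "3.5.2",
    "scipy": "1.7.3",
}


def installed_version(package: str):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected: dict, lookup=installed_version) -> dict:
    """Return {package: (expected, found)} for every package that differs."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{package}: expected {wanted}, found {found or 'not installed'}")
```

An empty result only means the listed packages match. It says nothing about the rest of the transitive dependency graph.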

As mentioned above, one problem on the M1/Mx Apple silicon Macs is that tensorflow 2.9.1 does not exist in any installable form in the pip/etc repositories. So off we go compiling the thing from source.

For the uninitiated, compiling tensorflow from source is not for the faint of heart. On my maxed out M1 Max macbook it’s about a one hour process with a plethora of possibilities for getting things wrong and breaking the build somewhere half way through…just to restart the whole one-hour process. Oh what fun!

Anyway, so I created a Dockerfile which installs all the build prerequisites, clones the tensorflow source, makes sure to install the correct versions of all the above mentioned python packages so that we compile tensorflow against the right stuff and off we go.

After various convolutions (ha) I got the docker build to complete and to also contain all the right packages and the jupyter python package so that the docker container could start a jupyter server.

The idea here is that I can start the docker container on my machine, it will have all the “right stuff” ™ and will also run a jupyter notebook server. I can then sit in my local VSCode environment and connect to the container jupyter kernel via a port and for VSCode things should look like I was developing locally…only now I should have the right versions of everything and more or less exactly replicate the online jupyter environment.

I mean…how hard can it be?

Well turns out this seems somewhat non-trivial. I did all the above. I have the container running. When I connect VSCode to that jupyter kernel (via an exposed port from the container) I can execute code in the notebook and my version printing block from above exactly matches the versions from the online environment.

It turns out however that it was now boot-in-face o’clock. Even after all the above, with the correct tensorflow version, the correct numpy version, and correct versions of all the packages listed in the above list, the results still do not match the online environment and the tests keep failing.

Output in VSCode from my code block:

my results: https://photos.app.goo.gl/ZqaVECr1xVhrD2DQ7
exception: https://photos.app.goo.gl/ZiFnRcQmK3LHzswv8
expected results: https://photos.app.goo.gl/o9s2vubaDWfC2Yd28

If you don’t feel like staring at images, the first non-zero value in the Training=True matrix should be 0.40732 according to the tests, but is instead 0.40733 (with a 3 at the end) which seems to break the tests.
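
To put the size of that discrepancy in perspective, the sketch below compares the two printed values under two tolerances. The tolerances are hypothetical and chosen only to show the effect; the tolerance the notebook tests actually use is not stated in this thread, and the printed values are themselves rounded.

```python
import math

expected = 0.40732  # value the notebook test expects
actual = 0.40733    # value produced locally

print(f"absolute difference: {abs(actual - expected):.1e}")

# Hypothetical tolerances, for illustration only.
print(math.isclose(actual, expected, rel_tol=0.0, abs_tol=1e-6))  # False: too strict
print(math.isclose(actual, expected, rel_tol=0.0, abs_tol=1e-4))  # True: loose enough
```

A difference in the fifth decimal place is the kind of gap that a strict comparison rejects and a slightly looser one accepts.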

My current suspicion is that I’ve missed some crucial package which is responsible for the discrepancy. Could also be that the computer architecture actually affects the calculations somehow.
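
One plausible mechanism for an architecture-dependent result, offered as an assumption rather than a diagnosis, is that floating-point addition is not associative. Numeric libraries built for different CPUs may add the same numbers in a different order, and the last digits can then differ even when every package version matches. A minimal demonstration in plain Python:

```python
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right)                   # 0.6000000000000001
print(right_to_left)                   # 0.6
print(left_to_right == right_to_left)  # False
```

The same three numbers give two different sums depending only on the grouping. In a network with many accumulated sums, such differences can surface in a later decimal place.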

I would love for somebody with some deeper understanding to chime in and perhaps point me towards a likely candidate. Adding another package with a specific version at this point is not a whole lot of work. I just need to modify the dockerfile and re-run the build which probably at this point will no longer fail.

The upside here is that if I could get this to run I can share the docker image and write up a page on how to use it. This would mean that anybody with an Apple silicon Mac could run their notebooks locally with a couple of lines of code.

Anyway, any help much appreciated, I am at the moment out of ideas on how to proceed from here.

rmwkwok · December 5, 2023, 12:13pm

Hi @mbjarland,

I am not sure if I can help, but it is interesting, so I want to see if I can figure something out. Would you please share that docker image with me and some tips/links on how to start it? I have a Windows Subsystem for Linux (WSL) environment with ubuntu 22, and obviously a Windows environment (Win 11), so I can work from either of them.

I probably will first see if the image and the lab env can generate the same random number with np.random.... and tf.random.... and random....
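
A check like that is easier to compare across machines if each environment reduces its draws to one short fingerprint. The sketch below does this for the standard-library generator only; doing the same for np.random or tf.random would need those libraries, and the helper name `rng_fingerprint` is made up.

```python
import hashlib
import random


def rng_fingerprint(seed: int, n: int = 1000) -> str:
    """Hash the first n draws of a seeded generator into a short hex digest."""
    rng = random.Random(seed)
    digest = hashlib.sha256()
    for _ in range(n):
        # float.hex() is an exact text form, so rounding cannot hide a difference
        digest.update(rng.random().hex().encode("ascii"))
    return digest.hexdigest()[:16]


if __name__ == "__main__":
    print(rng_fingerprint(seed=1))
```

If two environments print the same fingerprint for the same seed, the generator itself can be ruled out and attention can move to the computations that follow.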

Also, for which lab exactly are you trying to replicate the result? (Convolutional Neural Networks W2A1 assignment?)

Btw, what is the base OS that you use? Is it what the lab env is using? The lab is using:

No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.4 LTS
Release:	20.04
Codename:	focal
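
The same information can be read from Python, for example inside a notebook cell in the container. This sketch parses the `KEY=value` format of `/etc/os-release`, a file present on most Linux distributions and absent on macOS. The parsing is deliberately simplified and the function name `parse_os_release` is made up.

```python
from pathlib import Path


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines in the os-release format into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


if __name__ == "__main__":
    path = Path("/etc/os-release")
    if path.exists():
        print(parse_os_release(path.read_text()).get("PRETTY_NAME", "unknown"))
    else:
        print("no /etc/os-release on this system")
```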

Raymond

mbjarland · December 5, 2023, 12:40pm

Hi rmwkwok,

not sure how running this on an x86 architecture would affect things but happy to give it a try.

So assuming you have a command line docker client installed you should be able to pull the image with:

docker pull mbjarland/coursera-dl-specialization-apple-silicon

and run it with something like:

docker run --rm -it -p 8888:8888 -v .:/coursera mbjarland/coursera-dl-specialization-apple-silicon bash

where:

the -v maps your current directory to a /coursera directory in the container, so preferably you would move to the directory with the notebook and relevant data first and then run the above command.
the -p 8888:8888 maps the default jupyter port to your local machine.
-it and bash starts bash and leaves you with an interactive shell
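
For anyone who prefers to launch the container from a script, the flags described above can be assembled as an argument list. This is only a sketch: the function name `docker_run_command` is made up, the image name is the one from this post, and resolving the directory to an absolute path is a choice made here, not something the original command does.

```python
import shlex
from pathlib import Path

IMAGE = "mbjarland/coursera-dl-specialization-apple-silicon"  # image name from this thread


def docker_run_command(notebook_dir: str, host_port: int = 8888) -> list:
    """Build the `docker run` argument list described above."""
    host_dir = Path(notebook_dir).resolve()  # absolute path for the bind mount
    return [
        "docker", "run",
        "--rm",                         # remove the container when it exits
        "-it",                          # interactive session with a terminal
        "-p", f"{host_port}:8888",      # host port -> jupyter's default port in the container
        "-v", f"{host_dir}:/coursera",  # notebook directory -> /coursera in the container
        IMAGE,
        "bash",
    ]


if __name__ == "__main__":
    print(shlex.join(docker_run_command(".")))
```

Printing the command instead of executing it keeps the sketch safe to run; the list can be passed to `subprocess.run` once it looks right.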

The shell part is mostly there since I haven’t finalized this image yet. If and when I get it to work I’ll probably make it autostart the jupyter environment and you will no longer need an interactive shell.

Anyway, once you are in the interactive shell with the docker container, you can start jupyter. The whole thing might look something like this:

╭─mbjarland@kirin ~/projects/deep-learning/04-convolutional-neural-networks/assignments/W2A1  ‹master*›
╰─➤ docker run --rm -it -p 8888:8888 -v .:/coursera mbjarland/coursera-dl-specialization-apple-silicon bash

root@5f1726ff1027:~# cd /coursera

root@5f1726ff1027:/coursera# python -m jupyter notebook --allow-root --no-browser --NotebookApp.allow_origin='*' --ServerApp.ip='0.0.0.0'
[I 2023-12-05 12:33:58.728 ServerApp] Package notebook took 0.0000s to import
[I 2023-12-05 12:33:58.733 ServerApp] Package jupyter_lsp took 0.0048s to import
...
[I 2023-12-05 12:33:59.292 ServerApp] Jupyter Server 2.11.2 is running at:
[I 2023-12-05 12:33:59.292 ServerApp] http://5f1726ff1027:8888/tree?token=4d4d72998f83f9b42dabc5b24500cef56a5ea59f7ce7b131
[I 2023-12-05 12:33:59.292 ServerApp]     http://127.0.0.1:8888/tree?token=4d4d72998f83f9b42dabc5b24500cef56a5ea59f7ce7b131

At which point you have a jupyter server running with the appropriate packages in the container.

Make note of the url with the token=... part as you need that to connect VSCode or whatever you use to the running jupyter server.
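
If only the token is needed, for example to paste into the browser login page, it can be pulled out of the URL with the standard library. A small sketch using the URL from the server output above (the function name `extract_token` is made up):

```python
from urllib.parse import parse_qs, urlparse


def extract_token(server_url: str):
    """Return the `token` query parameter from a jupyter server URL, or None."""
    query = parse_qs(urlparse(server_url).query)
    values = query.get("token")
    return values[0] if values else None


url = "http://127.0.0.1:8888/tree?token=4d4d72998f83f9b42dabc5b24500cef56a5ea59f7ce7b131"
print(extract_token(url))
```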

In VSCode you can do command palette search for “Notebook: Select Notebook Kernel” > “Select Another Kernel…” > “Existing Jupyter Server” > “Enter the URL of the running jupyter server” .

Enter the url from the server output, http://127.0.0.1:8888/tree?token=4d4d72998f83f9b42dabc5b24500cef56a5ea59f7ce7b131 in my case. After this it asks you to give the kernel a display name, you could use something like coursera-local and then you need to select which kernel within that process to use…should only be one choice.

At this point your VSCode should be connected to the running jupyter kernel in the container and you should be able to run things as per normal.

Hope that helps.

rmwkwok · December 5, 2023, 12:40pm

Actually, I just tried to generate some random numbers with numpy/tensorflow/random on my ubuntu 22.04.2 and another computer on ubuntu 20.04.3. They have the same generated numbers as the lab, even though their numpy/tensorflow/python versions are not all the same. Pretty good signal.

mbjarland · December 5, 2023, 12:41pm

As mentioned earlier in this thread, I’m using an Apple silicon M1 Max Mac:

╭─mbjarland@kirin ~/projects/deep-learning  ‹master*›
╰─➤ screenfetch

                 -/+:.          mbjarland@kirin
                :++++.          OS: 64bit Mac OS X 	13.5.2 	22G91
               /+++/.           Kernel: arm64 Darwin 22.6.0
       .:-::- .+/:-``.::-       Uptime: 12d 49m
    .:/++++++/::::/++++++/:`    Packages: 377
  .:///////////////////////:`   Shell: zsh 5.9
  ////////////////////////`     Resolution: 3456x2234
 -+++++++++++++++++++++++`      DE: Aqua
 /++++++++++++++++++++++/       WM: Quartz Compositor
 /sssssssssssssssssssssss.      WM Theme: Purple (Dark)
 :ssssssssssssssssssssssss-     Disk: 10G / 4.0T (1%)
  osssssssssssssssssssssssso/`  CPU: Apple M1 Max
  `syyyyyyyyyyyyyyyyyyyyyyyy+`  GPU: Apple M1 Max
   `ossssssssssssssssssssss/    RAM: 7713MiB / 65536MiB
     :ooooooooooooooooooo+.
      `:+oo+/:-..-:/+o+/-

rmwkwok · December 5, 2023, 12:42pm

Let me give it a try first.

mbjarland · December 5, 2023, 12:46pm

You can of course also connect a browser to:

http://127.0.0.1:8888/

on your host machine after starting the jupyter server. Should give you a jupyter environment and ask for the token which can be found in the server output. Once you provide the token you should be able to open the notebook in your browser just like for the online version.

mbjarland · December 5, 2023, 12:48pm

As for the assignment, this is for the W2A1 residual networks assignment for the Convolutional Neural Networks course. I’ve had similar issues with other assignments though so I do not think this is isolated to this one assignment.

rmwkwok · December 5, 2023, 12:50pm

Thanks. I am pulling and extracting the image. Hope my machine can start it without problem. See how it goes with W2A1 first…

mbjarland · December 5, 2023, 12:52pm

Thanks for the check. Yeah, not sure how a different host architecture will affect this. I suspect docker will run this via emulation on your machine somehow but my docker skills end somewhere around there.
