I was practicing with some Kaggle projects in a Jupyter notebook and ran into the following issue:
every time I come back to continue my practice, I have to re-run all the cells in the notebook. This is annoying because some training/cross-validation steps take a long time, so I have to wait a couple of hours before I can continue. How do professionals handle this? Do they re-run all their cells every day? Do they actually work in Jupyter notebooks?
Thanks in advance.
The problem is not your “progress” in the sense of the code you’ve written, but the “runtime state” of the notebook. That state is not saved: it only exists in memory while the notebook’s kernel is running. Your only recourse for not losing it is to keep the notebook running, but on most platforms the notebook will time out after a certain interval with no interactive activity. On Coursera, I think the timeout is on the order of 5 or 10 minutes; on Google Colab, last time I checked (which was a while ago), it was more like 30 minutes. Note that just running the notebook does not count as “interacting” with it on Google Colab. If you are running training that takes a long time (hours), then you need a way around this. There are some “hacks” you can do, such as running a plugin in your browser that makes the notebook think you are moving the mouse in the window automatically.
If you are running training that takes more than single-digit hours, then you need a better solution: enhance your code to “checkpoint” your training state periodically, so that you can restart from a checkpoint saved in your Google Drive or whatever the appropriate storage mechanism is for the platform in question. You need some version of that logic anyway in order to save your learned model parameters, so that you can use them for making predictions later. For Coursera, you can just save the state as a local file on the VM running in AWS. I suggest you do some googling; you should be able to find information about how to checkpoint and restart on whatever framework you are using (TensorFlow, PyTorch, …).
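To make the checkpoint/restart idea concrete, here is a minimal framework-agnostic sketch in plain Python. The file name `checkpoint.pkl` and the fake “weights” are my own stand-ins; with TensorFlow or PyTorch you would use their own save/load APIs instead, but the surrounding loop logic is the same:

```python
import os
import pickle

CKPT_PATH = "checkpoint.pkl"  # hypothetical path; on Colab, point this at Google Drive

def load_checkpoint():
    """Resume from the last saved state, or start fresh if none exists."""
    if os.path.exists(CKPT_PATH):
        with open(CKPT_PATH, "rb") as f:
            return pickle.load(f)
    return {"epoch": 0, "weights": [0.0, 0.0]}  # stand-in for real model parameters

def save_checkpoint(state):
    # Write to a temp file first, then rename, so a timeout mid-write
    # cannot leave a corrupt checkpoint behind.
    tmp = CKPT_PATH + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, CKPT_PATH)

state = load_checkpoint()
for epoch in range(state["epoch"], 10):
    # ...one real epoch of training would go here; we just nudge the fake weights...
    state["weights"] = [w + 0.1 for w in state["weights"]]
    state["epoch"] = epoch + 1
    save_checkpoint(state)  # if the session dies now, the next run resumes here
```

If the notebook times out at epoch 6, re-running the same cell picks up from epoch 6 rather than epoch 0. The frameworks have dedicated mechanisms for this (e.g. `torch.save`/`torch.load` and `tf.train.Checkpoint`), which handle model and optimizer state for you.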
Of course, the other general approach is to install the required packages on your local computer and run the Python code directly in the interpreter, or in PyCharm or your IDE of choice. It may also be a matter of how big the problems are that you are working on: training large models on large datasets takes serious hardware horsepower and elapsed time.
Hello @Levent, I would also create checkpoints after heavy computation steps, for example by saving the outputs of those steps to a pickle file, so that the next time I can reload the outputs instead of recomputing them.
Please google how to save your outputs as a pickle file in Python. There are limitations, but it should work if those outputs are ordinary objects like an array or a dataframe.
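As a quick sketch of that save-and-reload pattern, using the standard-library `pickle` module (the file name `cv_results.pkl` and the example dict are made up for illustration; a NumPy array or a pandas DataFrame pickles the same way):

```python
import pickle

# Outputs of a heavy step -- ordinary Python objects standing in for
# e.g. an array of CV scores or a results DataFrame.
results = {"scores": [0.91, 0.88, 0.93], "best_params": {"max_depth": 6}}

# Save once, right after the expensive computation finishes...
with open("cv_results.pkl", "wb") as f:
    pickle.dump(results, f)

# ...then in a later session, reload instead of recomputing.
with open("cv_results.pkl", "rb") as f:
    restored = pickle.load(f)
```

For DataFrames specifically, pandas also offers the convenience wrappers `df.to_pickle(path)` and `pd.read_pickle(path)`, which do the same thing in one call.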