C4W4 Assignment - model architecture

Hello, I am working on defining the model architecture, but I don’t really understand how to define it. I referenced some code from class, but the assignment says the model cannot be compiled. I don’t understand what is happening. Can anyone tell me how to think about this correctly and how to solve it? Thanks!

Hi @kaian0414
Just found your public post. Thanks for the notebook. Moving forward, please provide a link to your public post when reaching out via direct message.

I’m unsure how you jumped to model creation when parse_data_from_file is incorrect.

Recommendations:

  1. Fix the parsing logic. Read the markdown to understand what times means.
  2. Fix details like specifying the optimizer and creating a proper architecture.

Hello @balaji.ambresh
I found my misunderstanding of “times”, but for your recommendation 2, I don’t understand how to specify them. I read your tagged post just now, but I don’t really know what it means, or how to define the layers and units…

Currently, I have many rows of data and 2 features (“Date”, “Temp”). The pre-defined parameters in the code are: SPLIT_TIME = 2500, WINDOW_SIZE = 64, BATCH_SIZE = 256, SHUFFLE_BUFFER_SIZE = 1000.


Regarding the filters and kernel_size of the architecture, I think both should be 2. Is that correct? I have 2 features in the training data, so filters=2, and the length of the Conv1D window is also 2. I referenced the definitions online.

Definitions from keras.io:

  • filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the convolution).
  • kernel_size: An integer or tuple/list of a single integer, specifying the length of the 1D convolution window.

There are certain cells where you haven’t filled in the missing code. For instance, in the function create_model, the loss and optimizer are left as None.

The time series problem in the assignment is a regression problem. How about you do the ungraded labs and then get back to this assignment? The practice labs will give better insight into the considerations for designing a neural network in the time series domain.
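
On the filters and kernel_size question above: neither value is tied to the number of columns in the data. filters is a free hyperparameter (how many output channels the layer learns), and kernel_size is how many consecutive time steps each filter looks at. The shape arithmetic can be sketched in plain Python, assuming the series is univariate (only the temperature values are fed to the network, so each window has shape (WINDOW_SIZE, 1)) and a stride of 1. The helper names and the values filters=32, kernel_size=3 are illustrative assumptions, not the assignment's expected answer.

```python
def conv1d_output_shape(steps, filters, kernel_size, padding="valid"):
    """Return (steps_out, channels_out) for a stride-1 1D convolution."""
    if padding == "valid":
        # No padding: the window must fit entirely inside the sequence.
        steps_out = steps - kernel_size + 1
    elif padding in ("same", "causal"):
        # Padded so that the output keeps the input length.
        steps_out = steps
    else:
        raise ValueError(f"unknown padding: {padding!r}")
    # The number of output channels is exactly `filters`,
    # regardless of how many input channels there are.
    return (steps_out, filters)


def conv1d_param_count(in_channels, filters, kernel_size):
    """Trainable weights: one kernel per (input channel, filter) pair, plus biases."""
    return kernel_size * in_channels * filters + filters


WINDOW_SIZE = 64   # from the assignment's pre-defined parameters
IN_CHANNELS = 1    # assumption: univariate series (temperature only)

print(conv1d_output_shape(WINDOW_SIZE, filters=32, kernel_size=3, padding="causal"))  # (64, 32)
print(conv1d_output_shape(WINDOW_SIZE, filters=32, kernel_size=3, padding="valid"))   # (62, 32)
print(conv1d_param_count(IN_CHANNELS, filters=32, kernel_size=3))                     # 128
```

Changing filters only changes the second dimension of the output, which is why it can be chosen freely (for example as a power of 2) rather than matched to the data.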

Yes. That is because I ran into issues in create_uncompiled_model(), so I could not get any further in that part.

I am referencing some materials online, but I don’t really understand how they design the model architecture… The notebook also reports some issues with dimensions.

Please look at the lectures and ungraded labs before attempting the assignment, and ask specific questions at the exercise (i.e. notebook cell) level. It’s hard to summarize the week 4 lectures in a response.

@balaji.ambresh
Sorry for bothering you again… I read everything again today, and I also tried the notebook. Here are my issues:

  1. Before adjusting the learning rate, I used lr=1e-6. The learning rate chart suggested that 1e-3 is better, so I used it for the later training. However, with 1e-3 the loss does not go down; it stays at loss: 10.3552. A tutorial I found through Google says I need to adjust the learning rate, so I tried several values: some give loss: nan and some leave the loss unchanged. Only when I went back to 1e-6 did the loss start decreasing. I don’t understand how to handle this part.
  2. Then I checked the MSE and MAE, and the result was “mse: 6.30, mae: 1.93 for forecast”. But when I submitted it to Coursera, it showed different numbers. Oh my god, what happened? The evaluation code is just the one from the original notebook…

Those are the issues I faced today. I will also send you my latest notebook by direct message. Can you help me solve them? It is getting more and more confusing for me… Thanks!!


@balaji.ambresh
One more thing: there is an issue with saving the model, which you can find in my notebook. Some online blogs say it has no side effect other than the warning message… So I don’t think it is the reason my assignment fails.

The closest thing to reproducibility in TensorFlow 2.7 is to set the seed before building the model. Here’s an example:

tf.random.set_seed(2022)
model = tf.keras.Sequential([...])
model.compile(...)
model.fit(...)

So, when the passing criterion is 6 for MSE on the validation set, your NN needs a comfortably lower error so that run-to-run randomness doesn’t push it over the threshold. It’s a good idea to use an adaptive optimizer like Adam instead of tuning the learning rate from scratch for SGD.
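
To make the margin idea concrete, here is a minimal sketch of how the two metrics are computed and how a safety margin could be checked. It is plain Python rather than the notebook's evaluation code; the function names, the sample values, and the margin of 0.5 are assumptions chosen for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def passes_with_margin(score, threshold, margin):
    """True if the score beats the threshold by at least `margin`."""
    return score <= threshold - margin


# Made-up values, only to show the arithmetic.
y_true = [10.0, 12.0, 14.0]
y_pred = [11.0, 12.0, 12.0]
print(mse(y_true, y_pred))  # (1 + 0 + 4) / 3
print(mae(y_true, y_pred))  # (1 + 0 + 2) / 3

MSE_THRESHOLD = 6.0  # passing criterion mentioned in this thread
MARGIN = 0.5         # hypothetical safety margin for run-to-run variation

print(passes_with_margin(6.30, MSE_THRESHOLD, MARGIN))  # False: already above the threshold
print(passes_with_margin(5.35, MSE_THRESHOLD, MARGIN))  # True
```

A score that only just clears the threshold in one run can land on the wrong side of it when the grader retrains or re-evaluates with different random initial weights.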

Here’s a suggestion based on your notebook. It’s valid to pass a list of metrics to your model.compile call like this, which gives you a better picture of model performance over time:

model.compile(loss=None,
		metrics=['mse', 'mae'],
		optimizer=None)

You can also add a max pooling 1D layer to better summarize the inputs to the LSTM layer. Consider using a bidirectional LSTM to see if performance improves over plain LSTM layers. Other than the output layer, keep the number of units a power of 2.
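
As a rough picture of what 1D max pooling does to a sequence before it reaches the recurrent layer, here is a plain-Python sketch. It assumes non-overlapping windows (stride equal to the pool size) and no padding; the helper names and sample values are made up for illustration.

```python
def max_pool_1d(sequence, pool_size=2):
    """Non-overlapping max pooling: keep the largest value in each window."""
    n_out = len(sequence) // pool_size  # leftover steps at the end are dropped
    return [
        max(sequence[i * pool_size:(i + 1) * pool_size])
        for i in range(n_out)
    ]


def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (handy when picking layer sizes)."""
    return n > 0 and (n & (n - 1)) == 0


values = [3.0, 5.0, 2.0, 8.0, 7.0, 1.0]  # made-up sequence
print(max_pool_1d(values, pool_size=2))   # [5.0, 8.0, 7.0]

# A window of 64 steps is summarized into 32 steps with pool_size=2.
print(len(max_pool_1d(list(range(64)), pool_size=2)))  # 32

print([u for u in (16, 24, 32, 64) if is_power_of_two(u)])  # [16, 32, 64]
```

Halving the number of time steps means the recurrent layer that follows has a shorter sequence to process.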

Consider using a custom callback so that training stops once the model reaches the target MSE / MAE on the training set. See tf.keras.callbacks.Callback.
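
The stopping logic of such a callback can be sketched without TensorFlow. In the notebook, the class would instead subclass tf.keras.callbacks.Callback, set self.model.stop_training = True rather than its own flag, and be passed to model.fit through the callbacks argument. The class name, the target values, and the simulated logs below are hypothetical, and the sketch assumes the per-epoch logs use the keys 'mse' and 'mae'.

```python
class ThresholdStopper:
    """Mimics the on_epoch_end hook of a Keras callback (framework-free sketch)."""

    def __init__(self, mse_target, mae_target):
        self.mse_target = mse_target
        self.mae_target = mae_target
        self.stop_training = False  # Keras equivalent: self.model.stop_training
        self.stopped_epoch = None

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        mse = logs.get("mse")
        mae = logs.get("mae")
        if mse is None or mae is None:
            return  # metrics not reported; nothing to decide
        if mse <= self.mse_target and mae <= self.mae_target:
            self.stop_training = True
            self.stopped_epoch = epoch


# Simulated per-epoch training metrics (made-up numbers).
fake_logs = [
    {"mse": 9.1, "mae": 2.4},
    {"mse": 6.2, "mae": 2.0},
    {"mse": 5.1, "mae": 1.7},
    {"mse": 4.9, "mae": 1.6},
]

stopper = ThresholdStopper(mse_target=5.5, mae_target=1.9)
for epoch, logs in enumerate(fake_logs):
    stopper.on_epoch_end(epoch, logs)
    if stopper.stop_training:
        break

print(stopper.stopped_epoch)  # 2 (the first epoch where both targets are met)
```

Requiring both metrics to reach their targets keeps a single lucky value from ending training early.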

With these hints, you should see MSE ~ 5.35 and MAE ~ 1.80 on the validation dataset with fewer than 40 epochs of training. 100 epochs gives the model a lot more room to learn the training data.