C4W1_Assignment - Exercise 5

This line of code converts the text into a tensor, so you should use it instead of tf.constant() (there is no tf.constant in the cell above Exercise 5), as per my previous instruction.

1 Like

Your ChatGPT must love you, b/c mine keeps giving me errors. :sob:

Nevertheless, thank you for this! I would have NEVER known to phrase my prompt this way. :+1:t5:

1 Like

No problem. My prompt is actually pretty weak, since I didn’t spend even two seconds creating it. Pondering it longer would definitely give you better results.

If you want to learn more about it, you should check out this free course.

What error do you get?

2 Likes

Hello :wave:t5:

NameError                                 Traceback (most recent call last)
Cell In[43], line 6
      3 temp = 0.0
      4 original_sentence = "I love languages"
----> 6 translation, logit, tokens = translate(trained_translator, original_sentence, temperature=temp)
      8 print(f"Temperature: {temp}\n\nOriginal sentence: {original_sentence}\nTranslation: {translation}\nTranslation tokens:{tokens}\nLogit: {logit:.3f}")

Cell In[42], line 24, in translate(model, text, max_length, temperature)
     20 original_sentence = "I love languages"
     22 # Assuming trained_translator is your trained model
     23 # Ensure that the translate function applies text_vectorizer internally if needed
---> 24 translation, logit, tokens = translate(trained_translator, text_vectorizer, original_sentence, 50, 0.0)
     25 ### END CODE HERE ###
     26
     27 # Concatenate all tokens into a tensor
     28 tokens = tf.concat(tokens, axis=-1)

NameError: name 'text_vectorizer' is not defined


Following your guidance, I’ve successfully applied the hint from the previous cell to convert the input text into a tensor and add a batch dimension with the following line of code:

texts = tf.convert_to_tensor(eng_sentence)[tf.newaxis]

This step helped me understand the importance of formatting the text input correctly for TensorFlow processing, ensuring it’s in a tensor format with the necessary batch dimension.
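For reference, here is a minimal, self-contained sketch of what that one-liner does (the example string is just an illustration):

```python
import tensorflow as tf

eng_sentence = "I love languages"

# Convert the Python string to a scalar string tensor, then add
# a leading batch dimension so the shape becomes (1,): a batch
# containing one sentence, which is what the model expects.
texts = tf.convert_to_tensor(eng_sentence)[tf.newaxis]

print(texts.shape)  # (1,)
```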

However, I’ve run into an issue: implementing the complete logic for the translate function, and then calling it with the correct parameters, seems to require modifications outside the designated START/END code comment tags. Specifically, my code defines the translate function and then calls it with the necessary arguments, like so:

def translate(model, text_vectorizer, text, max_length=50, temperature=0.0):
    # Function logic here
    pass  # Placeholder for implementation details

# Outside START/END tags
translation, logit, tokens = translate(trained_translator, text_vectorizer, original_sentence, 50, 0.0)

I understand the importance of adhering to the exercise’s constraints and guidelines, but I’m uncertain how to proceed without adjusting code outside the specified comment tags to define and call the translate function as intended.

Should modifications outside the START/END comment tags be considered acceptable for correctly implementing and demonstrating the functionality of the translate function, especially in light of the hint provided?

1 Like

I’m just curious, as I’m not a mentor for this course. Would using ChatGPT to write your code be allowed under the Code of Conduct for this course?

1 Like

Hi @TMosh

To be honest, I don’t know.

I don’t think this question is Course-dependent (if ChatGPT is not allowed for DLS, then it is most probably not allowed on other Courses too).

My personal view is that it is a grey area. I see two aspects of this question: copyright and cheating.
As for copyright, in my view ChatGPT is a tool (like Google Drive, Gmail, VS Code, etc.). If you can save Assignment notebooks in your Gmail or Google Drive, you are probably also ok asking ChatGPT questions about your code (or getting suggestions from the Copilot plugin in VS Code, using Copilot chat, etc.). In other words, you are not sharing the material with the public (it’s ok to have a copy on Google Drive as long as it’s not visible to everyone on the internet) but with companies (or the products they build), and that’s probably fine.
I’m not a lawyer; maybe there are distinctions in how the data is used (I rarely read or fully understand the Terms of Use) and other things the lawyers might enlighten us about, so who knows :slight_smile:

As for cheating, in my personal view it’s again a grey area. It could be compared to using a calculator, documentation, or Stack Overflow. Tools like these can be used in very different ways: to complete the Assignment entirely, to just understand the code, or something in between. In other words, it’s not so clear if or when using ChatGPT is cheating.

What are your thoughts on that? Maybe @Mubsi or @Staff could clarify that for us?

1 Like

Hi @RyeToast

Take note what is being asked from you:

In other words, there is no ‘text_vectorizer’ but there is english_vectorizer defined for you.

Cheers

1 Like

Hi, @arvyzukai

My 6 attempts to resolve this within the allowed modification area have not been successful:

After following the provided instructions and your previous advice to use english_vectorizer instead of text_vectorizer, I successfully addressed the initial NameError. However, I’ve run into a ValueError related to processing the vectorized text through the LSTM layer of our model. The error message indicates that the LSTM layer cannot accept a RaggedTensor, which is the output format of our english_vectorizer when applied to the input text.

Given the constraints that we can only modify code within the ### START CODE HERE ### and ### END CODE HERE ### comments, I’m unsure how to proceed with converting the RaggedTensor to a regular tensor suitable for the LSTM layer. I understand that the LSTM expects a uniform tensor format.

How should I handle the RaggedTensor produced by english_vectorizer in this context?
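For anyone hitting the same ValueError: one common way to pad a RaggedTensor into the dense, uniform shape an LSTM expects is `.to_tensor()`. A minimal sketch with made-up token ids (this is not the assignment’s code, and the intended fix there may differ):

```python
import tensorflow as tf

# Stand-in for a vectorizer's output: a batch of token-id
# sequences of unequal length comes back as a RaggedTensor.
ragged = tf.ragged.constant([[2, 7, 5], [2, 9]])

# Pad to a uniform shape so downstream layers (e.g. an LSTM)
# receive a regular dense tensor; short rows are zero-padded.
dense = ragged.to_tensor()

print(dense.shape)  # (2, 3)
```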

In addition, I’m stuck on the start-of-sequence (SOS) token’s ID. I understand that this ID is crucial for the translation process, but I’m unsure how to correctly identify or define it in our current setup. I’ve tried to find it in the text processing pipeline, without success.

Here’s the relevant part of the code where I’m facing the issue:

# INITIAL STATE OF THE DECODER
# First token should be SOS token with shape (1,1)
# You need to define or get the sos_id from somewhere in your code
sos_id = ... # replace this with the correct value
next_token = tf.fill((1, 1), sos_id)

I think the SOS token’s ID might be obtained from the english_vectorizer or the TokenTextEncoder, but I’m not sure how to apply this in our current context.

How do I correctly identify the SOS token’s ID?
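For others with the same question: with a Keras-style vectorizer, a token’s id is typically its index in the vocabulary returned by `get_vocabulary()`. A plain-Python sketch of that lookup (the vocabulary and the "[SOS]" marker string below are hypothetical, not the course’s actual values):

```python
# Hypothetical vocabulary, as a vectorizer might expose it via
# get_vocabulary(); entry 0 is padding, entry 1 is the OOV token.
vocab = ["", "[UNK]", "[SOS]", "[EOS]", "i", "love", "languages"]

# A token's id is simply its index in the vocabulary list.
word_to_id = {word: idx for idx, word in enumerate(vocab)}
sos_id = word_to_id["[SOS]"]

print(sos_id)  # 2
```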

1 Like

If this assignment is like many others DLAI has created that use language models, then the SOS token is likely defined as a constant somewhere within the notebook.

1 Like

Hi, @TMosh
Cc: @arvyzukai

Firstly, I’ve successfully identified the sos_token within the notebook, which I understand is crucial for initializing the decoding process in our translation model. I found instances of ‘SOS’ and ‘sos_token’ mentioned throughout the notebook and have determined the correct usage of the sos_token to initiate the translation function.

However, while progressing with the implementation of the translate function, which utilizes the trained_translator model to translate a given sentence from English to Portuguese, I encountered a NameError indicating that the trained_translator variable is not defined. This occurs at the step where the function is called with a test sentence:

translation, logit, tokens = translate(trained_translator, original_sentence, temperature=temp)

Despite reviewing the notebook to ensure that all steps related to defining or loading the trained_translator model were correctly followed, I seem to be missing the specific instructions or steps necessary to either initialize or load the pre-trained model required for the translation function to execute properly.

In addition, we’re instructed to only modify the code between the START/END comment tags. However, I’m unsure about the exact changes I need to make within these tags. So far, I’ve replaced sos_token with sos_id to initialize the next_token variable, as shown below:

# First token should be SOS token with shape (1,1)
next_token = tf.constant(sos_id, shape=(1, 1), dtype=tf.int32)

However, I believe there should be more code within the START/END tags to complete the translation process, such as feeding the input text into the model, generating the next token, and repeating this process until the EOS token is generated or the maximum length is reached.
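The loop described above is the standard greedy-decoding pattern. Here is a toy sketch with a scripted stand-in for the decoder step (every name and id here is hypothetical, not the assignment’s API; a real model would condition each step on the encoder state and the tokens so far):

```python
SOS_ID, EOS_ID, MAX_LENGTH = 2, 3, 50

def next_token_fn(tokens):
    # Stand-in for one decoder step: returns a fixed sequence of
    # ids, ending with EOS. A real model would run the decoder on
    # `tokens` and the encoder output, then sample or argmax.
    scripted = [4, 5, 6, EOS_ID]
    return scripted[min(len(tokens) - 1, len(scripted) - 1)]

# Start from the SOS token, then generate until EOS appears or
# the length cap is reached.
tokens = [SOS_ID]
while len(tokens) < MAX_LENGTH:
    nxt = next_token_fn(tokens)
    tokens.append(nxt)
    if nxt == EOS_ID:
        break

print(tokens)  # [2, 4, 5, 6, 3]
```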

Also, when I tried to call the translate function, I received a NameError indicating that trained_translator is not defined. I understand that trained_translator should be the trained model used for translation, but I’m unsure about where and how to define and train this model in relation to the translate function.

How do I complete the translate function, and how do I define and train the trained_translator model?

1 Like

Sorry, I can only offer basic advice (such as "Don’t modify anything outside of the START CODE HERE sections"), because I’m not a mentor for this course and I do not have access to the course materials.

1 Like

Hi @RyeToast

Once again, I would advise you to review my previous post on this thread.

Every single line you have to implement can be compared against the “See how it works by running the following cell:” cell, which is given to you as an example.

trained_translator is defined for you in previous cells (section “3. Training”), so you either forgot to run it (you need to run all previous cells), or deleted it, or something similar. In other words, it’s a global variable that was defined for you.

Again, look at the “See how it works by running the following cell:”, you did not implement it the way it was done in this cell.

In summary, review my previous post which explains how you should implement the translate function and how it compares to the “See how it works by running the following cell:”.

Cheers

1 Like

Hi, @arvyzukai

I’ve run into issues with the translate function, particularly around the use of english_vectorizer and handling input data formats for the LSTM layer within the model’s encoder.

Text Vectorization Issue: I understand that the english_vectorizer is used to convert raw English text into a format that our model can understand. However, I’m unsure about the exact process of applying this vectorizer to the input text within the translate function. Should I be applying the vectorizer to the entire input sentence at once, or should I be processing the sentence word by word?

LSTM Input Data Format: I’m also having trouble understanding the correct format for the input data to the LSTM layer within the model’s encoder. I know that LSTM layers expect input data in a 3D format, but I’m unsure how to reshape my vectorized text data to meet this requirement. Could you provide some guidance on this?

I’ve reviewed the course materials and example code thoroughly, but I’m still having trouble grasping these concepts.

Apologies for my ignorance.

1 Like

I’ve been using GitHub Copilot to assist with the coding process, but unfortunately, it hasn’t been able to provide a solution for the issues I’m encountering within the START/END comment tags either.

1 Like

The entire sentence. english_vectorizer handles the sentence word by word, or more precisely, token by token.

This was implemented by you previously in Exercise 1 - Encoder, which later you included in Exercise 4 into the Translator (model). So now, you just need to call it as model.encoder and it converts the tokenized sentence (the output of english_vectorizer) into the format the LSTM needs and also processes with the same LSTM you defined in the Exercise 1.
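A minimal stand-in pipeline illustrating that call order (vectorize the whole sentence, then pass the token ids through the encoder) with generic Keras layers; the layer names and sizes are illustrative, not the course’s:

```python
import tensorflow as tf

# Toy vectorizer adapted on a tiny corpus; it maps a batch of raw
# strings to a batch of integer token ids.
vectorizer = tf.keras.layers.TextVectorization()
vectorizer.adapt(["i love languages", "languages love me"])
tokens = vectorizer(tf.constant(["i love languages"]))  # shape (1, 3)

# Toy "encoder": an Embedding turns the 2D id tensor into the 3D
# float tensor the LSTM needs, and the LSTM processes it.
encoder = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=20, output_dim=8),
    tf.keras.layers.LSTM(16, return_sequences=True),
])
context = encoder(tokens)

print(context.shape)  # (1, 3, 16)
```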

Does that make sense?

2 Likes

I recommend against this, because it doesn’t necessarily understand what you’re trying to do, and the Code of Conduct says you should only submit your own work.

1 Like

Shout out to @arvyzukai ! I would have NEVER figured this out without your hints! Thanks for the guidance and support throughout this challenging yet incredibly enlightening assignment. The journey through the neural translation model has been intricate, fascinating, and :duck:ing frustrating all at once.

Navigating through the nuances of text vectorization, encoder-decoder structures, and the LSTM layers presented a steep learning curve. With the examples you provided, I’ve gained a more profound understanding of how to preprocess input texts, manage the LSTM input data format, and iteratively generate translations.

Thank you, again!

2 Likes

I agree with @TMosh on this: preferably you should come up with solutions without any help, and tools like ChatGPT should only be used as a last resort (after you’ve tried multiple times, read through the documentation, searched online, etc.).

The goal of these Courses is learning and the learning process is important in achieving this goal.
In other words, if you just handed all the Assignment notebooks to ChatGPT to complete and it did so perfectly, the “amount” of learning you would get from that process is surely far less than from attempting the Course yourself.

Cheers

1 Like

Hello @arvyzukai

I was only partially passing the translate grader cell, as my translation for temperature 0 was not matching, but I still passed the assignment.

I somehow remembered the instruction about tf.zeros but ignored it, as I followed the previous cell; reading your comment confirmed my mistake.

The detailed response to the learner in this thread does help a lot. Thank you.

Regards
DP
