Question about C7

I was able to write the sampling_decode function, but when I run sampling_decode("I love languages.", NMTAttn=model, temperature=0.7, vocab_file=VOCAB_FILE, vocab_dir=VOCAB_DIR) I get the following error, which appears to be linked to the vocab directories:

TypeError Traceback (most recent call last)
in
1 # Test the function above. Try varying the temperature setting with values from 0 to 1.
2 # Run it several times with each setting and see how often the output changes.
----> 3 sampling_decode("I love languages.", NMTAttn=model, temperature=0.7, vocab_file=VOCAB_FILE, vocab_dir=VOCAB_DIR)

in sampling_decode(input_sentence, NMTAttn, temperature, vocab_file, vocab_dir, next_symbol, tokenize, detokenize)
44
45 # detokenize the output tokens
---> 46 sentence = detokenize(cur_output_tokens)
47
48 ### END CODE HERE ###

in detokenize(integers, vocab_file, vocab_dir)
52 integers = integers[:integers.index(EOS)]
53
---> 54 return trax.data.detokenize(integers, vocab_file=vocab_file, vocab_dir=vocab_dir)

/opt/conda/lib/python3.7/site-packages/trax/data/tf_inputs.py in detokenize(x, vocab_type, vocab_file, vocab_dir, n_reserved_ids)
485 A string corresponding to the de-tokenized version of x.
486 """
--> 487 vocab = _get_vocab(vocab_type, vocab_file, vocab_dir)
488 x_unreserved = np.array(x) - n_reserved_ids
489 return str(vocab.decode(x_unreserved.tolist()))

/opt/conda/lib/python3.7/site-packages/trax/data/tf_inputs.py in _get_vocab(vocab_type, vocab_file, vocab_dir, extra_ids)
578
579 vocab_dir = vocab_dir or 'gs://trax-ml/vocabs/'
--> 580 path = os.path.join(vocab_dir, vocab_file)
581
582 if vocab_type == 'subword':

/opt/conda/lib/python3.7/posixpath.py in join(a, *p)
92 path += sep + b
93 except (TypeError, AttributeError, BytesWarning):
---> 94 genericpath._check_arg_types('join', a, *p)
95 raise
96 return path

/opt/conda/lib/python3.7/genericpath.py in _check_arg_types(funcname, *args)
151 else:
152 raise TypeError('%s() argument must be str or bytes, not %r' %
--> 153 (funcname, s.__class__.__name__)) from None
154 if hasstr and hasbytes:
155 raise TypeError("Can't mix strings and bytes in path components") from None

TypeError: join() argument must be str or bytes, not 'NoneType'

My C6 passed with no problem, but for C7 both tests failed with no hint of what went wrong. I wonder if the error lies in C6: when I switch the line in next_symbol() between log_probs = output[0,token_length,:] and log_probs = output[0,token_length+1,:], either version passes the C6 test, which is hard to believe.

w1_unittest.test_sampling_decode(sampling_decode)

Test 1 failed
Test 2 failed
0 Tests passed
2 Tests failed


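Resolved. I needed to pass vocab_file and vocab_dir again in the call to detokenize() inside sampling_decode; without them, vocab_file defaults to None and the path join fails.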
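
For anyone who hits the same traceback, here is a minimal sketch of the failure and the fix. It assumes the notebook's detokenize helper defaults vocab_file and vocab_dir to None, which is what the traceback suggests. The detokenize below is a hypothetical stand-in that only mimics the path join shown in the traceback; it is not the trax implementation, and the vocab file name and directory are placeholders.

```python
import os

# Placeholder values; the real notebook defines its own VOCAB_FILE and VOCAB_DIR.
VOCAB_FILE = "vocab.subword"
VOCAB_DIR = "data"


def detokenize(integers, vocab_file=None, vocab_dir=None):
    """Hypothetical stand-in for the notebook helper (not the trax code).

    It reproduces only the two lines from the traceback that build the
    vocabulary path, so the same TypeError can be triggered without trax.
    """
    vocab_dir = vocab_dir or "gs://trax-ml/vocabs/"
    path = os.path.join(vocab_dir, vocab_file)  # TypeError if vocab_file is None
    return f"decoded {len(integers)} tokens with {path}"


def decode_without_vocab(tokens):
    # What the failing code did: vocab_file silently stays None.
    return detokenize(tokens)


def decode_with_vocab(tokens, vocab_file, vocab_dir):
    # The fix: forward the vocab arguments explicitly.
    return detokenize(tokens, vocab_file=vocab_file, vocab_dir=vocab_dir)


try:
    decode_without_vocab([5, 6, 7])
    error_message = None
except TypeError as err:
    error_message = str(err)

result = decode_with_vocab([5, 6, 7], VOCAB_FILE, VOCAB_DIR)
print(error_message)
print(result)
```

In the assignment itself, the equivalent change would be to call detokenize(cur_output_tokens, vocab_file=vocab_file, vocab_dir=vocab_dir) inside sampling_decode instead of detokenize(cur_output_tokens).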
