In the videos of Week 2 of Course 5, Andrew writes Softmax without the bias term. Does he assume that the feature vectors are appended with an extra feature that is always equal to 1?
In the lectures, Andrew mostly uses simple block diagrams; the bias terms are handled automatically by TensorFlow.
Can you give a specific example where you have a question? Posting a screen capture image would be useful.
E.g. in the Word2Vec video:
I think he just left it out for simplicity of notation.
The softmax function itself takes the raw scores (logits) you've already computed, then exponentiates and normalizes them so they sum to one. Any bias term belongs to the linear step that produces those scores, not to softmax itself.
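To make the equivalence concrete, here is a small NumPy sketch (my own illustration, not code from the course): `softmax(Wx + b)` gives exactly the same output as dropping `b` and instead appending a constant 1 to the feature vector while folding `b` into an extra column of `W`. This is why omitting the bias in the notation loses no generality.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability, then exponentiate and normalize.
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # 4 output classes, 3 input features
b = rng.normal(size=4)        # bias term
x = rng.normal(size=3)        # feature vector

# Standard form: softmax applied to the linear scores Wx + b.
p_with_bias = softmax(W @ x + b)

# Equivalent form: fold b into W as an extra column and append 1 to x.
W_aug = np.hstack([W, b[:, None]])
x_aug = np.append(x, 1.0)
p_folded = softmax(W_aug @ x_aug)

print(np.allclose(p_with_bias, p_folded))  # True: the two forms agree
```

Frameworks like TensorFlow keep the separate-bias form (e.g. a Dense layer's kernel and bias variables), so in practice you never append the constant feature yourself.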