What is the point of using Viterbi algorithm?

After we get the transition matrix A and emission matrix B, we use test data (not the training data used to generate A and B) to generate matrices C and D. From matrix D we can get the accuracy.

What does the accuracy we get from the Viterbi backward pass stand for?

The accuracy of matrices A and B?

Hi @Jing_bot

There is no accuracy for A and B; they are just the information we take from the training set - the probabilities for transitions and emissions. We hope that the probabilities are similar for our test set.
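To make that concrete, here is a minimal sketch of estimating A and B by relative frequency from a tagged corpus. The corpus, words, and tags are made up for illustration, and real implementations typically add smoothing on top:

```python
from collections import defaultdict

# A toy tagged corpus (made-up words/tags, not the assignment's data).
train = [("the", "DT"), ("dog", "NN"), ("barks", "VB")]

trans_counts = defaultdict(int)  # (prev_tag, tag) -> count
emit_counts = defaultdict(int)   # (tag, word) -> count
prev_counts = defaultdict(int)   # how often each tag appears as a context
tag_counts = defaultdict(int)    # how often each tag emits a word

prev = "--s--"                   # start-of-sentence marker
for word, tag in train:
    trans_counts[(prev, tag)] += 1
    emit_counts[(tag, word)] += 1
    prev_counts[prev] += 1
    tag_counts[tag] += 1
    prev = tag

# Relative-frequency estimates:
# A[(t1, t2)] = P(t2 | t1), B[(t, w)] = P(w | t)
A = {bigram: c / prev_counts[bigram[0]] for bigram, c in trans_counts.items()}
B = {pair: c / tag_counts[pair[0]] for pair, c in emit_counts.items()}
```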

If you are asking what # UNQ_C8 compute_accuracy outputs, then it’s the accuracy of how well we predict tags on the test set (it’s ~95%).

What is the point of using Viterbi algorithm?

I’m not sure what exactly you are asking here, so I’ll just elaborate:
Language is an inherently temporal phenomenon. The Viterbi algorithm is a decoding algorithm (meaning, an algorithm to determine the hidden variables - the tags) for a Hidden Markov Model, where we proceed one word at a time, carrying forward the information we have up until now (we “imagine” the hidden part - the POS tags - calculate the best probabilities, and carry them over to the next step - dynamic programming).
Viterbi resembles the dynamic programming minimum edit distance algorithm from previous weeks. It’s a precursor to later weeks and language modelling with RNNs.
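The forward pass described above can be sketched as follows. The HMM below (two tags, three vocabulary words, made-up probabilities) is purely illustrative; `best_probs` and `best_paths` play the roles of the assignment’s matrices C and D:

```python
import numpy as np

# Toy HMM with made-up numbers (not the assignment's matrices).
A = np.array([[0.7, 0.3],        # A[i, j] = P(tag_j | tag_i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],   # B[i, k] = P(word_k | tag_i)
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])        # initial tag probabilities
obs = [0, 1, 2]                  # observed word indices

n_tags, T = A.shape[0], len(obs)
best_probs = np.zeros((n_tags, T))              # best log-probability so far
best_paths = np.zeros((n_tags, T), dtype=int)   # back-pointers

# Initialization: each tag emitting the first word (log space).
best_probs[:, 0] = np.log(pi) + np.log(B[:, obs[0]])

# Forward pass: one word at a time, carrying the best score forward.
for t in range(1, T):
    for j in range(n_tags):
        scores = best_probs[:, t - 1] + np.log(A[:, j]) + np.log(B[j, obs[t]])
        best_probs[j, t] = scores.max()
        best_paths[j, t] = scores.argmax()
```

Note that only the previous column is consulted at each step - exactly the unidirectional, no-looking-into-the-future restriction mentioned above.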

Hi arvyzukai

Thank you for your reply.

From my understanding, after we train a model, we always use the model to test something unknown.

In this case, should I think of matrices A and B as the model we trained? Then we use the Viterbi algorithm to test our model - the Viterbi algorithm is just one of many ways to test the model.

In general, yes.

Well… not quite… A and B are part of the model. I’m not sure how to define the model here eloquently (it’s not a linear equation like in regression or a neural network (or maybe it is, but it’s not very straightforward); it’s a particular instance of the Viterbi algorithm). We “model” the world as if there were hidden states (tags) that emit words. Having only A and B wouldn’t tell us what to do next. The model here is the set of rules by which we model the world; for example, in this assignment we use a unidirectional approach, which restricts us from looking into the future (we decide the tag for the current word by looking only at the previous state - not further back, and not at future states).

Not really… The Viterbi algorithm is the model. We use the Viterbi algorithm to predict the tags of our test set.

We test it by computing the accuracy of the tags we predicted (the ones we didn’t know) - the algorithm gets 95 correct out of 100.
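In other words, accuracy is just the fraction of predicted tags that match the known test tags. A tiny sketch with made-up tag sequences:

```python
# Made-up predicted vs. true tags for four test words.
predicted = ["DT", "NN", "VB", "NN"]
true_tags = ["DT", "NN", "VB", "DT"]

# Count positions where the prediction matches the known tag.
correct = sum(p == t for p, t in zip(predicted, true_tags))
accuracy = correct / len(true_tags)
print(accuracy)  # 0.75
```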

Thank you so much for answering my questions.

At this moment I finally realised that we do not use the tags of the test_corpus when we generate matrix D. Only from the backward pass can we know the predicted tags of the test_corpus.
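Right - the backward pass only walks the stored back-pointers, never the test tags. A self-contained sketch, with made-up forward-pass outputs for a two-tag, three-word example:

```python
import numpy as np

# Made-up outputs of the forward pass (2 tags, 3 words).
best_probs = np.array([[-1.2, -2.5, -4.0],
                       [-0.9, -2.1, -3.3]])
best_paths = np.array([[0, 1, 1],
                       [0, 0, 1]])

T = best_probs.shape[1]
pred = [0] * T
# Start from the most probable tag for the last word...
pred[-1] = int(np.argmax(best_probs[:, -1]))
# ...then follow the back-pointers right to left; no test tags are used.
for t in range(T - 1, 0, -1):
    pred[t - 1] = int(best_paths[pred[t], t])
print(pred)  # [0, 1, 1]
```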

May I ask, what are the popular uses once we have the tags of the test_corpus?

Nowadays I think nobody uses them (the POS tags; the Viterbi algorithm itself is still widely used in speech recognition, speech synthesis, diarization, keyword spotting, computational linguistics, and bioinformatics).
“Back in the day” I can imagine they might have had uses for things like spelling correction / grammar checking, probably categorization, translation, and more. To be honest, I don’t really know.