I am calculating the values for transition matrix A, and my values are orders of magnitude off.
I get:
A at row 0, col 0: 0.021739130, A at row 3, col 1: 5021.7609
Expected output:
A at row 0, col 0: 0.000007040, A at row 3, col 1: 0.1691
My equation is:
A[i,j] = (count + alpha) / (count_prev_tag + (alpha * num_tags))
All the code leading up to here works. I am stuck and not sure what I am doing wrong; it must be something obvious.
P.S. Despite my changes, the values don’t seem to change when I re-run the example.
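For reference, here is a minimal sketch of filling the whole matrix with the equation above. It assumes (hypothetically) that `tag_counts` maps each tag to its number of occurrences and `transition_counts` maps `(prev_tag, tag)` pairs to how often `tag` directly follows `prev_tag`; these names and layouts are illustrative and may not match the assignment's exact data structures.

```python
import numpy as np

def build_transition_matrix(alpha, tag_counts, transition_counts):
    """Sketch of a smoothed transition matrix, A[i, j] ~ P(tag_j | tag_i).

    Assumed (hypothetical) inputs:
      tag_counts:        dict mapping tag -> number of times it occurs
      transition_counts: dict mapping (prev_tag, tag) -> number of times
                         tag directly follows prev_tag
    """
    all_tags = sorted(tag_counts)
    num_tags = len(all_tags)
    A = np.zeros((num_tags, num_tags))
    for i, prev_tag in enumerate(all_tags):
        # The denominator depends only on the row's (previous) tag, and it
        # must be looked up with the same tag string used in transition_counts.
        count_prev_tag = tag_counts[prev_tag]
        for j, tag in enumerate(all_tags):
            # Pairs never seen in the corpus contribute a count of 0.
            count = transition_counts.get((prev_tag, tag), 0)
            A[i, j] = (count + alpha) / (count_prev_tag + alpha * num_tags)
    return A

# Toy example: "DT" is always followed by "NN" and vice versa.
A = build_transition_matrix(0.001, {"DT": 2, "NN": 2},
                            {("DT", "NN"): 2, ("NN", "DT"): 2})
print(A.round(4))
```

Note that the denominator is computed once per row: every entry in row i shares the same `count_prev_tag`, which is why a wrong value there skews the whole row.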
Hi.
There is nothing wrong with your equation. You must be calculating one or more of the variables inside it incorrectly. When you get stuck, I would suggest using some `print()` statements (or `breakpoint()` if you know what you are doing) to check whether they hold the right values. In other words, calculate some tags’ transitions manually and check where the mistake is.
For example, if you add these `print()` statements:
```python
print(f'i:{i}, j:{j}, count:{count}, alpha:{alpha}')
print(f'count_prev_tag: {count_prev_tag}, num_tags: {num_tags}')
```
You should get:
```
i:0, j:0, count:0, alpha:0.001
count_prev_tag: 142, num_tags: 46
```
To check manually, you would calculate:
(0 + 0.001) / (142 + 0.001*46) = 0.000007040
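The same hand calculation as a quick Python check, using only the values printed above (count = 0, count_prev_tag = 142, alpha = 0.001, num_tags = 46):

```python
alpha, num_tags = 0.001, 46
count, count_prev_tag = 0, 142  # values printed for i=0, j=0 above

a_00 = (count + alpha) / (count_prev_tag + alpha * num_tags)
print(f"{a_00:.9f}")  # 0.000007040
print(f"{a_00:.6e}")  # 7.039973e-06, the same value trans_df shows for '#' -> '#'
```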
Expected Output:
A at row 0, col 0: 0.000007040
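Beyond spot-checking single entries, a cheap whole-matrix check can flag this kind of bug without any manual arithmetic. Each row of A is a distribution over the next tag, so every entry should lie in [0, 1] and each row should sum to about 1 (exactly 1 only if every occurrence of the previous tag is followed by some tag, so this sketch uses a loose tolerance, which is an arbitrary choice). The helper name `check_transition_matrix` is hypothetical.

```python
import numpy as np

def check_transition_matrix(A, atol=1e-3):
    """Return human-readable problems found in transition matrix A.

    Assumes row i is a probability distribution over the next tag, so every
    entry lies in [0, 1] and each row sums to (approximately) 1.
    """
    A = np.asarray(A, dtype=float)
    problems = []
    bad = np.argwhere((A < 0) | (A > 1))
    if len(bad):
        i, j = (int(x) for x in bad[0])
        problems.append(f"{len(bad)} entries outside [0, 1], "
                        f"first at ({i}, {j}) = {A[i, j]:.6g}")
    row_sums = A.sum(axis=1)
    off = np.flatnonzero(np.abs(row_sums - 1.0) > atol)
    if off.size:
        r = int(off[0])
        problems.append(f"{off.size} rows do not sum to 1, "
                        f"first is row {r} (sum = {row_sums[r]:.6g})")
    return problems

print(check_transition_matrix([[0.5, 0.5], [0.25, 0.75]]))  # []
```

A value like 5021.7609 at row 3, col 1 from the original question would be reported immediately as an entry outside [0, 1].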
If you are curious which tags this value belongs to, you could run:
```python
trans_df.iloc[0:2, 0:2]
```
and get:
| |# |$|
|---|---|---|
|# |7.039973e-06 |7.039973e-06|
|$ |1.356476e-07 |1.356476e-07|
So the probability of tag ‘#’ transitioning to tag ‘#’ is 0.000007040. (P.S. This is a somewhat bad example, because the transition probability from ‘#’ is exactly the same for most tags, while the probability of transitioning to the ‘CD’ tag is 0.992643, but I think you get the idea.)
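To find the most likely successor of a tag (like ‘CD’ after ‘#’) without scanning the row by eye, take the argmax of that row. The sketch below works on a plain NumPy array plus a list of tag names; the helper name and the 3-tag matrix are made up for illustration, not the course data. If `trans_df` is indexed by tag names in both rows and columns, as the table above suggests, pandas' `trans_df.loc['#'].idxmax()` should give the same kind of answer.

```python
import numpy as np

def most_likely_next_tag(A, tags, prev_tag):
    """Return (next_tag, probability) for the largest entry in prev_tag's row.

    Assumes row i and column i of A both correspond to tags[i].
    """
    i = tags.index(prev_tag)
    j = int(np.argmax(A[i]))
    return tags[j], float(A[i, j])

# Toy 3-tag matrix (made-up numbers, not the course data).
tags = ["#", "$", "CD"]
A = np.array([[0.01, 0.01, 0.98],
              [0.30, 0.40, 0.30],
              [0.20, 0.20, 0.60]])
print(most_likely_next_tag(A, tags, "#"))  # ('CD', 0.98)
```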
Arvyzukai, thanks! I found the problem: I was using the wrong key to get count_prev_tag! The defaultdict meant I got a zero, so I didn’t think twice!
Cheers,
Andrew
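The failure mode described above is easy to reproduce. With `collections.defaultdict(int)`, looking up a key that was never counted silently returns 0 instead of raising `KeyError` (and inserts that key as a side effect). With `count_prev_tag = 0`, alpha = 0.001 and num_tags = 46, the denominator collapses to 0.046, which gives exactly the 0.021739130 from the question; a transition count of 231 with the same zero denominator gives 5021.7609, matching A at row 3, col 1 (the 231 is back-calculated here, not taken from the thread). The wrong key, the helper names, and the counts are hypothetical; the `strict_count` guard is one possible way to make such a mistake fail loudly.

```python
from collections import defaultdict

alpha, num_tags = 0.001, 46

# Hypothetical counts keyed by tag name (142 is the count printed for '#').
tag_counts = defaultdict(int)
tag_counts["#"] = 142

def smoothed(count, count_prev_tag):
    """The smoothed transition probability from the question."""
    return (count + alpha) / (count_prev_tag + alpha * num_tags)

# A wrong key (here a row index instead of the tag string) silently yields 0,
# and as a side effect inserts that key into the defaultdict.
count_prev_tag = tag_counts[0]
print(f"{smoothed(0, count_prev_tag):.9f}")    # 0.021739130
print(f"{smoothed(231, count_prev_tag):.4f}")  # 5021.7609

def strict_count(counts, key):
    """Look up a count, raising KeyError instead of defaulting to 0."""
    if key not in counts:  # `in` does not insert into a defaultdict
        raise KeyError(f"no count recorded for {key!r}")
    return counts[key]
```

Converting the counts to a plain `dict` before building the matrix has a similar effect: a wrong key then raises `KeyError` on lookup.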
Thank you for both the question and the answer; it helped a lot.