Problems calculating transition matrix A

I am calculating the values for transition matrix A. My values are orders of magnitude off.

I get:

A at row 0, col 0: 0.021739130 , A at row 3, col 1: 5021.7609

Expected output:

A at row 0, col 0: 0.000007040, A at row 3, col 1: 0.1691

My equation is:

A[i,j] = (count + alpha) / (count_prev_tag + (alpha * num_tags))

All the code leading up to here works. I am stuck and not sure what I am doing wrong. It must be something obvious.

P.S. Despite my changes, the values don’t seem to change when I re-run the example.

Hi.

There is nothing wrong with the equation you gave, so one or more of the variables inside it must be calculated incorrectly. When you get stuck, I would suggest using some `print()` statements (or `breakpoint()` if you know what you are doing) to check that each variable has the right value. In other words, calculate a few tag transitions by hand and see where the mistake is.

For example, if you use these `print()` statements:

``````
print(f'i:{i}, j:{j}, count:{count}, alpha:{alpha}')
print(f'count_prev_tag: {count_prev_tag}, num_tags: {num_tags}')
``````

you should get:

i:0, j:0, count:0, alpha:0.001
count_prev_tag: 142, num_tags: 46

to check manually you would calculate:

(0 + 0.001) / (142 + 0.001*46) = 0.000007040

Expected Output:
A at row 0, col 0: 0.000007040
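
The manual check above can also be written as a couple of lines of Python. This is only a minimal sketch: the function name `smoothed_transition_prob` is made up for illustration, and the numbers are the ones from the `print()` output above.

```python
def smoothed_transition_prob(count, count_prev_tag, alpha, num_tags):
    """Smoothed probability of moving from the previous tag to the current tag."""
    return (count + alpha) / (count_prev_tag + alpha * num_tags)


# Values taken from the print() output for i=0, j=0.
prob = smoothed_transition_prob(count=0, count_prev_tag=142, alpha=0.001, num_tags=46)
print(f"A at row 0, col 0: {prob:.9f}")  # A at row 0, col 0: 0.000007040
```

If the number printed here matches the expected output but your matrix does not, the formula is fine and one of the inputs is wrong.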

If you are curious which tags this value belongs to, you could run:
`trans_df.iloc[0:2, 0:2]`

and get:

``````
|   | #            | $            |
|---|--------------|--------------|
| # | 7.039973e-06 | 7.039973e-06 |
| $ | 1.356476e-07 | 1.356476e-07 |
``````

So the chance of tag ‘#’ transitioning to tag ‘#’ is 0.000007040. (P.S. This is a somewhat bad example, because ‘#’ has exactly this same probability of transitioning to most tags, while its probability of transitioning to the ‘CD’ tag is 0.992643, but I think you get the idea.)
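
One more sanity check that may help: every entry of A is a probability, so it has to lie between 0 and 1, and each row should sum to roughly 1 as long as the count for the previous tag equals the total of its outgoing transition counts. A value like 5021.7609 therefore points at a denominator that is far too small. Below is a minimal sketch on toy data. The tags, the counts, and the names `transition_counts` and `tag_counts` are invented for illustration and are not the assignment's real data.

```python
from collections import defaultdict

# Hypothetical toy data: three tags and made-up (prev_tag, tag) counts.
tags = ["DT", "NN", "VB"]
transition_counts = defaultdict(int, {
    ("DT", "NN"): 8, ("DT", "VB"): 2,
    ("NN", "DT"): 2, ("NN", "NN"): 3, ("NN", "VB"): 5,
    ("VB", "DT"): 6, ("VB", "NN"): 4,
})
# How often each tag appears as the previous tag (sum of its outgoing counts).
tag_counts = {"DT": 10, "NN": 10, "VB": 10}

alpha = 0.001
num_tags = len(tags)

# Build A row by row with the smoothing formula from this thread.
A = [
    [
        (transition_counts[(prev, cur)] + alpha) / (tag_counts[prev] + alpha * num_tags)
        for cur in tags
    ]
    for prev in tags
]

row_sums = [sum(row) for row in A]
for prev, row, total in zip(tags, A, row_sums):
    in_range = all(0.0 < p < 1.0 for p in row)
    print(f"{prev}: all entries in (0, 1): {in_range}, row sum: {total:.6f}")
```

On a real corpus the row sums can be off by a tiny amount (for example, the last tag in the corpus has no successor), but they should never be in the thousands.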

Arvyzukai, thanks! I found the problem. I was using the wrong key to get count_prev_tag! The defaultdict meant I got a zero so I didn’t think twice!
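
For anyone hitting the same thing, here is a minimal sketch of how a wrong key on a `defaultdict` can silently produce this kind of result. The count of 142 for ‘#’ and the values of `alpha` and `num_tags` come from the reply above. The wrong key used here (a tuple instead of a single tag) is only a guess for illustration, not necessarily the actual bug.

```python
from collections import defaultdict

raw_counts = {"#": 142}                      # count for tag '#', from the thread
tag_counts = defaultdict(int, raw_counts)    # missing keys silently return 0

alpha, num_tags, count = 0.001, 46, 0


def smoothed(count, count_prev_tag):
    return (count + alpha) / (count_prev_tag + alpha * num_tags)


right = smoothed(count, tag_counts["#"])
# Hypothetical mistake: looking up a (prev_tag, tag) tuple instead of a tag.
wrong = smoothed(count, tag_counts[("#", "#")])

print(f"{right:.9f}")  # 0.000007040
print(f"{wrong:.9f}")  # 0.021739130

# A plain dict raises KeyError for the same lookup instead of hiding it.
try:
    raw_counts[("#", "#")]
    key_error_raised = False
except KeyError:
    key_error_raised = True
print(key_error_raised)  # True
```

With `count_prev_tag` stuck at 0, the denominator collapses to `alpha * num_tags = 0.046`, which reproduces the 0.021739130 from the original question. Temporarily switching to a plain `dict` (or checking `key in tag_counts` before the lookup) is one way to make such a mistake fail loudly.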

Cheers,
Andrew

Thank you for both the question and the answer, it helped a lot.