Hi, in week 3 we’re building an NER system using a single LSTM layer. Why are we using only a single LSTM unit? If there is only one LSTM unit, doesn’t that mean nothing is being remembered? For the system to take previous inputs into account, shouldn’t there be at least two LSTM units, with the first feeding into the second?
I’m just wondering how the NER system works if there’s really no memory.