After doing the LSTM programming, I am feeling much better about the MECHANICS of this recurrent LSTM model. As with all these assignments, I was impressed with the detail and thoroughness.
I am still struggling, however, to get a strong intuition for how it solves problems such as the one below,
which is commonly cited as a motivation for the architecture.
“As the CAT ran across the road, and avoided the car traffic, IT had to also deal with an oncoming bike.”
In this sentence you need to “match” CAT and IT. My intuition, so far, is that if you feed it enough of these sentences, it will figure out how to use the memory and the various gate features to make that match.
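To make the “memory and gates” part of my intuition concrete, here is a minimal sketch (in NumPy, with made-up weight names, not taken from the assignment) of a single LSTM step: the forget gate decides how much of the old cell state to keep, the input gate decides how much new content to write, and the output gate decides what to expose to the next word. The cell state is where something like “CAT” could ride along untouched until “IT” shows up.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b are dicts keyed by gate name
    ('f', 'i', 'o', 'g') -- hypothetical names for illustration only."""
    f = sigmoid(W['f'] @ x + U['f'] @ h_prev + b['f'])   # forget gate: keep old memory?
    i = sigmoid(W['i'] @ x + U['i'] @ h_prev + b['i'])   # input gate: write new memory?
    o = sigmoid(W['o'] @ x + U['o'] @ h_prev + b['o'])   # output gate: expose memory?
    g = np.tanh(W['g'] @ x + U['g'] @ h_prev + b['g'])   # candidate memory content
    c = f * c_prev + i * g          # cell state: "CAT" can persist here across many words
    h = o * np.tanh(c)              # hidden state passed on to the next word
    return h, c

# Toy usage: random weights, a 5-word "sentence" of 8-dim word vectors.
rng = np.random.default_rng(0)
dim_x, dim_h = 8, 16
W = {k: rng.standard_normal((dim_h, dim_x)) * 0.1 for k in 'fiog'}
U = {k: rng.standard_normal((dim_h, dim_h)) * 0.1 for k in 'fiog'}
b = {k: np.zeros(dim_h) for k in 'fiog'}
h, c = np.zeros(dim_h), np.zeros(dim_h)
for word_vec in rng.standard_normal((5, dim_x)):
    h, c = lstm_step(word_vec, h, c, W, U, b)
```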
Is it any more like magic than the fact that a simple feed-forward network can recognize whether there is a cat in an image, with no more to go on than the labels and a collection of pixels?
But seriously, it is the same fundamental thing going on here: they have defined a structure that can encode relationships between earlier and later elements of a sequence separated by an arbitrary distance. And they have a metric for deciding whether it succeeded or not. With those two things and an appropriate amount of training data, the learning algorithm can figure out how to do it.
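And the “metric for deciding whether it succeeded” is usually just an ordinary loss. Purely as an illustration (the candidate words and the scores here are made up), this is roughly how a cross-entropy loss would grade the network's guess about what “IT” refers to, and that number is the only feedback training needs:

```python
import numpy as np

# Hypothetical setup: the hidden state at "IT" has been projected to
# scores over the earlier nouns it might refer to.
candidates = ["CAT", "road", "car", "traffic"]
logits = np.array([2.1, 0.3, -0.5, 0.1])   # made-up network output
target = candidates.index("CAT")           # the correct antecedent

# Softmax + cross-entropy: the metric for "did it succeed?"
probs = np.exp(logits - logits.max())
probs /= probs.sum()
loss = -np.log(probs[target])
print(f"P(CAT) = {probs[target]:.2f}, loss = {loss:.3f}")
# Gradient descent on this loss is what nudges the gates toward remembering CAT.
```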
Mind you, I totally agree that it seems like magic at some level. But it’s apparently magic that actually works!