In the course lecture, the neural network takes a state-action pair as input, and the target variable is the Q-value computed with the Bellman equation. If that is the case, why does the Lunar Lander exercise initialize the network with only the state as input, and not the state-action pair? I am a bit confused about this.
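For concreteness, here is a minimal sketch (in Keras, which the exercise uses) of the two designs I mean. The layer widths and the 8-dimensional state / 4 actions are just illustrative assumptions, not the exercise's exact architecture:

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input

# Design A (as described in the lecture): input is a (state, action) pair,
# output is a single scalar Q(s, a).
# Assumed sizes: 8-dimensional state, action one-hot encoded over 4 actions.
q_network_state_action = Sequential([
    Input(shape=(8 + 4,)),          # state concatenated with one-hot action
    Dense(64, activation='relu'),
    Dense(64, activation='relu'),
    Dense(1, activation='linear'),  # scalar Q(s, a)
])

# Design B (as in the Lunar Lander exercise): input is the state only,
# output is a vector of Q-values, one entry per action.
q_network_state_only = Sequential([
    Input(shape=(8,)),              # state only
    Dense(64, activation='relu'),
    Dense(64, activation='relu'),
    Dense(4, activation='linear'),  # Q(s, a) for each of the 4 actions
])

# With Design B, a single forward pass returns Q(s, a) for every action,
# so the max over actions needed for the Bellman target takes one pass
# instead of one pass per action.
```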