Training strategy for Week 1 Jazz Improv

Do the first several y_i in each block of training data contain a "startup transient"? If so, could we get better results by training on only, say, the last 25 input/output pairs of each 30-sample block?
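One way to try this without shortening the blocks is a "burn-in": all 30 steps still run through the LSTM, so the state has time to warm up, but the loss from the first few steps is ignored. Below is a minimal sketch in plain Python. It assumes the per-step losses of one block are available as a list. The names `step_weights` and `masked_mean_loss` are hypothetical, and the 5-step burn-in simply mirrors the "last 25 of 30" suggestion above.

```python
def step_weights(block_len=30, burn_in=5):
    """Per-step loss weights: 0 for the first `burn_in` steps, 1 afterwards (hypothetical helper)."""
    if not 0 <= burn_in < block_len:
        raise ValueError("burn_in must be in [0, block_len)")
    return [0.0 if t < burn_in else 1.0 for t in range(block_len)]


def masked_mean_loss(step_losses, burn_in=5):
    """Average the per-step losses over the post-burn-in steps only."""
    weights = step_weights(len(step_losses), burn_in)
    return sum(w * loss for w, loss in zip(weights, step_losses)) / sum(weights)


# Made-up per-step losses: high during the first 5 steps, lower afterwards.
losses = [3.0] * 5 + [1.0] * 25
print(masked_mean_loss(losses))  # -> 1.0 (the first 5 steps are ignored)
```

If the model returns one output per time step (assumed here, not confirmed), the same list could probably be passed to Keras as `model.compile(..., loss_weights=step_weights())`. That would zero out the first five outputs' contribution to the training loss while still feeding their inputs through the LSTM.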

During model training, (a_0, c_0) are reset to zero as the inputs to the first LSTM step of each 30-sample block, correct? If so, c_1 carries no information from earlier time steps, c_2 carries information from only one earlier step, c_3 from only two earlier steps, and so on. If the music process has long-term state, there is no way for it to be reflected in the states of the first several cells in each training block.
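The effect of the zero reset can be illustrated with a toy scalar stand-in for the cell state, c_t = f * c_{t-1} + x_t. This is a deliberate simplification, not the real LSTM update: the "forget" factor f is held fixed instead of being gated by the input. The sketch runs a hypothetical 60-step sequence once straight through, then runs its second 30-step half again starting from c_0 = 0. It measures how far the reset state lags behind the carried-over state at each step of the block.

```python
def run_cell(xs, c0=0.0, forget=0.9):
    """Toy cell state: c_t = forget * c_{t-1} + x_t, starting from c0."""
    c = c0
    states = []
    for x in xs:
        c = forget * c + x
        states.append(c)
    return states


# Hypothetical 60-step input, split into two 30-step training blocks.
sequence = [1.0] * 60
full = run_cell(sequence)               # state carried across the block boundary
block2_reset = run_cell(sequence[30:])  # second block restarted from c_0 = 0

# Gap between carried and reset states at each step of the second block.
gaps = [f - r for f, r in zip(full[30:], block2_reset)]
for step in (1, 5, 25, 30):
    print(step, round(gaps[step - 1], 3))
```

In this linear toy, the gap at step k of the block is exactly forget**k times the state carried in from the previous block. The startup transient therefore decays geometrically, and its length depends on how close the effective forget factor is to 1. With forget = 0.9, the gap is still noticeable after 5 steps. So whether dropping the first 5 of 30 pairs is enough would depend on the memory time scale the trained LSTM actually uses.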

Thanks, Marc

Did you find an answer to your question?

@TMosh, I did not find an answer. Any insights you could share would be appreciated … thanks!