Hi, I am currently working through the Deep Learning Specialization, just finished week 1 of Course 5, and am approaching the end of the course. One question keeps going around in my head: would it be possible, for one of our use cases (given the available data), to utilize an RNN (with LSTM or GRU units)?
The use-case
We have collected panel data with, let's say, T_x = 8 waves (i.e., sequence steps, from T_1 to T_8). Some questionnaires are assessed at every timestep (e.g., our outcome y; let's assume it is a categorical variable, either 1 = 'clinically relevant symptoms' or 0 = 'not clinically relevant', as measured by the cut-off for the BDI depression sum score), and some predictors are shared across all timesteps (e.g., age, gender, daily activity). As far as my understanding goes, this could work fine with a many-to-many architecture
where T_x = T_y. However, the number of features (n_x) differs for every wave (timestep). So, e.g., n_x^{<1>} = 43, but wave five could have more available features (n_x^{<5>} = 71), and wave seven fewer (n_x^{<7>} = 38).
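To make the many-to-many setup concrete, here is a minimal Keras sketch under the assumption that the per-wave features have already been padded to a common size (here 71, the largest per-wave feature count). All dimensions and layer sizes are illustrative placeholders, not a recommendation for this dataset:

```python
import numpy as np
import tensorflow as tf

# Hypothetical dimensions: m cases, T_x = 8 waves, features padded to a
# common n_x (71 = the largest per-wave feature count in the example above).
m, T_x, n_x = 100, 8, 71
X = np.random.rand(m, T_x, n_x).astype("float32")          # dummy inputs
y = np.random.randint(0, 2, size=(m, T_x, 1)).astype("float32")  # dummy 0/1 outcome per wave

# Many-to-many with T_y = T_x: return_sequences=True yields one hidden state
# per wave, and the Dense head turns each into a sigmoid prediction.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(T_x, n_x)),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

preds = model(X)  # shape (m, T_x, 1): one clinical/non-clinical probability per wave
```

The key point is `return_sequences=True`: without it, the LSTM would only emit its last hidden state and collapse the setup into many-to-one.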
Open Question
Can RNNs (whether built from traditional RNN, GRU, or LSTM units) handle a different number of features in the input x^{<t>} at each timestep? E.g., a questionnaire on quality of relationships is assessed at timepoints x^{<1>} and x^{<2>} but was dropped for all further waves, or a newly developed questionnaire on COVID-19-related fear was introduced at x^{<8>}. So my question is: can RNNs handle input data of this form? And beyond that (I hope this might be covered in Course 2 or 3): how would I wrap my data in a NumPy array? It is currently in .csv format, with one file per wave (so 8 .csv files) holding tabular data, i.e., features in columns and the n = 1..m cases in rows. I know how to easily read in all the data with, e.g., a list comprehension. However, since n_x differs for each wave, I am unsure how to stack up the third dimension.
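One common way to handle this is to pad every wave out to the largest feature count before stacking, since `np.stack` requires equal shapes along all axes. The sketch below simulates the situation with in-memory DataFrames instead of your actual 8 CSVs (the feature counts and column names are made up); in practice you would replace the simulated list with your `pd.read_csv` list comprehension:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
m = 5  # hypothetical number of cases

# Simulate 8 waves with differing feature counts (including the 43/71/38
# example from above). In practice: waves = [pd.read_csv(...) for ...].
n_x_per_wave = [43, 45, 50, 60, 71, 55, 38, 40]
waves = [
    pd.DataFrame(rng.standard_normal((m, n)),
                 columns=[f"w{t}_f{j}" for j in range(n)])
    for t, n in enumerate(n_x_per_wave, start=1)
]

# Zero-pad every wave on the feature axis up to the maximum feature count,
# so all waves share the shape (m, n_x_max) and can be stacked over time.
n_x_max = max(df.shape[1] for df in waves)
padded = [
    np.pad(df.to_numpy(dtype="float32"),
           ((0, 0), (0, n_x_max - df.shape[1])),  # pad only trailing feature columns
           constant_values=0.0)
    for df in waves
]

X = np.stack(padded, axis=1)
print(X.shape)  # (m, T_x, n_x_max) = (5, 8, 71)
```

Note that simple zero-padding treats "feature not assessed in this wave" the same as "feature observed with value 0", which may not be what you want. A common refinement is to align columns by the union of all feature names across waves (so the same questionnaire item always lands in the same column) and to add binary missingness-indicator features, but that is a modeling decision rather than a NumPy question.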