Yes, the input layer requires one unit per feature.
Selecting the number of units per hidden layer is a matter of experimentation and compromise.
In general you want enough units to get sufficiently good results, but you want as few units (and layers) as possible to minimize the computational requirements during training.
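To make the trade-off concrete, here is a rough sketch that counts the trainable parameters of a plain fully connected network. It assumes dense layers with one bias per unit; the function name `count_parameters`, the 20-feature dataset, and the two candidate layer layouts are made up for illustration and are not from the original question.

```python
def count_parameters(n_features, hidden_units, n_outputs):
    """Count weights and biases of a fully connected network.

    n_features   -- size of the input layer (one unit per feature)
    hidden_units -- list with the number of units in each hidden layer
    n_outputs    -- number of units in the output layer
    """
    layer_sizes = [n_features, *hidden_units, n_outputs]
    total = 0
    # Each dense layer has fan_in * fan_out weights plus fan_out biases.
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out + fan_out
    return total


n_features = 20  # hypothetical dataset with 20 feature columns

# Two hypothetical hidden-layer layouts to compare.
for hidden_units in ([64, 32], [16, 8]):
    n_params = count_parameters(n_features, hidden_units, n_outputs=1)
    print(hidden_units, n_params)
# [64, 32] 3457
# [16, 8] 481
```

The smaller layout has roughly a seventh of the parameters, so it is cheaper to train. Whether it is still accurate enough is something you can only find out by trying it on your data.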
In addition to @TMosh's reply: there are use cases where a hyperparameter optimization framework like KerasTuner might be worth a look, see also:
Hope that helps, @MOHAMED_AZIZ2!
Best regards
Christian
