Here are a few hints:
- You should try different model architectures starting with a much smaller model.
- A kernel size of 10 is too large for a conv filter. A much more reasonable kernel size is 2 or 3.
- Increase the number of filters per conv layer gradually with depth.
- The number of units (nodes) in a dense layer is usually a power of 2 (a heuristic that can be observed in many models).
- Choice of optimizer is also important. In your case, the learning rate of 1e-4 is a bit too small, so the network has to be trained a lot longer to achieve good performance. I recommend trying out optimizers with their default learning rates (try `adam`).
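
To get a feel for why the kernel size and the filter counts matter, here is a rough sketch that counts the trainable parameters of a stack of conv layers. It assumes 2D convolutions with a bias term, 3 input channels, and a filter schedule of 32, 64, 128; all of these are made-up values for illustration, not taken from your model. The helper names `conv_params` and `stack_params` are hypothetical as well.

```python
def conv_params(kernel_size, in_channels, filters, dims=2):
    """Trainable parameters of one conv layer: kernel weights plus one bias per filter."""
    return (kernel_size ** dims) * in_channels * filters + filters


def stack_params(kernel_size, in_channels, filter_schedule, dims=2):
    """Total parameters of consecutive conv layers, each feeding the next."""
    total = 0
    for filters in filter_schedule:
        total += conv_params(kernel_size, in_channels, filters, dims)
        in_channels = filters  # output channels become the next layer's input
    return total


# Assumed values: filters double with depth and are powers of 2.
filter_schedule = [32, 64, 128]

for k in (10, 3):
    print(f"kernel size {k}: {stack_params(k, 3, filter_schedule)} parameters")
```

Under these assumptions the stack with kernel size 10 has roughly eleven times as many conv parameters as the one with kernel size 3, which is one reason a smaller kernel is a better starting point for a small model.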