I’m using TensorFlow.js / Keras to train a model that automatically plays Tetris in the browser.
Currently the model is not very good. What would be the next steps in learning how to improve it? I’m learning both the theory and the TensorFlow / Keras APIs at the same time, so this is a little difficult.
My current idea is that I somehow need to introduce memory into the model, or parametrise the reward function; I’m not sure how to do either, theoretically or practically.
## Problem
The model quickly (~100 games) learns not to be utterly rubbish … but soon settles at a level where it only sometimes clears a line. The model currently takes its cue from the Reward.get_reward() method, yet choosing moves via model.predict() is markedly worse than taking the best-scored move straight from the reward function.
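For context, the two ways of picking a move being compared here look roughly like this (the signatures are assumptions for illustration, not the repo's actual API):

```python
import numpy as np

def best_move_from_reward(board, piece, reward):
    # Look-ahead policy: score each of the 40 (rotation, column)
    # placements with the hand-written reward function, take the argmax.
    scores = [reward.get_reward(board, piece, move) for move in range(40)]
    return int(np.argmax(scores))

def best_move_from_model(state, model):
    # Learned policy: one forward pass over the flat (1, 226) input,
    # take the argmax over the 40 output scores.
    scores = model.predict(state[np.newaxis, :], verbose=0)[0]
    return int(np.argmax(scores))
```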
## Input, output, reward
The model input is the state of the board, the current piece's shape, and sundries; the input shape is a flat (1, 226) tensor. The output is the desired rotation and position of the piece (4 rotations x 10 column positions = 40 possible moves), a (1, 40)-shaped tensor. The reward function looks ahead and computes the state of the board after each of the 40 possible moves, but the model itself only has access to the current board state.
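A minimal Keras stand-in for a network with those shapes, just to make the I/O concrete (layer sizes are guesses, not the repo's actual architecture):

```python
import tensorflow as tf

# 226 input features (board state + current piece + sundries) mapped
# to one score per possible (rotation, column) placement.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(226,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(40),  # 4 rotations x 10 columns
])
model.compile(optimizer="adam", loss="mse")
```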
Working on the level of piece placement (drop piece in column 3 at rotation 2) should give better results than working on the level of individual piece movements (left, left, left, rotate, left, rotate).
## Potential paths to improvement
* Allow the model to maximise the reward across several moves -- add memory? (see the first sketch below)
* Parametrise the Reward.get_reward() function with learnable TensorFlow parameters, so that the reward function itself improves in order to clear lines (see the second sketch below)

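On the first idea: since the board state fully describes the game, the standard fix is not memory but a discounted multi-step target, i.e. one-step Q-learning. A minimal sketch, assuming a model like the one above and illustrative names:

```python
import numpy as np

GAMMA = 0.95  # discount factor: how much future line clears count

def q_target(reward, next_state, model, done):
    # One-step Q-learning (Bellman) target: r + gamma * max_a' Q(s', a').
    # Training toward this target propagates future rewards backwards,
    # so the network can learn to set up line clears several moves ahead
    # without any explicit memory.
    if done:
        return reward
    next_q = model.predict(next_state[np.newaxis, :], verbose=0)[0]  # (40,)
    return reward + GAMMA * float(np.max(next_q))
```

Trained this way, the network approximates long-run value rather than imitating the one-step reward, which is exactly the "maximise across several moves" behaviour.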
https://github.com/rdancer/Tetris-2023/blob/f42b1328b704baeff7553f1c2ca687df9bf3c787/ml/model.py#L102-L115
https://github.com/rdancer/Tetris-2023/blob/f42b1328b704baeff7553f1c2ca687df9bf3c787/ml/reward.py#L115-L122
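On the second idea, one common shape for a learnable reward is a weighted sum of hand-crafted board features with the weights as trainable variables. Everything below (feature choice, names, initial values) is illustrative, not the repo's code:

```python
import tensorflow as tf

class LearnableReward(tf.Module):
    """Reward as a learnable weighted sum of board features."""

    def __init__(self):
        # One trainable weight per feature:
        # [holes, aggregate height, bumpiness, lines cleared].
        # Initial values are rough guesses.
        self.w = tf.Variable([-1.0, -1.0, -1.0, 10.0], dtype=tf.float32)

    def __call__(self, features):
        # features: (4,) float tensor for one candidate placement.
        return tf.reduce_sum(self.w * features)
```

Since the game itself is not differentiable, such weights are usually tuned by a black-box search (e.g. the cross-entropy method) against lines cleared, rather than by backpropagating through play.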
I am not very familiar with this kind of problem either, but maybe you could have a look at Reinforcement Learning; it might be a good fit here.