On the ending of the course

I think this is an excellent course for those who want to develop applications that use neural networks meaningfully. However, I did not find any guidance on the question of what data to feed into the input layer.
For example, for a neural network that evaluates a chess position, there are at least four different ways to encode the input:
1. 64 numbers or codes describing the contents of each square of the board;
2. 32 numbers giving the square of each piece (or 64, if each piece’s position is given by its file and rank rather than by a single square number);
3. 12 64-bit masks (“bitboards”), one for each piece type (pawn, knight, bishop, rook, queen, king) in each of the 2 colors; this is the representation leading chess engines use to speed up generating and searching lines of moves;
4. simply the standard variable-length FEN string, the generally accepted text description of a chess position (which itself lists the board rank by rank, i.e., in 8 parts).
Before trying these by trial and error, I would like to hear some kind of “philosophy” about this.
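To make approaches 1 and 3 concrete, here is a minimal sketch in Python with NumPy that parses the piece-placement field of a FEN string into an 8×8 grid of integer piece codes (approach 1) and into 12 binary 8×8 planes, one per piece type and color, which is the network-input analogue of bitboards (approach 3). The function names and the ordering of piece types are hypothetical choices for illustration, not any engine’s actual layout.

```python
import numpy as np

# FEN piece letters: uppercase = White, lowercase = Black.
# The order of this string (and hence of the planes) is an arbitrary but fixed choice.
PIECES = "PNBRQKpnbrqk"

def fen_to_grid(fen: str) -> np.ndarray:
    """Approach 1: 8x8 grid of piece codes (0 = empty, 1..12 = PIECES index + 1)."""
    placement = fen.split()[0]                # first FEN field: piece placement
    grid = np.zeros((8, 8), dtype=np.int8)
    for row, rank_str in enumerate(placement.split("/")):  # rank 8 comes first
        col = 0
        for ch in rank_str:
            if ch.isdigit():
                col += int(ch)                # a digit is a run of empty squares
            else:
                grid[row, col] = PIECES.index(ch) + 1
                col += 1
    return grid

def fen_to_planes(fen: str) -> np.ndarray:
    """Approach 3: 12 binary 8x8 planes, one per (piece type, color)."""
    grid = fen_to_grid(fen)
    return np.stack([(grid == k + 1).astype(np.float32) for k in range(12)])

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
planes = fen_to_planes(start)
print(planes.shape)        # (12, 8, 8)
print(int(planes.sum()))   # 32
```

The integer grid implicitly ranks piece types by their arbitrary codes, whereas the planes treat each piece type as a separate, unordered feature, which is one common reason the plane form is preferred as network input.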
Also, I would have liked to work with code in a notebook at the end of this course, as in the previous ones.

As for how you represent the data in your chess example, it doesn’t matter, as long as you are consistent. The algorithm learns the same way with any of them; the only difference might be the amount of storage and memory required. There’s an interesting example in DLS C1 about unrolling or “flattening” RGB images (height × width × 3 arrays) into vectors so they can be fed to Logistic Regression or a Feed Forward Neural Network: there are two possible orders for the flattening, “C” order and “F” order. It turns out that the model’s accuracy is the same with either ordering of the data, as long as you are consistent. Here’s a thread which discusses that, but note that you have to read all the way to the end to see the discussion of the points I mentioned above.
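Here is a minimal NumPy sketch of why the flattening order does not matter: “C” and “F” order contain exactly the same numbers, only permuted, so any weights a logistic-regression unit uses on C-ordered inputs have an exact counterpart (the same weights, permuted) for F-ordered inputs. The image size, random data, and weights are arbitrary assumptions for illustration, not the DLS assignment code.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small batch of fake RGB images: (m, height, width, channels)
m, height, width, channels = 5, 4, 4, 3
images = rng.random((m, height, width, channels))
n = height * width * channels

# Flatten each image into a column, giving X of shape (n, m)
X_c = np.stack([img.ravel(order="C") for img in images], axis=1)  # row-major
X_f = np.stack([img.ravel(order="F") for img in images], axis=1)  # column-major

# perm[k] is the C-order feature index that lands at F-order position k
perm = np.arange(n).reshape(height, width, channels).ravel(order="F")
assert np.array_equal(X_f, X_c[perm])   # same features, different order

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# A logistic-regression unit with some weights for C-ordered inputs...
w_c = rng.standard_normal((n, 1))
b = 0.1
y_c = sigmoid(w_c.T @ X_c + b)

# ...is matched exactly by the same weights, permuted, for F-ordered inputs.
w_f = w_c[perm]
y_f = sigmoid(w_f.T @ X_f + b)
print(np.allclose(y_c, y_f))   # True
```

Since every solution for one ordering maps to an equivalent solution for the other, training on either consistent ordering can reach the same accuracy; mixing orderings between training and prediction is what breaks things.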

As for wanting to practice what you’ve learned, this course simply isn’t a programming course. You can “hold that thought” on the things you learned here and watch for situations in which you can apply those ideas as we move on through Course 4 (ConvNets) and Course 5 (Sequence Models).


Some thoughts.

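Method 1 frames it as an image problem and method 4 as a text problem, and method 4 carries more information than method 1: besides the piece placement, a full FEN string also records the side to move, castling rights, the en passant square, and the move counters. “Mastering the Game of Go without Human Knowledge” describes an information-richer version of method 1 on page 27, where the input is a stack of binary feature planes that also encodes recent board history and the side to play.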
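To illustrate the “text problem” view of method 4, here is a minimal Python sketch that splits a full FEN string into its six standard fields and turns it into a fixed-length sequence of character IDs. The character vocabulary, the padding ID 0, and the maximum length of 100 are hypothetical choices for illustration.

```python
# The six space-separated fields of a FEN string, in order
FEN_FIELDS = ("placement", "side_to_move", "castling", "en_passant",
              "halfmove_clock", "fullmove_number")

def split_fen(fen: str) -> dict:
    """Split a full FEN string into its six fields."""
    parts = fen.split()
    if len(parts) != len(FEN_FIELDS):
        raise ValueError(f"expected 6 FEN fields, got {len(parts)}")
    return dict(zip(FEN_FIELDS, parts))

# Hypothetical character vocabulary: piece letters, digits, separators, files a-h
VOCAB = sorted(set("pnbrqkPNBRQK0123456789/ -wabcdefgh"))
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(VOCAB)}  # 0 is reserved for padding

def fen_to_tokens(fen: str, max_len: int = 100) -> list:
    """Character-level token IDs, right-padded with 0 to a fixed length."""
    if len(fen) > max_len:
        raise ValueError("FEN string longer than max_len")
    ids = [CHAR_TO_ID[ch] for ch in fen]
    return ids + [0] * (max_len - len(ids))

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(split_fen(start)["side_to_move"])   # w
print(len(fen_to_tokens(start)))          # 100
```

A sequence model would then have to learn the FEN grammar itself (for example, that “8” means eight empty squares), which is part of what makes this a text problem rather than an image problem.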

I am not sure about Methods 2A and 3. They don’t look trivial to me. For example, how will you deal with captured pieces? Also, how should we assign a value to a square’s location (or bit position)? I am asking how we should, not how we could. We could assign 0 to the upper-left corner, then increase from left to right and from top to bottom, but why should that be optimal? The underlying question is how to assign feature values to something that is not inherently one-dimensional and naturally ordered.

Method 2B looks better than 2A, and is neither an image nor a text problem.
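A small Python sketch can make the 2A versus 2B trade-off visible for a single piece slot. It assumes one arbitrary square numbering (a1 = 0 up to h8 = 63) and, as one possible answer to the captured-piece question, a sentinel value or an all-zero vector; all function names are hypothetical. A one-hot variant is included for comparison because it avoids imposing any order on squares.

```python
from typing import Optional

import numpy as np

def square_index(file: int, rank: int) -> int:
    """One arbitrary numbering: a1 = 0, b1 = 1, ..., h8 = 63 (files/ranks are 0-based)."""
    return rank * 8 + file

def encode_slot_2a(square: Optional[int]) -> np.ndarray:
    """Method 2A: a single integer; a captured piece (None) needs a sentinel such as -1."""
    return np.array([-1.0 if square is None else float(square)])

def encode_slot_2b(square: Optional[int]) -> np.ndarray:
    """Method 2B: (present, file, rank) scaled to [0, 1]; a captured piece is all zeros."""
    if square is None:
        return np.zeros(3)
    file, rank = square % 8, square // 8
    return np.array([1.0, file / 7, rank / 7])

def encode_slot_onehot(square: Optional[int]) -> np.ndarray:
    """64-way one-hot: no implied order between squares; a captured piece is all zeros."""
    v = np.zeros(64)
    if square is not None:
        v[square] = 1.0
    return v

# h1 and a2 are consecutive numbers under 2A but on opposite edges of the board.
h1, a2 = square_index(7, 0), square_index(0, 1)
print(abs(encode_slot_2a(h1) - encode_slot_2a(a2)))   # [1.]
print(encode_slot_2b(h1), encode_slot_2b(a2))
```

Under 2A the network sees h1 and a2 as neighbors purely because of the numbering, while 2B keeps file and rank as separate geometric features, which is one way to read why 2B looks better than 2A.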
