To be precise, your question cannot be answered with just the model you provided, because it depends on the input shape. @Juan_Olano showed you the second dimension of the weights and biases, which depends on the number of units you specified in the Dense layers. But the first dimension depends on the input dimensions. @Juan_Olano also correctly points out that each layer has its own weight and bias objects, which it seems may be a source of confusion for you.
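A quick way to see that first-dimension dependence for yourself (a toy sketch of my own, separate from the MNIST code below): build the same Dense(128) layer on two different input sizes and compare the weight shapes.
import tensorflow as tf

# The same Dense(128) layer ends up with a different kernel shape
# depending on the size of the input it is built on.
for n_inputs in (784, 400):
    layer = tf.keras.layers.Dense(128)
    layer(tf.zeros((1, n_inputs)))  # calling the layer builds its weights
    print(layer.kernel.shape, layer.bias.shape)
# (784, 128) (128,)
# (400, 128) (128,)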
To help you explore further, I cobbled together some sample code I pulled from the interweb that actually loads and trains on the MNIST data. Maybe play around with it a bit and see if it doesn’t help?
import tensorflow as tf
import numpy as np

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Add an explicit channel dimension and scale pixel values to [0, 1]
input_shape = (28, 28, 1)
x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], x_train.shape[2], 1)
x_train = x_train / 255.0
x_test = x_test.reshape(x_test.shape[0], x_test.shape[1], x_test.shape[2], 1)
x_test = x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(name='FlattenLayer', input_shape=input_shape),
    tf.keras.layers.Dense(128, name='Dense128', activation='relu'),
    tf.keras.layers.Dense(10, name='Dense10')
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)

history = model.fit(
    x_train,
    y_train,
    batch_size=32,
    epochs=6,
)
print(x_train.shape)  # training inputs
(60000, 28, 28, 1)
print(x_train.shape[1] * x_train.shape[2])  # one flattened training input
784
layer = model.get_layer('Dense128')
print(layer.weights)
[<tf.Variable 'Dense128/kernel:0' shape=(784, 128) dtype=float32, numpy=
array(
[[ 0.06696945, 0.07679319, 0.06742486, …, -0.01404408,
0.07030296, -0.06523466],
[-0.07856862, -0.02751825, 0.01609654, …, 0.05214984,
0.07474259, -0.00408177],
[ 0.060228 , 0.00990647, -0.0353794 , …, -0.01840374,
0.06142697, 0.01504412],
…,
[-0.07450459, -0.05087827, 0.03410669, …, 0.04347721,
0.0032033 , 0.06776185],
[-0.04982505, -0.01608631, 0.07413752, …, 0.02166978,
-0.02083131, -0.05235609],
[-0.01907828, 0.05915994, 0.07626631, …, 0.07750984,
0.03713375, -0.06325782]],
dtype=float32)>,
<tf.Variable 'Dense128/bias:0' shape=(128,) dtype=float32, numpy=
array([ 0.09507202, 0.0825828 , 0.06340718, 0.01609202, 0.21900468,
-0.06261531, -0.01747248, -0.01654577, -0.1401379 , -0.17549014,
-0.01728175, 0.07885362, 0.19574201, -0.07354902, 0.13560148,
-0.08775545, -0.01657506, -0.04262279, -0.1097747 , 0.0043903 ,
0.10178334, 0.08393242, -0.0524591 , 0.15432979, 0.24308488,
-0.02623043, 0.0037841 , 0.00517588, -0.00893999, 0.04289384,
…
-0.05562854, 0.03767706, 0.06388604, -0.09254396, 0.06858032,
0.14181091, 0.08502957, -0.02626839, -0.01771167, -0.08993658,
0.02208419, 0.11501006, 0.07171755], dtype=float32)>]
Notice that the shape of W in the first Dense layer is (784,128), and the shape of b in that layer is (128,) (1D).
layer = model.get_layer('Dense10')
print(layer.weights)
[<tf.Variable 'Dense10/kernel:0' shape=(128, 10) dtype=float32, numpy=
array([[ 0.0404684 , -0.0282199 , 0.08507891, …, -0.03951922,
-0.195442 , -0.45420387],
[ 0.04301757, -0.78330183, -0.30019557, …, -0.21572816,
0.2496909 , -0.05051493],
[ 0.05784213, -0.3508402 , -0.12698603, …, -0.63199383,
0.09412976, -0.1580189 ],
…,
[-0.31220144, 0.25271496, 0.16391952, …, 0.16664003,
-0.18537991, -0.21527106],
[-0.16806315, -0.12554409, -0.01407977, …, -0.4826628 ,
0.14121683, -0.14735046],
[ 0.1426654 , -0.12658076, -0.7396209 , …, -0.09768049,
0.15803745, 0.12935531]], dtype=float32)>,
<tf.Variable 'Dense10/bias:0' shape=(10,) dtype=float32, numpy=
array([-0.15701889, -0.08623032, 0.0506925 , -0.06654423, 0.03800333,
0.10811225, -0.04681133, -0.12554045, 0.21290296, -0.04749312],
dtype=float32)>]
In the last layer the shape of W is (128,10) while the shape of b is (10,) (again, 1D).
The shapes of W and b for successive layers have to flow together from the input shape through to the output shape. This is because we have defined each step as a matrix multiplication, so the flattened input and the first Dense layer need compatible shapes, as do the output of the first Dense layer and the input of the second, and the output of the last layer has to match the desired overall output: (60000 x 784) · (784 x 128) => (60000 x 128), then (60000 x 128) · (128 x 10) => (60000 x 10), i.e. 10 logits per training example.
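To make that flow concrete, here is a quick sanity check you could run after training: pull the trained weights out and chain the matrix multiplications by hand. This is just a sketch of the shapes, so the relu and the softmax are deliberately left out.
# Trained weights from the code above, chained with plain matmuls
W1, b1 = model.get_layer('Dense128').get_weights()  # (784, 128), (128,)
W2, b2 = model.get_layer('Dense10').get_weights()   # (128, 10), (10,)
X = x_train.reshape(x_train.shape[0], -1)           # (60000, 784)
A1 = X @ W1 + b1                                    # (60000, 128)
Z2 = A1 @ W2 + b2                                   # (60000, 10) -> 10 logits per image
print(X.shape, A1.shape, Z2.shape)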
Notice that I printed out W and b from each layer only after training had completed. But with some elbow grease, you could in fact collect those values during each training iteration and watch them evolve. It would be a good TensorFlow brain teaser.
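If you want to try it, one way (just a sketch, and the WeightLogger name is mine) is a custom Keras Callback that snapshots a layer's weights while fit() runs, reusing the same model and data as above:
class WeightLogger(tf.keras.callbacks.Callback):
    """Record one layer's W and b at the end of every epoch.
    Use on_train_batch_end instead if you want every iteration."""
    def __init__(self, layer_name):
        super().__init__()
        self.layer_name = layer_name
        self.weight_history = []

    def on_epoch_end(self, epoch, logs=None):
        W, b = self.model.get_layer(self.layer_name).get_weights()
        self.weight_history.append((W.copy(), b.copy()))

logger = WeightLogger('Dense128')
model.fit(x_train, y_train, batch_size=32, epochs=6, callbacks=[logger])
print(len(logger.weight_history))          # one (W, b) snapshot per epoch
print(logger.weight_history[0][0].shape)   # (784, 128)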
Let us know?