Hi.
I am applying a pre-trained Google model, TFViTModel, to my image dataset. It is a binary classification problem, e.g. is the image a cat or not?
Here is a snapshot of my code:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from transformers import TFViTModel

# Flipping and rotating images
data_augmentation = keras.Sequential(
    [layers.RandomFlip("horizontal"), layers.RandomRotation(0.1)]
)
# Importing the base model
base_model = TFViTModel.from_pretrained('google/vit-base-patch16-224-in21k')
# Defining the layers
inputs = keras.Input(shape = train_images.shape[1:])
x = data_augmentation(inputs)
inputs.shape  # or x.shape
# out: TensorShape([None, 224, 224, 3])
x = base_model(x, training=False)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)
The error I get is:
ValueError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_28616\1966532.py in <cell line: 2>()
1 # The model
----> 2 x = base_model(x, training=False)
3
ValueError: Input 0 of layer projection is incompatible with the layer: expected axis -1 of input shape to have value 3 but received input with shape (None, 224, 3, 224)
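For what it is worth, the shape in this message, (None, 224, 3, 224), is exactly what a channels-last batch (None, 224, 224, 3) turns into under the axis permutation (0, 2, 3, 1). That would be consistent with the model expecting channels-first input, (batch, 3, 224, 224), and moving the channels to the last axis itself. A minimal NumPy sketch of that reading, using a hypothetical dummy batch rather than the real train_images:

```python
import numpy as np

# Hypothetical dummy batch in channels-last layout: (batch, height, width, channels).
batch_nhwc = np.zeros((2, 224, 224, 3), dtype=np.float32)

# Applying the permutation (0, 2, 3, 1) to channels-last data
# reproduces the shape reported in the error message.
shape_in_error = np.transpose(batch_nhwc, (0, 2, 3, 1)).shape

# Moving the channel axis to position 1 gives channels-first data:
# (batch, channels, height, width).
batch_nchw = np.transpose(batch_nhwc, (0, 3, 1, 2))

# The same (0, 2, 3, 1) permutation applied to channels-first data
# ends with the 3 channels on the last axis.
shape_after_fix = np.transpose(batch_nchw, (0, 2, 3, 1)).shape

print(shape_in_error)    # (2, 224, 3, 224)
print(batch_nchw.shape)  # (2, 3, 224, 224)
print(shape_after_fix)   # (2, 224, 224, 3)
```

Inside the Keras model, the equivalent step would presumably be a layers.Permute((3, 1, 2)) applied after the augmentation (Permute indices start at 1 and exclude the batch axis). That is an assumption to check, not something tested here.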
I have tried to force the shape for the Input layer like this:
inputs = keras.Input(shape=(3, 224, 224))
inputs.shape
# out: TensorShape([None, 3, 224, 224])
and got another ValueError:
ValueError: Layer dense_3 expects 1 input(s), but it received 2 input tensors. Inputs received: [<tf.Tensor 'Placeholder:0' shape=(None, 197, 768) dtype=float32>, <tf.Tensor 'Placeholder_1:0' shape=(None, 768) dtype=float32>]
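The two tensors listed in this message look like a per-token sequence output, shape (None, 197, 768), and a pooled vector, shape (None, 768). The 197 is consistent with a 224x224 image cut into 16x16 patches plus one extra classification token. A Dense layer takes a single tensor, so one of the two would have to be selected before it. A small NumPy sketch with stand-in arrays (the names sequence_output and pooled_output are hypothetical, not taken from the library):

```python
import numpy as np

# Values read off the model name (patch16-224) and the error message (768).
image_size, patch_size, hidden_size = 224, 16, 768

# Number of patches per image, plus one classification token.
num_patches = (image_size // patch_size) ** 2
num_tokens = num_patches + 1

# Stand-ins for the two tensors named in the error message (batch of 2).
sequence_output = np.zeros((2, num_tokens, hidden_size), dtype=np.float32)
pooled_output = np.zeros((2, hidden_size), dtype=np.float32)

# Taking the first token of the sequence gives one vector per image,
# the same shape as the pooled vector.
first_token = sequence_output[:, 0, :]

print(num_tokens)         # 197
print(first_token.shape)  # (2, 768)
```

In the Keras code, this would mean passing only one tensor to Dense, for example the second element of what base_model returns, instead of the whole output. Treat that as an assumption to check against the transformers documentation.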
Can anybody advise on how to reshape the input to the form the model requires?
Thank you.
P.S. I am following this guide: https://www.philschmid.de/image-classification-huggingface-transformers-keras