I tried running the Lesson 2 code on my Mac laptop, but I was blocked because I don't have a Prediction Guard API key.
As a workaround, I used the Hugging Face transformers classes BridgeTowerProcessor and BridgeTowerModel.
I refactored bt_embedding_from_prediction_guard in utils.py as follows:
from io import BytesIO

import PIL
import torch
from PIL import Image
from transformers import BridgeTowerModel, BridgeTowerProcessor

def bt_embedding_from_prediction_guard(prompt, base64_image):
    # # get PredictionGuard client
    # client = _getPredictionGuardClient()
    # message = {"text": prompt,}
    # if base64_image is not None and base64_image != "":
    #     if not isBase64(base64_image):
    #         raise TypeError("image input must be in base64 encoding!")
    #     message['image'] = base64_image
    # response = client.embeddings.create(
    #     model="bridgetower-large-itm-mlm-itc",
    #     input=[message]
    # )
    # return response['data'][0]['embedding']
    processor = BridgeTowerProcessor.from_pretrained("BridgeTower/bridgetower-large-itm-mlm-itc")
    model = BridgeTowerModel.from_pretrained("BridgeTower/bridgetower-large-itm-mlm-itc")
    # base64_image now holds a PIL Image (see download_image below), not a base64 string
    if base64_image is None:
        raise ValueError("BridgeTowerModel needs an image; text-only input is not supported here")
    # Preprocess the inputs
    processed_inputs = processor(images=[base64_image], text=[prompt], return_tensors="pt")
    # Generate the embedding
    with torch.no_grad():
        outputs = model(**processed_inputs)
    # Extract the embeddings (you can change which embedding layer to use depending on your task)
    embeddings = outputs.pooler_output
    # tolist() keeps the batch dimension, so this is a list holding one vector
    return embeddings.tolist()
# return a PIL Image, loading it from disk when given a file path
def download_image(image_path_or_PIL_img):
    if isinstance(image_path_or_PIL_img, PIL.Image.Image):
        return image_path_or_PIL_img
    # otherwise this is an image path
    with open(image_path_or_PIL_img, "rb") as image_file:
        image_data = image_file.read()
    image = Image.open(BytesIO(image_data))
    return image
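A possible source of the gap, offered as a guess rather than a diagnosis: according to the transformers documentation, BridgeTowerModel's `pooler_output` is the concatenation of the text and image [CLS] features after some extra processing layers (shape batch_size × 2·hidden_size). It is not the projected contrastive (ITC) embedding. `BridgeTowerForContrastiveLearning` instead exposes `text_embeds`, `image_embeds` and `cross_embeds`, the last obtained by applying a projection layer to the `pooler_output`. The sketch below assumes, without confirmation, that the hosted Prediction Guard endpoint returns something closer to `cross_embeds`; the names `MODEL_ID`, `load_contrastive_model` and `bt_contrastive_embedding` are hypothetical. It also caches the loaded model with `functools.lru_cache` instead of reloading it on every call, as the function above does.

```python
from functools import lru_cache

# Hugging Face checkpoint used in the question
MODEL_ID = "BridgeTower/bridgetower-large-itm-mlm-itc"


@lru_cache(maxsize=1)
def load_contrastive_model():
    """Load the processor and contrastive model once, then reuse them."""
    # imported here so this module can be imported without loading torch/transformers
    from transformers import BridgeTowerForContrastiveLearning, BridgeTowerProcessor

    processor = BridgeTowerProcessor.from_pretrained(MODEL_ID)
    model = BridgeTowerForContrastiveLearning.from_pretrained(MODEL_ID)
    return processor, model


def bt_contrastive_embedding(prompt, image):
    """Hypothetical helper: return one text-image embedding as a flat list of floats.

    Assumes `image` is a PIL Image, e.g. the value returned by download_image().
    """
    import torch

    processor, model = load_contrastive_model()
    inputs = processor(images=[image], text=[prompt], return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # cross_embeds has one row per (text, image) pair; take the only row
    return outputs.cross_embeds[0].tolist()
```

Because this helper returns a flat list, `embeddings.append(...)` is the natural way to collect one vector per example with it.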
And in L2_Multimodal Embeddings.ipynb, I changed the Compute Embedding block as follows:
embeddings = []
for example in [img1, img2, img3]:
    img_path = example['image_path']
    caption = example['caption']
    # base64_img = encode_image(img_path)
    pil_img = download_image(img_path)
    embedding = bt_embeddings(caption, pil_img)
    # embeddings.append(embedding)
    # bt_embeddings returns a list holding one vector, so += adds that single vector
    embeddings += embedding
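Switching from `embeddings.append(embedding)` to `embeddings += embedding` works because `tolist()` on a tensor of shape (1, dim) returns a list containing one inner list. This plain-Python sketch uses a made-up three-element vector in place of a real embedding to show the difference.

```python
# stand-in for what the embedding function returns: a batch of one vector, as a nested list
batch_of_one = [[0.1, 0.2, 0.3]]

appended = []
appended.append(batch_of_one)  # keeps the batch dimension: one extra level of nesting
extended = []
extended += batch_of_one       # adds the single inner vector

print(appended)  # [[[0.1, 0.2, 0.3]]]
print(extended)  # [[0.1, 0.2, 0.3]]

# equivalent, and more explicit about dropping the batch dimension
explicit = []
explicit.append(batch_of_one[0])
assert explicit == extended
```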
I got the code working, but the cosine similarities are:

Cosine similarity between ex1_embeded and ex2_embeded is: 0.9268679323546363
Cosine similarity between ex1_embeded and ex3_embeded is: 0.8940821384304778

These are much higher than the results shown in the course:

Cosine similarity between ex1_embeded and ex2_embeded is: 0.48566270290489155
Cosine similarity between ex1_embeded and ex3_embeded is: 0.17133985252863604
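Two hedged observations. First, in both runs ex1 is closer to ex2 than to ex3, so the ranking agrees even though the absolute values do not; cosine values from different embedding spaces are not directly comparable. Second, one common way for all cosine similarities to land near 1 is that every vector shares a large common component. The pure-Python sketch below illustrates this with invented toy vectors (`ex1`, `ex2`, `ex3`, `shared` and `add_shared` are hypothetical): adding the same component to every vector raises both similarities sharply while keeping their order. Whether this is what happens with `pooler_output` is an assumption, not something the numbers above prove.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def add_shared(vector, shared):
    """Add the same component to a vector, element by element."""
    return [x + s for x, s in zip(vector, shared)]


# made-up toy vectors: ex2 resembles ex1, ex3 does not
ex1 = [1.0, 0.0, 0.0]
ex2 = [0.8, 0.6, 0.0]
ex3 = [0.0, 1.0, 0.0]
shared = [0.0, 0.0, 3.0]  # a large component every vector has in common

for name, other in [("ex2", ex2), ("ex3", ex3)]:
    plain = cosine_similarity(ex1, other)
    inflated = cosine_similarity(add_shared(ex1, shared), add_shared(other, shared))
    print(f"ex1 vs {name}: {plain:.2f} without, {inflated:.2f} with the shared component")

# prints:
# ex1 vs ex2: 0.80 without, 0.98 with the shared component
# ex1 vs ex3: 0.00 without, 0.90 with the shared component
```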

What's going wrong here?