Hi guys,
I’m new to ML and have just finished the second course of the Machine Learning Specialization.
I’m looking to create a model that goes through a set of files and detects similar duplicates.
I have a question, though, and I’m not sure which is the better choice: should I create a separate model for each file type, or will a single model for all file types be just as good?
Hi @Assaf1
if I understand correctly, you want to check whether the files are identical content-wise, and if so, you would (manually?) review them and replace the dupes. Right?
Can you abstract away the different file formats?
If so: from my perspective, one model could make sense if you have well-structured data and good knowledge of the features that tell you whether records are unique or duplicates. As a simple example: if you know the data model well and e.g. have a unique identifier, it is easy to check for dupes by counting plus a rule-based judgement. Depending on your domain knowledge, you might derive good features that indicate a duplicate, e.g. by checking for identical content in your files, if this can be done in a computationally efficient way (otherwise by taking samples); see the sketch below.
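For the exact-duplicate case, a content hash can play the role of that unique identifier. Here is a minimal Python sketch of the rule-based idea (not a finished solution); the folder name `my_files` is only a placeholder:

```python
# Minimal sketch: rule-based exact-duplicate detection by hashing file contents.
# The folder name is a placeholder - adapt paths and filtering to your data.
import hashlib
from pathlib import Path
from collections import defaultdict

def find_exact_duplicates(folder: str) -> dict[str, list[Path]]:
    """Group files whose byte content is identical, using a SHA-256 hash as 'unique identifier'."""
    groups = defaultdict(list)
    for path in Path(folder).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Keep only hashes that occur more than once, i.e. the duplicate groups
    return {h: paths for h, paths in groups.items() if len(paths) > 1}

if __name__ == "__main__":
    for digest, paths in find_exact_duplicates("my_files").items():
        print(digest[:12], [str(p) for p in paths])
```

Note that a plain hash only catches byte-identical files; for “similar” duplicates with slightly different content you need the feature- or embedding-based approaches discussed next.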
I think if you have more unstructured data (like video snippets) and you want to check whether it is already part of your maintained data (like your video database), you could use similarity measures, such as distances between your embeddings (which you obtain after a transformation). If you want to apply signal-processing methods from control and system theory, this thread on calculating a convolution could also be interesting: How to Calculate the Convolution? - #2 by Christian_Simonis .
Here too, a model could work if you manage to get the data into an embedding space (transformed) and make your decision / classification there, e.g. as sketched below.
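As a rough illustration of the embedding route, the sketch below assumes you already have some transformation that maps each file to a fixed-length vector; the `embeddings` dictionary and the 0.95 threshold are placeholders you would have to produce and tune yourself:

```python
# Minimal sketch: flag near-duplicates via cosine similarity between embeddings.
# How the embeddings are computed is left open - this only shows the comparison step.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def near_duplicates(embeddings: dict[str, np.ndarray], threshold: float = 0.95):
    """Return pairs of file names whose embeddings are closer than the chosen threshold."""
    names = list(embeddings)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if cosine_similarity(embeddings[names[i]], embeddings[names[j]]) >= threshold:
                pairs.append((names[i], names[j]))
    return pairs
```

This pairwise comparison is quadratic in the number of files, so for large collections you would again want to sample or use a more efficient lookup, in line with the efficiency point above.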
Best regards
Christian