Hi! I am going through the 1st assignment of this course. I understand the part about the differences in schema between the two datasets, i.e. training and serving. But I don’t understand how we are removing the anomalies from the dataset. What are we doing by getting the domain of some features? If someone could help me understand what we are trying to achieve in part 6 of the assignment (Schema Environment), that would be great, thanks.
Hello @Mugheera_Saleem
Which is the exact assignment?
In most cases, you are defining the schema of your data and applying it to both the training and serving datasets. This involves identifying the expected format and data types of each feature, as well as any allowable ranges or domains. By doing this, you can ensure that the datasets are consistent and compatible with the machine learning model you will be training and deploying.
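The idea of one schema applied to both datasets can be sketched in plain Python. This is a minimal illustration, not the TFDV code from the assignment. The feature names `age` and `label`, the type rules, and the `TRAINING`/`SERVING` environment names are assumptions made for the example. The schema records each feature's expected type and the environments where it must be present. This lets the label be required in training data without its absence in serving data being reported as an anomaly.

```python
# Hypothetical schema: expected Python type for each feature, plus the
# environments in which that feature must be present.
SCHEMA = {
    "age": {"type": int, "environments": {"TRAINING", "SERVING"}},
    # The label exists only in training data; serving requests never carry it.
    "label": {"type": str, "environments": {"TRAINING"}},
}


def validate(rows, environment):
    """Return anomaly messages for `rows` checked against SCHEMA in `environment`."""
    anomalies = []
    for i, row in enumerate(rows):
        for name, spec in SCHEMA.items():
            if name not in row:
                # Absence is only an anomaly where the feature is required.
                if environment in spec["environments"]:
                    anomalies.append(f"row {i}: missing feature '{name}'")
            elif not isinstance(row[name], spec["type"]):
                anomalies.append(
                    f"row {i}: '{name}' should be {spec['type'].__name__}, "
                    f"got {type(row[name]).__name__}"
                )
        # Features the schema does not know about are also flagged.
        for name in row:
            if name not in SCHEMA:
                anomalies.append(f"row {i}: unknown feature '{name}'")
    return anomalies


train = [{"age": 54, "label": "NO"}, {"age": "60", "label": "YES"}]
serve = [{"age": 41}]

print(validate(train, "TRAINING"))  # ["row 1: 'age' should be int, got str"]
print(validate(serve, "SERVING"))   # []
print(validate(serve, "TRAINING"))  # ["row 0: missing feature 'label'"]
```

The last line shows why environments matter. Checked as if it were training data, every serving row would be flagged for the missing label. As far as I know, TFDV expresses the same idea on the schema itself: `schema.default_environment` lists the environments, a feature's `not_in_environment` marks where it is absent, and passing `environment=` to `tfdv.validate_statistics` selects which environment to validate against.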
Getting the domain of a feature mostly means identifying the range of values that the feature can take on. For example, if you have a feature that represents age, its domain would be the range of possible ages (e.g. 0-100 years). Once the domains are defined, any value that falls outside them is flagged as an anomaly. You can then remove those records from the dataset or, if the new values are actually valid, update the domain in the schema.
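A minimal plain-Python sketch of the age example follows, again not the assignment's TFDV code. The numeric domain for `age` and the categorical feature `admission_type` with its allowed values are hypothetical. Rows with a value outside a domain are separated out as anomalous. The last step shows the alternative of widening a domain when an "anomalous" value turns out to be legitimate.

```python
# Hypothetical domains, for illustration only.
NUMERIC_DOMAINS = {"age": (0, 100)}  # inclusive (min, max)
CATEGORICAL_DOMAINS = {"admission_type": {"emergency", "elective", "urgent"}}


def out_of_domain(row):
    """Names of the features in `row` whose value falls outside its domain."""
    bad = []
    for name, (lo, hi) in NUMERIC_DOMAINS.items():
        if name in row and not lo <= row[name] <= hi:
            bad.append(name)
    for name, allowed in CATEGORICAL_DOMAINS.items():
        if name in row and row[name] not in allowed:
            bad.append(name)
    return bad


def split_by_domain(rows):
    """Split rows into (clean, anomalous) according to the domains above."""
    clean, anomalous = [], []
    for row in rows:
        (anomalous if out_of_domain(row) else clean).append(row)
    return clean, anomalous


rows = [
    {"age": 34, "admission_type": "elective"},
    {"age": 130, "admission_type": "urgent"},   # age outside 0-100
    {"age": 61, "admission_type": "walk-in"},   # value not in the domain
]

clean, anomalous = split_by_domain(rows)
print(len(clean), len(anomalous))              # 1 2
print([out_of_domain(r) for r in anomalous])   # [['age'], ['admission_type']]

# If "walk-in" is a valid category, widen the domain instead of dropping rows.
CATEGORICAL_DOMAINS["admission_type"].add("walk-in")
clean, anomalous = split_by_domain(rows)
print(len(clean), len(anomalous))              # 2 1
```

In TFDV, the corresponding step is usually editing the schema. For example, `tfdv.get_domain(schema, 'feature_name').value.append(...)` adds a new allowed string value. This keeps the rows and changes what counts as an anomaly.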
Course 2 of MLEP, week-1 assignment.