Week 1 Assignment

Hello!

I completed the assignment and got the full score. However, I am thinking about the assignment concept; we are adding some unseen feature values to the feature domain and remove the anomalies. But, we don’t add any samples with those values! For example, there is no sample data with Neurophysiology as value for the medical_specialty feature in the train set.

Consider one-hot encoding of a categorical column. It’s important to know the complete domain of categories for that column. schema can help with this when validation set contains categories not present in training data.