C3 W1 - Train/Dev/Test split with Timeseries data & Establishing Bayes Error

Hello Albert @albert_c,

If I were in your situation, investigating a prediction system, I would ask myself a lot of questions, so I am going to write some of them down. I don’t expect you to answer them (unless you choose to, of course, in which case we can discuss further), but I hope these questions can inspire some ideas.

First, I drew my understanding below, and this is what I based my questions on:

My biggest concern is where the “bad data” fits in my understanding, because its use was not mentioned.

I can see reasons behind this strategy, but as a trade-off, it also limits your choices because it is a fixed set of combinations, which is a source of bias. Therefore, I would first wonder: if I randomly sampled 1000 good samples and 1000 bad samples, then around, let’s say, centroid number 1, what is the good-to-bad ratio?

If the ratio were close to 50-50, would this really be a good centroid?
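To make the question concrete, here is a minimal sketch of the check I have in mind. The function names, the toy data, and the use of Euclidean nearest-centroid assignment are all my assumptions, not anything from your setup:

```python
import math

def nearest_centroid(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))

def good_to_bad_counts(good, bad, centroids, centroid_idx):
    """Among samples whose nearest centroid is centroid_idx,
    count how many were good runs vs bad runs."""
    n_good = sum(1 for p in good
                 if nearest_centroid(p, centroids) == centroid_idx)
    n_bad = sum(1 for p in bad
                if nearest_centroid(p, centroids) == centroid_idx)
    return n_good, n_bad

# Toy illustration with made-up 2-D points:
centroids = [(0.0, 0.0), (10.0, 10.0)]
good = [(0.1, 0.2), (9.0, 9.0)]
bad = [(0.2, 0.1)]
print(good_to_bad_counts(good, bad, centroids, 0))  # counts near centroid 0
```

If a centroid attracts roughly as many bad runs as good ones, that would back up my doubt that “closest centroid” alone is a useful label.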

My understanding is that a “good” result means the model correctly predicted the closest centroid; it does not mean that the temperature turned out to be the lowest, or lower than some acceptable threshold. Am I right?

I said the variables were not important because, for example, when we talked about human error for image recognition, we didn’t ask how many variables there were, right? We didn’t ask how the performance was achieved. Our focus was simply how well a human can tell that an image is a cat. I think the same applies here: the human error for your task could simply be the best performance in your industry. I don’t need to care about how that best is achieved, including how many variables anyone uses. If the best company can achieve good temperature and yield 90% of the time, then 100% − 90% = 10% would be my human error, regardless of how they achieved it. Then I would need to translate this 10% into something comparable with my model’s prediction capability; for example, my model might need to achieve a 5% error in order to reach 90% good temperature and yield. 10% is the human error for the task, and 5% is the “translated human error” for my model.
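The arithmetic above can be sketched as follows. Note that the translation factor between task-level error and model-level error is entirely hypothetical here; in practice it would come from your own experiments relating model error to outcome quality, which the example below just stands in for:

```python
def task_error_from_success_rate(success_rate):
    """Task-level 'human error': fraction of runs that miss a
    good temperature and yield."""
    return 1.0 - success_rate

def translated_model_error(task_error, translation_factor):
    """Hypothetical mapping from task error to a model-error target.
    translation_factor is an assumed, domain-specific conversion;
    it is NOT something derivable from the post itself."""
    return task_error * translation_factor

human_task_error = task_error_from_success_rate(0.90)  # 10% task error
# If (hypothetically) reaching a 90% good-outcome rate requires about
# 5% model error, the implied translation factor would be 0.5:
target = translated_model_error(human_task_error, 0.5)
print(target)
```

The point of the sketch is only the two-step structure: first measure the best task-level performance, then translate it into a target for the model’s own error metric.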

(edit: on second thought, one may argue that, yes, we did ask how many variables there were for image recognition, because they are just the image’s pixels. We are asking: if we present the training images to a human, what is the human error? I can’t argue against this, but I think the spirit of human error is just “what is the general level”, or even “where is the ceiling”, and I would treat the choice of the set of variables as one of my hyperparameters, so it is not part of the question.)