Perhaps this is a very stupid question, and I’ll delete it once I get an answer.

What is the “distribution” Andrew keeps referring to?

Hey @ans,

Are you asking about the meaning of the term?

A probability distribution is a description of how likely a random variable or set of random variables is to take on each of its possible states.

For example, a random variable may be a person’s age. We may represent the states of such a variable as years in the range (0, n). If we assign a probability to each state, say, how likely it is that the first person we meet is of a particular age, we can represent the distribution as a chart with ages on the horizontal axis and the corresponding probabilities on the vertical axis.
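To make the age example concrete, here is a minimal sketch that estimates such a distribution from a handful of observations. The `ages` list is made-up data and `empirical_distribution` is a hypothetical helper, not something from the course.

```python
from collections import Counter

# Hypothetical ages of the people we happened to meet (made-up data).
ages = [23, 35, 23, 41, 35, 23, 58, 41, 23, 35]


def empirical_distribution(samples):
    """Map each observed state to its relative frequency."""
    counts = Counter(samples)
    total = len(samples)
    return {state: count / total for state, count in sorted(counts.items())}


age_distribution = empirical_distribution(ages)

# A text version of the chart: one row per age, bar length ~ probability.
for age, prob in age_distribution.items():
    print(f"{age:>3} | {'#' * round(prob * 20)} {prob:.1f}")
```

Each row is one state (an age) with its estimated probability, and the probabilities add up to 1. With more observations the estimate would get closer to the true distribution.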

*In ML, we sometimes call random variables features*.

Sorry, my question was not clear.

I was referring to distribution as in “training and test set must come from x distribution”…

I think he was not using it in any technical sense; I think it was a casual use of the word…

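That statement does refer to the probability distribution. It means that both sets are drawn from the same distribution. In other words, if you plot a chart like the one above for each set, the two charts should look approximately the same (not exactly identical, since each set is only a finite sample).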
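One way to check this on data is to compare the two charts numerically. The sketch below is only an illustration: the age groups, the weights, and the sample sizes are all assumptions, and total variation distance is just one of several reasonable ways to compare two distributions.

```python
import random
from collections import Counter


def empirical_distribution(samples):
    """Relative frequency of each state in the sample."""
    counts = Counter(samples)
    return {state: count / len(samples) for state, count in counts.items()}


def total_variation(p, q):
    """Half the sum of absolute differences: 0 = identical, 1 = disjoint."""
    states = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in states)


rng = random.Random(0)  # fixed seed so the run is repeatable
age_groups = ["0-19", "20-39", "40-59", "60+"]

# Hypothetical data: train and test_same share the same weights,
# test_shifted is drawn from a much older population.
train = rng.choices(age_groups, weights=[0.2, 0.4, 0.3, 0.1], k=5000)
test_same = rng.choices(age_groups, weights=[0.2, 0.4, 0.3, 0.1], k=5000)
test_shifted = rng.choices(age_groups, weights=[0.1, 0.1, 0.3, 0.5], k=5000)

p_train = empirical_distribution(train)
tv_same = total_variation(p_train, empirical_distribution(test_same))
tv_shifted = total_variation(p_train, empirical_distribution(test_shifted))

print(f"same distribution:    TV = {tv_same:.3f}")
print(f"shifted distribution: TV = {tv_shifted:.3f}")
```

The first distance should come out close to 0 (the small remainder is sampling noise), while the second should be far larger, which is the situation the “same distribution” rule is meant to avoid.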

What I gathered:

Each picture is either from a camera or from the net, so if we pick camera pics for the dev set, we must also pick camera pics for the test set.

“They should come from the same distribution” means that if I pick a random pic from the test set and one from the dev set, the probability of each being from the camera pics collection should be the same.

Am I right?

But why are we going into the realm of probabilities? Suppose I again have two pics, one from the dev set and one from the test set. If both have a probability of, say, 0.5, then isn’t it possible that one of them comes from a different set? Why assign probabilities; why not use set theory?