In the final practice lab, near the start, we have the statement:
```python
# Generate some data
X,y,x_ideal,y_ideal = gen_data(18, 2, 0.7)
```
Where is gen_data defined? The first parameter obviously specifies 18 data points, but what do the parameter values 2 and 0.7 specify?
Hey @Sumitro_Sarkar,
This function is defined in the `assigment_utils.py` file. You can easily download it and check it out by using the “Lab Files” option in your Coursera lab. Here, `2` is the value of the random seed and `0.7` is the scale with which the ideal y values should be scaled. This will all make sense to you once you check out the function. I hope this helps.
Cheers,
Elemento
Elemento, many thanks for your very prompt answer.

- I downloaded `assigment_utils.py` from Lab Files, but how do I go about checking it out? It’s a .py file and I cannot open that file type.
- I had assumed that the scaling parameter is the amount by which the generated random noise is reduced before it is added to the ideal line values. I experimented by plotting the output of `gen_data` in the assignment exercise with different values of the scaling parameter. But you wrote that it’s the ideal y values that are scaled. Is scaling the random noise the same as scaling the ideal y values?
- And where is the function for the ideal line defined? To find out, I would like to see the code of `gen_data`. Is that possible?
Hey @Sumitro_Sarkar,
You can easily open it with any code editor; VS Code, Sublime Text, and PyCharm are some of the options. In fact, if you don’t have a code editor installed, you can open it with a plain text editor as well: just use the “Open With” option and choose Notepad or whichever text editor your machine has. Otherwise, you can use this online service as well.
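If you are already inside the lab notebook, you can also inspect a function without leaving Jupyter: typing `gen_data??` in a notebook cell shows its source, and Python's standard `inspect` module can report which file defines a function and return its code. Below is a minimal sketch; the helper name `locate_and_show` is hypothetical. In the lab you would pass `gen_data` to it, but `textwrap.dedent` stands in here so the snippet runs outside the lab.

```python
import inspect
import textwrap


def locate_and_show(func):
    """Return (path of the file that defines func, source code of func)."""
    # In the lab, for gen_data this should point at assigment_utils.py.
    path = inspect.getsourcefile(func)
    source = inspect.getsource(func)
    return path, source


# Stand-in example: a pure-Python function from the standard library.
path, source = locate_and_show(textwrap.dedent)
print(path)
print(source)
```

`inspect.getsource` only works for functions defined in Python source files, which is the case for helpers such as those in `assigment_utils.py`.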
Actually, my previous answer was a little incomplete; apologies for that. The function uses the scaling factor to scale the product of the ideal y values and the noise, so the noise added to each point is proportional to that point's ideal value, while the ideal curve itself is not rescaled. This will be clearer once you check out the code of the `gen_data` function, which I have attached here for your reference.
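```python
import numpy as np

def gen_data(m, seed=1, scale=0.7):
    """ generate a data set based on a x^2 with added noise """
    c = 0
    x_train = np.linspace(0, 49, m)   # m evenly spaced x values from 0 to 49
    np.random.seed(seed)              # fixed seed, so the noise is reproducible
    y_ideal = x_train**2 + c          # noiseless quadratic
    # uniform noise in [-0.5, 0.5), multiplied by y_ideal and by scale
    y_train = y_ideal + scale * y_ideal * (np.random.sample((m,)) - 0.5)
    x_ideal = x_train  # for redraw when new data included in X
    return x_train, y_train, x_ideal, y_ideal
```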
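Here is a quick numerical check of the point above, as a small sketch that is not part of the lab (the variable names `rel_dev` and `y_clean` are only illustrative). Rearranged, each noisy value is `y_train = y_ideal * (1 + scale * (u - 0.5))` with `u` drawn uniformly from [0, 1), so `scale` only sets how large the noise is relative to each ideal value: with `scale = 0.7` every point stays within about ±35% of its ideal value, and with `scale = 0` the data sit exactly on the curve.

```python
import numpy as np


def gen_data(m, seed=1, scale=0.7):
    """ generate a data set based on a x^2 with added noise """
    c = 0
    x_train = np.linspace(0, 49, m)
    np.random.seed(seed)
    y_ideal = x_train**2 + c
    y_train = y_ideal + scale * y_ideal * (np.random.sample((m,)) - 0.5)
    x_ideal = x_train
    return x_train, y_train, x_ideal, y_ideal


# Same call as in the lab: 18 points, seed 2, scale 0.7.
X, y, x_ideal, y_ideal = gen_data(18, 2, 0.7)

# Deviation of each noisy point relative to its ideal value.
# X[0] is 0, so y_ideal[0] is 0; that point is skipped to avoid dividing by zero.
rel_dev = (y[1:] - y_ideal[1:]) / y_ideal[1:]
print(rel_dev.min(), rel_dev.max())  # both should lie within roughly [-0.35, 0.35]

# With scale = 0 there is no noise: the data lie exactly on the ideal curve.
_, y_clean, _, y_ideal_clean = gen_data(18, 2, 0.0)
print(np.array_equal(y_clean, y_ideal_clean))  # True
```

Because the noise is multiplied by `y_ideal`, its absolute size grows with x even though its relative size is bounded by `scale / 2`.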
I hope this helps.
Cheers,
Elemento
Thank you, Elemento. It’s all clear now. In the process, I learnt about using the Lab Files and the function `linspace()`.
I thought knowing how gen_data() works might be important in future for generating different types of test data with noise.
Thanks once again.