Loading the exercise data into Databricks

Hello.
Since I will eventually be working on Microsoft Azure Databricks, I am trying to run the week 2 practice lab in Databricks as well.
I need help with the [x_train, y_train = load_data()] command. I’ve successfully uploaded the ex1data1 data file as a .csv file, and it is stored in Databricks at dbfs:/FileStore/df/ex1data1.csv.
I would appreciate help with this first basic data load command.
Thanks
Daniel

Hello @Daniel_Deutscher, have you downloaded the other files along with the ex1data1 file, such as utils.py and public_test.py?
load_data is not a NumPy function but a user-defined function that lives in the utils.py file. You should download those files as well. If you want to implement the load_data function in your own file, the code for that is:

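import numpy as np

def load_data():
    # Read the comma-separated file into a 2-D array
    data = np.loadtxt("data/ex1data1.txt", delimiter=',')
    X = data[:, 0]  # first column
    y = data[:, 1]  # second column
    return X, y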

The file path may change depending on the directory where you have stored the files.
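As a rough sketch of the point about paths, load_data can take the path as an argument so the same function works wherever the file ends up. The `path` parameter and the sample file below are my own additions for illustration, and the values in the sample file are made up. On Databricks, a file at dbfs:/FileStore/df/ex1data1.csv can usually be reached through the local path /dbfs/FileStore/df/ex1data1.csv, but that assumes the cluster exposes the /dbfs mount, so treat it as something to check rather than a given.

```python
import os
import tempfile

import numpy as np


def load_data(path):
    """Read a two-column, comma-separated file into two 1-D NumPy arrays."""
    data = np.loadtxt(path, delimiter=",")
    X = data[:, 0]  # first column: feature
    y = data[:, 1]  # second column: target
    return X, y


# Stand-in file with made-up values so the sketch runs anywhere.
# On Databricks the path might instead be "/dbfs/FileStore/df/ex1data1.csv"
# (assumption: the cluster exposes DBFS under the local /dbfs mount).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.csv")
    with open(sample_path, "w") as f:
        f.write("1.0,2.0\n3.0,4.0\n5.0,6.0\n")
    x_train, y_train = load_data(sample_path)

print(x_train.shape, y_train.shape)
```

np.loadtxt only cares about the delimiter, not the file extension, so the same call reads a .csv file as long as it contains nothing but comma-separated numbers.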

Thanks for the clarifications. Databricks does not seem to accept txt files, but it does accept csv files. So I ended up loading the data into a DataFrame using the following code, which worked fine:

df = spark.read.csv('/FileStore/df/ex1data1.csv', inferSchema=True, header=False)  # inferSchema=True for numeric import
df.createOrReplaceTempView("df")
x_train, y_train = df

Then I only had to reshape the data into NumPy arrays using:
x_train = np.array(df.select("_c0").collect()).reshape(-1)  # for 1-D array
y_train = np.array(df.select("_c1").collect()).reshape(-1)  # for 1-D array
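For anyone wondering why the reshape(-1) is needed: collect() returns a list of rows, and each row from a single-column select holds exactly one value, so np.array turns the list into an (n, 1) column rather than a flat array. Here is a small sketch of just that step. It uses plain tuples with made-up values as a stand-in for the Spark Row objects (Row is a tuple subclass), so it runs without a cluster; the variable names are only illustrative.

```python
import numpy as np

# Stand-in for df.select("_c0").collect(): a list of one-field rows.
# Plain tuples with made-up values are used here instead of Spark Row objects.
collected = [(1.0,), (3.0,), (5.0,)]

as_column = np.array(collected)  # one row per record, one column: shape (3, 1)
x_flat = as_column.reshape(-1)   # flattened to a 1-D array: shape (3,)

print(as_column.shape, x_flat.shape)
```

The lab code expects 1-D arrays like the ones load_data returns, which is why the flattening step matters.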

Does this make sense?
Thanks
Daniel