How to import a public dataset in Keras?

Hi, can someone tell me how to import a public dataset (particularly from Kaggle) using Keras without downloading it?
I would also like to know whether it’s possible to combine different datasets (also from Kaggle) and, if so, how to do that.

Hi @jincy-p-janardhanan, welcome to the DLS!

This is a tricky question. TensorFlow, Keras and other frameworks have a pre-defined list of datasets that you can use directly, especially while learning.

But, at least in the cases that I know of, when you use them, the dataset is still downloaded to your machine and cached locally (Keras uses ~/.keras/datasets by default).

This is an example that uses the MNIST dataset:

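import tensorflow as tf

# Built-in MNIST loader; the files are downloaded on first use and cached locally
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Scale pixel values from [0, 255] to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

# Small fully connected classifier for 28x28 grayscale digit images
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

# Labels are integer class ids (0-9), hence the sparse categorical loss
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train on the training split, then report loss and accuracy on the test split
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)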

As for combining datasets, as far as I know you will have to download and merge them manually, either by writing some code (for example with the Pandas library), with a custom application, or with good old command-line tools like sed, awk, sort, etc.
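
To give an idea of the Pandas route, here is a minimal sketch. The tables, column names and values are made up for illustration; with real Kaggle files you would build each DataFrame with pd.read_csv on the file you downloaded. It shows the two usual cases: stacking files that share the same columns, and joining files that share a key column.

```python
import pandas as pd

# Hypothetical stand-ins for CSV files downloaded from Kaggle.
# With real data, each frame would come from pd.read_csv(<path to the file>).
sales_2021 = pd.DataFrame({"store_id": [1, 2], "revenue": [100.0, 150.0]})
sales_2022 = pd.DataFrame({"store_id": [1, 3], "revenue": [120.0, 90.0]})
stores = pd.DataFrame({"store_id": [1, 2, 3],
                       "city": ["Lima", "Quito", "Bogota"]})

# Case 1: same columns in both files, so stack the rows on top of each other.
combined = pd.concat([sales_2021, sales_2022], ignore_index=True)

# Case 2: different columns but a shared key, so join on that key.
# how="left" keeps every row of `combined`, even without a matching store.
enriched = combined.merge(stores, on="store_id", how="left")

print(enriched)
```

Which of the two you need depends on how the datasets relate to each other, so it is worth checking the column names and types of each file first.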


Thank you for the update. I was wondering if there’s a way to collaborate with some friends online and work on the same dataset to build a model.

Turns out all I needed was Colab and Google Drive… lol, silly me!

Hey @jincy-p-janardhanan, that’s great! Glad to hear that you found a solution!