Hi, can someone tell me how to import a public dataset (particularly, from Kaggle) using Keras without downloading it?
I would also like to know if it’s possible to combine different datasets (also from Kaggle) and how can I do that?
Hi @jincy-p-janardhanan , welcome to the DLS !
This is a tricky question. TensorFlow, Keras and other frameworks have a predefined list of datasets that you can use directly, especially while learning.
- Keras: Datasets
- Tensorflow: TensorFlow Datasets
But, at least in the cases that I know of, when you use them the dataset is still downloaded to your machine and cached locally (Keras, for example, keeps it under ~/.keras/datasets by default).
This is an example that uses the MNIST dataset:
import tensorflow as tf

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
As for combining datasets, as far as I know you will have to download them and merge them yourself, either with a short script (for example using the Pandas library), with a custom application, or with good old command-line tools like sed, awk, sort, etc.
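For the Pandas route, here is a minimal sketch of what the merge could look like. It assumes both datasets are already downloaded as CSV files and share the same columns; the inline CSV text and the column names (id, text, label) are hypothetical stand-ins, not taken from any real Kaggle dataset.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two CSV files already downloaded from Kaggle.
# In practice you would pass the file paths to pd.read_csv instead.
csv_a = io.StringIO("id,text,label\n1,good movie,1\n2,bad movie,0\n")
csv_b = io.StringIO("id,text,label\n2,bad movie,0\n3,great plot,1\n")

df_a = pd.read_csv(csv_a)
df_b = pd.read_csv(csv_b)

# Stack the rows of both datasets; this assumes they share the same columns.
combined = pd.concat([df_a, df_b], ignore_index=True)

# Drop rows that appear in both sources, then renumber the index.
combined = combined.drop_duplicates().reset_index(drop=True)

print(combined)
```

If the datasets have different columns but share a key, something like pd.merge(df_a, df_b, on="id") would probably be the better fit than concat.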
Thank you for the update. I was wondering if there’s a way to collaborate with some friends online and work on the same dataset to build a model.
Turns out all I needed was Colab and Google Drive… lol, silly me!
Hey @jincy-p-janardhanan that’s great !! glad to hear that you found a solution !!