Preparing Data for Deep Learning

Bakir · May 11, 2021, 7:41am

Hello fellow Deep Learners,

Where can I find information on how to prepare my data (X and Y) to start my own Deep Learning project?

So far – I’m on DLS Course 2 – all data has been provided and imported in each programming exercise. Now I’m looking for a way to learn how to get my own data set up.

Thank you and best regards,
Bakir

Mubsi · May 11, 2021, 10:55pm

Hey @Bakir ,

I’m really happy to know you are having so much interest in machine learning.

I did a quick google search and this is the top pick.

One important thing to note here is that your question has many possible answers.

You can start by preparing your own dataset and storing it in an H5 file, the kind shared in our courses. H5 files are a great way to store and read data. So are .csv files.

That’s what I personally did: I took an H5 file from this course, worked out how it was structured by reverse engineering it, and then made my own for my own dataset.
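
As a rough illustration of that workflow, here is a minimal sketch using the h5py library. The dataset names `train_set_x` and `train_set_y`, the file name, and the random placeholder arrays are assumptions chosen for the example, so substitute your own arrays and names.

```python
import os
import tempfile

import h5py
import numpy as np

# Placeholder data: 10 RGB images of 64x64 pixels with binary labels.
# Replace these arrays with your own images and labels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(10, 64, 64, 3), dtype=np.uint8)
labels = rng.integers(0, 2, size=(10,), dtype=np.int64)

# Hypothetical file name, written to a temporary directory for this demo.
path = os.path.join(tempfile.mkdtemp(), "my_dataset.h5")

# Write: each array becomes a named dataset inside the file.
with h5py.File(path, "w") as f:
    f.create_dataset("train_set_x", data=images)
    f.create_dataset("train_set_y", data=labels)

# Read back: list the dataset names, then load them as NumPy arrays.
with h5py.File(path, "r") as f:
    names = sorted(f.keys())
    X = f["train_set_x"][:]
    Y = f["train_set_y"][:]

print(names, X.shape, Y.shape)
```

Listing `f.keys()` on an existing H5 file is also a quick way to see how someone else's file is organised before building your own.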

You can even read data from a text file.
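
For a text or .csv file, a sketch with Python's standard `csv` module might look like the following. The column layout (features first, label in the last column) and the sample values are made up for illustration.

```python
import csv
import io

# Stand-in for a file on disk; with a real file you would use
# open("data.csv", newline="") instead of io.StringIO.
# Assumed layout: feature columns first, label in the last column.
text = io.StringIO(
    "height,weight,label\n"
    "1.70,65.0,0\n"
    "1.82,90.5,1\n"
    "1.60,52.3,0\n"
)

X, Y = [], []
reader = csv.reader(text)
header = next(reader)  # first row holds the column names
for row in reader:
    *features, label = row
    X.append([float(v) for v in features])
    Y.append(int(label))

print(header)
print(X)
print(Y)
```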

You can search websites like Kaggle for existing datasets, or use TensorFlow Datasets.

Really, there are a lot of possibilities. Everything is a google search away.

To learn more about TensorFlow Datasets, you can take our TensorFlow Specialisations.

Mubsi · May 11, 2021, 10:59pm

TensorFlow has a very neat way of reading data. For example, for a “cat, dog, or none” classifier, all you need to do is place all the pictures of the same kind in one folder: all the cat pictures in one folder, all the dog pictures in another, and all the other pictures in a third. Then all you have to do is point TensorFlow at these folders and it takes care of the rest. It automatically reads the images, shuffles them to make a dataset, and assigns them labels.
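
To make the folder convention concrete, here is a plain-Python sketch of the idea: each subfolder name becomes a class, and each file gets the index of its folder as a label. This is not TensorFlow's implementation, and the helper name `list_labeled_files` and the demo folder tree are hypothetical.

```python
import tempfile
from pathlib import Path


def list_labeled_files(root):
    """Return (class_names, samples) for a folder tree in which each
    subfolder of `root` holds the files of one class.
    `samples` is a list of (file_path, label_index) pairs."""
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for index, name in enumerate(class_names):
        for file_path in sorted((root / name).iterdir()):
            if file_path.is_file():
                samples.append((file_path, index))
    return class_names, samples


# Build a tiny demo tree with empty placeholder files.
root = Path(tempfile.mkdtemp())
for name, count in [("cat", 2), ("dog", 3), ("none", 1)]:
    (root / name).mkdir()
    for i in range(count):
        (root / name / f"{name}_{i}.jpg").touch()

class_names, samples = list_labeled_files(root)
print(class_names)
print([(p.name, label) for p, label in samples])
```

In recent TensorFlow versions, the utility that seems to match the description above is `tf.keras.utils.image_dataset_from_directory`, which infers labels from subfolder names in a similar way and also handles decoding, shuffling, and batching.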

Bakir · June 7, 2021, 7:07am

Thank you for your reply, @Mubsi – very helpful indeed

Harshit1097 · March 15, 2023, 3:34pm

Hi community! I would like opinions on data preparation for image data. My dataset consists of around 500 thousand chest X-ray images. I’ll be using the EfficientNet architecture for my model. While preparing data for training, I came across some images which are unlike the others (see the attached pic). In some of these anomalous images, a large portion of the image is just black pixels, in others it is white, and in others there is heavy noise, etc.
How should I go about finding these images in my dataset, since it isn’t possible to look at each image manually? Also, should I reject these images outright or include them in my dataset after some modifications?

rmwkwok · March 16, 2023, 12:14am

Hello @Harshit1097

Identify the characteristics of those images. From what I have seen, those images always seem to have rows (or columns) that are completely dark. You could write an algorithm that scans each image, counts the number of dark rows and the number of dark columns, and tags the images whose counts exceed a threshold that you choose by experiment.
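
A minimal sketch of such a scan with NumPy is shown below. It assumes 2-D grayscale arrays; the function names, the `dark_level` cutoff, and the `max_fraction` threshold are illustrative assumptions to be tuned on your own data. It also compares the fraction of dark rows or columns rather than the raw count, so that images of different sizes are treated alike.

```python
import numpy as np


def count_dark_lines(image, dark_level=10):
    """Count rows and columns whose pixels are all at or below dark_level.
    `image` is a 2-D grayscale array."""
    dark = image <= dark_level
    dark_rows = int(dark.all(axis=1).sum())
    dark_cols = int(dark.all(axis=0).sum())
    return dark_rows, dark_cols


def is_suspicious(image, max_fraction=0.3, dark_level=10):
    """Tag an image if too large a share of its rows or columns is dark."""
    rows, cols = count_dark_lines(image, dark_level)
    height, width = image.shape
    return rows / height > max_fraction or cols / width > max_fraction


# Synthetic examples: a uniform gray image and one with its top half black.
normal = np.full((100, 100), 128, dtype=np.uint8)
half_black = normal.copy()
half_black[:50, :] = 0

print(count_dark_lines(normal), is_suspicious(normal))
print(count_dark_lines(half_black), is_suspicious(half_black))
```

The same pattern could be adapted to mostly white images by flipping the comparison to a high brightness cutoff.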

You can then pull out the tagged images and visually inspect them yourself. This inspection step is important for building confidence in the tagging algorithm. Rely on yourself first before relying on the algorithm.

As for whether to reject or keep these images, it would be irresponsible to answer a question like that without a thorough understanding of the situation.

The decision depends on whether you need them or not. For example, do you know why those images exist? Do the test samples or the real-world samples include something similar or share similar characteristics? Does removing those images cause any drop in performance?

Therefore, only you can answer it for yourself.

Cheers,
Raymond

Harshit1097 · March 16, 2023, 5:19pm

Thanks @rmwkwok for the detailed reply!
