How can I turn my raw images and true labels into an (X, y) array for deep learning?

I want to batch download images from the internet for an ML project. Suppose I manually create the true labels and save them in a CSV file. How do I combine them into an (X, y) array for deep learning purposes? What are the best practices? Do any of the courses in this specialization (or any of the others) go into these practical issues?

Hi @gaussian, I cannot suggest a specific course at the moment. If you are using TensorFlow, the right place to start is the Dataset API. The TensorFlow tutorials also include various usage examples that cover a lot of useful situations.


I think @crisrise’s suggestion regarding Dataset (and Dataloader) is a good hint for the TensorFlow approach.

In any case, I’m not sure what you mean by combining them into an (X, y) array. You could simply keep X in one array and y in another, aligned so that the input X[i] corresponds to the output y[i].

You could read the CSV into a Pandas dataframe with two columns, X and y, and then extract each column as an array of values. Assuming the dataframe is df, it would be something like:

X = df['X'].values
y = df['y'].values

Since I guess the X column in the CSV holds the file name of each image, you would still need to read the images themselves into another array.
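Reading the images into an array could look roughly like the sketch below. It assumes the CSV has columns named X (file name) and y (label), as above, and that Pillow and NumPy are installed. The function name load_xy, the image_dir argument, and the 64 x 64 target size are made up for illustration. Every image is converted to RGB and resized so that all of them have the same shape and can be stacked.

```python
import os

import numpy as np
import pandas as pd
from PIL import Image


def load_xy(csv_path, image_dir, size=(64, 64)):
    """Build (X, y) from a CSV with assumed columns 'X' (file name) and 'y' (label).

    `size` is (width, height), the order Pillow's resize() expects.
    """
    df = pd.read_csv(csv_path)
    images = []
    for name in df["X"]:
        with Image.open(os.path.join(image_dir, name)) as img:
            # Force a common mode and size so every array has the same shape.
            img = img.convert("RGB").resize(size)
            images.append(np.asarray(img, dtype=np.uint8))
    X = np.stack(images)      # shape: (m, height, width, 3)
    y = df["y"].to_numpy()    # shape: (m,)
    return X, y
```

With this, X[i] holds the pixels of the image named in row i of the CSV and y[i] holds its label, so the two arrays stay aligned.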

Kind regards,

@albertovilla Thank you for your clear description.
To expand on the question of the OP, assume that I have downloaded 100 photos, each of a different size (let’s say one of them is 640 pixels by 950 pixels and it represents a red car). Hence, the y value of this particular image is “car” or I may code it as 3. How do I create the X array that represents the pixels of the image? … Sorry, my question might be trivial but I have just started course #1. Is this type of conversion covered in this specialization?

The DLS Specialization does not cover preparing data for use with Machine Learning models. That’s a big topic that is generally called “Data Science”. You can find lots of courses on that whole area, including the Practical Data Science Specialization from

But your question is very specific and it does have a straightforward answer: if you are building a Deep Learning model that processes images, you need to decide on a fixed resolution and image type (RGB, CMYK, greyscale …) for the input images and a particular “label” format. You then will need to convert any inputs into the chosen image representation and size. Fortunately any decent image library will provide “resize” and type conversion functions. For example, take a look at the Logistic Regression assignment in Week 2 and see the section at the end titled “Test with Your Own Image”. In that section, they give you the code to resize your images to 64 x 64. It’s a simple line of code using the python Image library.
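To make the conversion step concrete, here is a minimal sketch using Pillow and NumPy. It is not the assignment's code. The helper names to_model_input and encode_labels are hypothetical, and the choices of RGB, 64 x 64, and scaling pixels to the range [0, 1] are assumptions you would adapt to your own model.

```python
import numpy as np
from PIL import Image


def to_model_input(img, size=(64, 64)):
    """Convert a PIL image to a float32 RGB array of shape (height, width, 3) in [0, 1]."""
    img = img.convert("RGB").resize(size)  # size is (width, height)
    return np.asarray(img, dtype=np.float32) / 255.0


def encode_labels(labels):
    """Map string labels to integer codes; returns (codes, class_names)."""
    class_names = sorted(set(labels))
    index = {name: i for i, name in enumerate(class_names)}
    return np.array([index[name] for name in labels]), class_names


# Stand-in for a downloaded 640 x 950 photo (normally: Image.open(path)).
photo = Image.new("CMYK", (640, 950))
x = to_model_input(photo)          # x.shape is (64, 64, 3), whatever the input size or mode
codes, class_names = encode_labels(["car", "cat", "car"])
```

Keeping class_names alongside the integer codes lets you translate a predicted code back into a readable label later.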

As you go through the Week 2 assignment, you can also look at how they read in the input images. See the routine load_dataset which is in one of the accompanying utility files (see the FAQ Thread for instructions about how to open such a file). You’ll see that the images and labels are packaged as an “h5” format database file. Here’s a thread from a fellow student who did the work to figure out how to create an h5 file with images and labels.
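As a rough idea of what packaging images and labels into an h5 file involves, here is a small sketch using the h5py library. The function names save_h5 and load_h5 are hypothetical, and the dataset names inside the file are assumptions chosen for illustration. Check load_dataset in the utility file for the names the course files actually use.

```python
import h5py
import numpy as np


def save_h5(path, X, y, class_names):
    """Write images, labels, and class names into one HDF5 file."""
    with h5py.File(path, "w") as f:
        # Dataset names below are illustrative assumptions.
        f.create_dataset("train_set_x", data=X)
        f.create_dataset("train_set_y", data=y)
        # Store the class names as fixed-length byte strings.
        f.create_dataset("list_classes", data=np.array(class_names, dtype="S"))


def load_h5(path):
    """Read back the arrays written by save_h5."""
    with h5py.File(path, "r") as f:
        X = f["train_set_x"][:]
        y = f["train_set_y"][:]
        classes = [c.decode("utf-8") for c in f["list_classes"][:]]
    return X, y, classes
```

Here X would be the stacked image array of shape (m, height, width, 3) and y the matching label array of shape (m,).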