How to create a training set from custom paths

Using the Dataset construct that Balaji points out will simplify the code a bit, but my guess is that the fundamental problem is that you are sequentially opening, reading, and closing 18,000 disk files every time you run this. That's a lot of file operations, and it's just going to take a long time. Another approach would be to repurpose that code as a "preprocessing" step that you only need to run once: do the open and read operations on each file, then write a single output database file in a format like h5 that contains all the images. The h5 file will be pretty huge, of course (the sum of the sizes of all the individual files plus a bit of overhead), but I would bet that your actual model code, which only has to open and load that one file, will then be a lot quicker.
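Here's a minimal sketch of what that one-time preprocessing step could look like. All the names (`build_h5`, the `"images"` dataset key, the `loader` callback) are my own illustrations, not from the original code; the demo uses `.npy` files as stand-in "images" so it runs anywhere, but in your case the loader would be something like `lambda p: np.array(Image.open(p))` using PIL:

```python
import os
import tempfile

import h5py
import numpy as np


def build_h5(image_paths, out_path, loader, image_shape):
    """One-time preprocessing: read each image file once and append it
    to a single HDF5 dataset, so training runs open only one file."""
    n = len(image_paths)
    with h5py.File(out_path, "w") as f:
        dset = f.create_dataset("images", shape=(n, *image_shape), dtype="uint8")
        for i, path in enumerate(image_paths):
            dset[i] = loader(path)  # one disk read per source file, done once


# Demo with synthetic .npy "images" standing in for real image files.
tmp = tempfile.mkdtemp()
paths = []
for i in range(5):
    p = os.path.join(tmp, f"img_{i}.npy")
    np.save(p, np.full((4, 4, 3), i, dtype=np.uint8))
    paths.append(p)

out = os.path.join(tmp, "dataset.h5")
build_h5(paths, out, loader=np.load, image_shape=(4, 4, 3))

with h5py.File(out, "r") as f:
    print(f["images"].shape)  # (5, 4, 4, 3)
```

You pay the 18,000 file opens exactly once here, and every later training run just memory-maps or reads the single h5 file.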

Here’s a thread from a fellow student about how to create an h5 file containing multiple images from separate files. You’ll have to adapt the code for your purposes, but it gives you all the logic you need to build your “preprocessor”. Then the “load” logic that you run every time can be simplified to work the way the load_dataset functions do in the C1 W2 Logistic Regression assignment.
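For reference, the fast per-run load can be as small as the sketch below. This mirrors the shape of the assignment's load_dataset function, but the dataset keys (`"images"`, `"labels"`) are assumptions on my part; use whatever names your preprocessor actually wrote:

```python
import os
import tempfile

import h5py
import numpy as np


def load_dataset(h5_path):
    """Fast per-run load: one file open instead of thousands."""
    with h5py.File(h5_path, "r") as f:
        images = np.array(f["images"])  # pulls the whole array into memory
        labels = np.array(f["labels"]) if "labels" in f else None
    return images, labels


# Demo: write a tiny h5 file, then load it back in one call.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("images", data=np.zeros((10, 8, 8, 3), dtype=np.uint8))
    f.create_dataset("labels", data=np.arange(10))

X, y = load_dataset(path)
print(X.shape, y.shape)  # (10, 8, 8, 3) (10,)
```

If the full array won't fit in memory, you can instead keep the `h5py.File` open and slice `f["images"][i:j]` per minibatch, since h5py only reads the slices you index.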