H5py files generation

What are these h5py files and how do we convert our images to h5py??

Welcome to the community.

h5py is a Python library for reading and writing HDF5 (Hierarchical Data Format version 5) files. HDF5 is quite useful for NumPy users: we can store large, high-rank arrays directly, and read or write just part of the data with NumPy-style slicing. An image is typically a 3-dimensional array (height, width, channels), so it is easy to store in HDF5.
For more detail, please see this.
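To make the "NumPy slice style" point concrete, here is a minimal sketch. It assumes a synthetic batch of random images rather than real files, and the file name `demo.h5` and dataset name `images` are made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Synthetic stand-in for 10 RGB images of 64x64 pixels (not real data).
images = np.random.randint(0, 256, size=(10, 64, 64, 3), dtype=np.uint8)

# Hypothetical file location in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")

# Store the whole 4-dimensional array as one dataset.
with h5py.File(path, "w") as f:
    f.create_dataset("images", data=images)

# Reopen in read/write mode and touch only part of the data.
with h5py.File(path, "r+") as f:
    dset = f["images"]
    part = dset[2:5]  # reads only images 2, 3 and 4 from disk
    dset[0] = np.zeros((64, 64, 3), dtype=np.uint8)  # overwrite one image in place
    first = dset[0]

print(part.shape)
print(int(first.sum()))
```

Slicing the dataset returns an ordinary NumPy array, so the rest of your code does not need to know the data came from HDF5.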

how to convert images on our PC to h5py files???

Here is an example. You can do the same in whichever language or platform you prefer.

import h5py
import matplotlib.image as mpimg
from matplotlib import pyplot as plt

# Read image data
img = mpimg.imread('./images/classification_kiank.png')
plt.imshow(img)

# Write into HDF5
with h5py.File('./images/image.h5', 'w') as f:
    dset = f.create_dataset('classification', data=img)

# Read back
with h5py.File('./images/image.h5', 'r') as f2:
    # [()] loads the whole dataset into a NumPy array while the file is still open
    img2 = f2['classification'][()]
plt.imshow(img2)
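If you receive an .h5 file and want to know what is inside before loading anything, you can list its datasets with their shapes and dtypes. This is a rough sketch: it first writes a small placeholder array (zeros, not a real image) to a temporary file so that it runs on its own.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file in a temporary directory, holding a placeholder "image".
path = os.path.join(tempfile.mkdtemp(), "image.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("classification", data=np.zeros((4, 5, 3), dtype=np.float32))

# Inspect the file: dataset names, shapes and dtypes.
with h5py.File(path, "r") as f:
    names = list(f.keys())
    info = {name: (f[name].shape, str(f[name].dtype)) for name in names}

print(names)
print(info)
```

Reading the shape and dtype this way does not load the pixel data itself, so it is cheap even for large files.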

how to convert a whole folder of images with test and train set in it into h5py??

A big extra service for you. In this case, I create both a training set and a test set in one HDF5 file, but you can create a single set and split it in two later if you like. You can also store labels in the same file, just as I create two datasets in one file here. Again, HDF5 has a hierarchical structure, which is very useful.

The code itself is very straightforward: a simple loop of reads and writes.

If you need further assistance, please talk with your friend, Google. :slight_smile:

import numpy as np
import h5py
import cv2
import glob
import random
from matplotlib import pyplot as plt

# Get image list
# This is just an example.  You can split train and test as you like

list_images = glob.glob('images/*.jpg')

TRAIN_SPLIT = 0.7
split_count = np.floor(len(list_images)*TRAIN_SPLIT).astype(int)

# shuffle images as samples for testing
random.shuffle(list_images)

# split list into training and test set
train_list = list_images[0:split_count]
test_list = list_images[split_count:]

train_length = len(train_list)
test_length = len(test_list)

# set train/test image size (convert to this shape if an original image is not)
IMAGE_WIDTH = 640
IMAGE_HEIGHT = 640

# Write into HDF5
with h5py.File('./images/image.h5', 'w') as f:

    # training set
    # OpenCV images have shape (height, width, 3) and uint8 pixels
    train_set = f.create_dataset('train', shape=(train_length, IMAGE_HEIGHT, IMAGE_WIDTH, 3), dtype='uint8')
    for count, img_name in enumerate(train_list):
        # Note: cv2.imread returns the channels in BGR order, not RGB
        img = cv2.imread(img_name, cv2.IMREAD_COLOR)
        # Resizing is optional for images that already match, but every image in the dataset must have the same shape
        img = cv2.resize(img, (IMAGE_WIDTH, IMAGE_HEIGHT))  # dsize is (width, height)
        train_set[count] = img

    # test set
    test_set = f.create_dataset('test', shape=(test_length, IMAGE_HEIGHT, IMAGE_WIDTH, 3), dtype='uint8')
    for count, img_name in enumerate(test_list):
        img = cv2.imread(img_name, cv2.IMREAD_COLOR)
        img = cv2.resize(img, (IMAGE_WIDTH, IMAGE_HEIGHT))  # dsize is (width, height)
        test_set[count] = img
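As mentioned above, labels can live in the same file as the images. One possible layout, sketched here with made-up data, uses an HDF5 group per split with an `images` and a `labels` dataset inside it. The group and dataset names, the file name `labeled.h5`, and the labels themselves are all assumptions for illustration, not a required convention.

```python
import os
import tempfile

import h5py
import numpy as np

# Synthetic stand-ins: 6 small RGB images and one integer class label per image.
train_x = np.random.randint(0, 256, size=(6, 32, 32, 3), dtype=np.uint8)
train_y = np.array([0, 1, 1, 0, 1, 0], dtype=np.int64)

# Hypothetical file location in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "labeled.h5")

with h5py.File(path, "w") as f:
    # A group works like a folder; datasets inside it work like files.
    grp = f.create_group("train")
    grp.create_dataset("images", data=train_x)
    grp.create_dataset("labels", data=train_y)

with h5py.File(path, "r") as f:
    # Datasets are addressed with a path-like name.
    x = f["train/images"][:]
    y = f["train/labels"][:]

print(x.shape, y.shape)
```

Keeping images and labels at the same index in two datasets means `x[i]` and `y[i]` always refer to the same sample, which also makes it easy to split one set into train and test later by slicing both arrays with the same indices.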