Not much but:
When loading data from disk in the 4th cell, we read:
train_dataset = h5py.File('datasets/train_signs.h5', "r")
test_dataset = h5py.File('datasets/test_signs.h5', "r")
The variable names are ill-chosen. These are not dataset objects; they are h5py File objects (which are also h5py Group objects) that hold datasets.
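To check that claim without the course files, here is a minimal sketch. It assumes h5py and NumPy are installed, and it writes a small stand-in file to a temporary directory. The file name `toy_signs.h5` and the tiny array shapes are made up for illustration; only the member names mirror the real file.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical stand-in for train_signs.h5, written to a temporary directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "toy_signs.h5")

# Write two small datasets into the root group of the file.
with h5py.File(path, "w") as f:
    f.create_dataset("train_set_x", data=np.zeros((4, 64, 64, 3), dtype=np.uint8))
    f.create_dataset("train_set_y", data=np.arange(4, dtype=np.int64))

# Reopen read-only and inspect what kind of object h5py.File(...) returns.
with h5py.File(path, "r") as train_group:
    is_file = isinstance(train_group, h5py.File)        # it is a File ...
    is_group = isinstance(train_group, h5py.Group)      # ... and also a Group
    is_dataset = isinstance(train_group, h5py.Dataset)  # ... but not a Dataset
    member_is_dataset = isinstance(train_group["train_set_x"], h5py.Dataset)
    member_names = sorted(train_group.keys())

print(is_file, is_group, is_dataset, member_is_dataset, member_names)
```

The object returned by `h5py.File` passes the `h5py.Group` check and fails the `h5py.Dataset` check, while the things you get by indexing into it are the actual datasets.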
Thus one would like to see:
# See https://docs.h5py.org/
train_group = h5py.File('datasets/train_signs.h5', "r")
test_group = h5py.File('datasets/test_signs.h5', "r")
Or, with some code to show what’s going on:
def print_hdf5_file_info(f, name):
    if isinstance(f, h5py.File):
        print(f"HDF5 file object name '{name}'")
        print(f" root group name: {f.name}")
        for k in f.keys():
            stuff = f[k]
            if isinstance(stuff, h5py.Dataset):
                print(f" group member '{k}' is a dataset")
                print(f" shape : {stuff.shape}")
                print(f" dtype : {stuff.dtype}")
                print(f" size : {stuff.size} elements in dataset")
                print(f" ndim : {stuff.ndim} dimensions")
            elif isinstance(stuff, h5py.Group):
                print(f" group member '{k}' is a group with name {stuff.name}")
            else:
                print(f" group member '{k}' is unexpectedly a {type(stuff)}")
    else:
        print(f"Not a HDF5 file object, but a {type(f)}")
# See https://docs.h5py.org/
train_group = h5py.File('datasets/train_signs.h5', "r")
test_group = h5py.File('datasets/test_signs.h5', "r")
print_hdf5_file_info(train_group, "train_group")
print_hdf5_file_info(test_group, "test_group")
Which outputs the following:
HDF5 file object name 'train_group'
root group name: /
group member 'list_classes' is a dataset
shape : (6,)
dtype : int64
size : 6 elements in dataset
ndim : 1 dimensions
group member 'train_set_x' is a dataset
shape : (1080, 64, 64, 3)
dtype : uint8
size : 13271040 elements in dataset
ndim : 4 dimensions
group member 'train_set_y' is a dataset
shape : (1080,)
dtype : int64
size : 1080 elements in dataset
ndim : 1 dimensions
HDF5 file object name 'test_group'
root group name: /
group member 'list_classes' is a dataset
shape : (6,)
dtype : int64
size : 6 elements in dataset
ndim : 1 dimensions
group member 'test_set_x' is a dataset
shape : (120, 64, 64, 3)
dtype : uint8
size : 1474560 elements in dataset
ndim : 4 dimensions
group member 'test_set_y' is a dataset
shape : (120,)
dtype : int64
size : 120 elements in dataset
ndim : 1 dimensions
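The listing also suggests why the naming matters in practice: the members are the datasets, and slicing one of them with `[:]` is what copies the data into a NumPy array. The sketch below illustrates that step under the same assumptions as before (h5py and NumPy installed). The file `toy_test_signs.h5` and its five-sample contents are hypothetical stand-ins for the real test file.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical stand-in for test_signs.h5, written to a temporary directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "toy_test_signs.h5")

with h5py.File(path, "w") as f:
    f.create_dataset("test_set_x", data=np.zeros((5, 64, 64, 3), dtype=np.uint8))
    f.create_dataset("test_set_y", data=np.array([0, 1, 2, 3, 4], dtype=np.int64))

# The context manager closes the file once the arrays have been read.
with h5py.File(path, "r") as test_group:
    x_dataset = test_group["test_set_x"]   # an h5py.Dataset backed by the file
    dataset_type = type(x_dataset).__name__
    test_x = x_dataset[:]                  # a numpy.ndarray copied into memory
    test_y = test_group["test_set_y"][:]

# The arrays remain usable after the file is closed.
print(dataset_type, type(test_x).__name__, test_x.shape, test_x.dtype, test_y.tolist())
```

With names like `test_group` and `x_dataset`, each variable says which of the three kinds of object it holds: the file/group, a dataset inside it, or the in-memory array.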