Fix for a small annoyance in the "Tensorflow Introduction" assignment

It's not much, but:

When loading data from disk in the 4th cell, we read:

```python
train_dataset = h5py.File('datasets/train_signs.h5', "r")
test_dataset = h5py.File('datasets/test_signs.h5', "r")
```

The variable names are ill-chosen. These are not dataset objects; they are h5py File objects (which are also h5py Group objects) that hold datasets.
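To make the distinction concrete, here is a small sketch. It writes a throwaway HDF5 file with made-up contents (the file name `demo.h5` and the dummy array are illustrative assumptions, not the assignment's data) and checks which h5py types are involved:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical demo file in a temporary directory (not the assignment's data)
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("train_set_x", data=np.zeros((4, 8, 8, 3), dtype=np.uint8))

f = h5py.File(path, "r")
member = f["train_set_x"]

# An open h5py.File is also an h5py.Group: the root group "/"
is_group = isinstance(f, h5py.Group)
root_name = f.name
# Indexing the group by name yields an h5py.Dataset, not a NumPy array
is_dataset = isinstance(member, h5py.Dataset)
shape = member.shape
print(is_group, root_name, is_dataset, shape)
f.close()
```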

It would be clearer to write:

```python
# See https://docs.h5py.org/
train_group = h5py.File('datasets/train_signs.h5', "r")
test_group = h5py.File('datasets/test_signs.h5', "r")
```
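Because the group only refers to datasets stored on disk, one option is to copy what you need into NumPy arrays and close the file afterwards. A minimal sketch, assuming the key names `train_set_x` / `train_set_y` that these files contain; the helper `load_split` and the dummy file with its made-up sizes are hypothetical:

```python
import os
import tempfile

import h5py
import numpy as np

def load_split(path, x_key, y_key):
    """Hypothetical helper: copy two datasets out of an HDF5 file into NumPy arrays."""
    with h5py.File(path, "r") as group:  # the File object is the root group
        x = group[x_key][()]  # [()] reads the whole dataset into memory
        y = group[y_key][()]
    return x, y  # plain NumPy arrays, still usable after the file is closed

# Dummy file mimicking the layout (made-up sizes)
path = os.path.join(tempfile.mkdtemp(), "demo_signs.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("train_set_x", data=np.ones((5, 64, 64, 3), dtype=np.uint8))
    f.create_dataset("train_set_y", data=np.arange(5, dtype=np.int64))

x_train, y_train = load_split(path, "train_set_x", "train_set_y")
print(type(x_train).__name__, x_train.shape, y_train.dtype)
```

Using `with` means the file handle is released even if reading fails, at the cost of loading the full arrays into memory.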

Or, with some code to show what’s going on:

```python
import h5py

def print_hdf5_file_info(f, name):
    if isinstance(f, h5py.File):
        print(f"HDF5 file object name '{name}'")
        print(f"   root group name: {f.name}")
        # Inspect each direct member of the root group
        for k in f.keys():
            stuff = f[k]
            if isinstance(stuff, h5py.Dataset):
                print(f"   group member '{k}' is a dataset")
                print(f"      shape  : {stuff.shape}")
                print(f"      dtype  : {stuff.dtype}")
                print(f"      size   : {stuff.size} elements in dataset")
                print(f"      ndim   : {stuff.ndim} dimensions")
            elif isinstance(stuff, h5py.Group):
                print(f"   group member '{k}' is a group with name {stuff.name}")
            else:
                # e.g. a committed datatype (h5py.Datatype)
                print(f"   group member '{k}' is unexpectedly a {type(stuff)}")
    else:
        print(f"Not an HDF5 file object, but a {type(f)}")

# See https://docs.h5py.org/
train_group = h5py.File('datasets/train_signs.h5', "r")
test_group = h5py.File('datasets/test_signs.h5', "r")

print_hdf5_file_info(train_group, "train_group")
print_hdf5_file_info(test_group, "test_group")
```
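The function above only looks at the direct members of the root group: a nested group is reported but not descended into. A minimal sketch of a recursive variant using h5py's `Group.visititems`, which calls a function with `(name, obj)` for every object below the group; the helper `describe_hdf5` and the nested demo file are hypothetical:

```python
import os
import tempfile

import h5py
import numpy as np

def describe_hdf5(group):
    """Hypothetical helper: return one line per object found anywhere below `group`."""
    lines = []

    def visit(name, obj):
        # name is the path relative to `group`, e.g. "images/train_set_x"
        if isinstance(obj, h5py.Dataset):
            lines.append(f"dataset {name}: shape={obj.shape}, dtype={obj.dtype}")
        elif isinstance(obj, h5py.Group):
            lines.append(f"group   {name}")
        # returning None tells visititems to keep visiting

    group.visititems(visit)
    return lines

# Demo file with one nested group (made-up layout)
path = os.path.join(tempfile.mkdtemp(), "nested.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("list_classes", data=np.arange(6, dtype=np.int64))
    images = f.create_group("images")
    images.create_dataset("train_set_x", data=np.zeros((2, 64, 64, 3), dtype=np.uint8))

with h5py.File(path, "r") as f:
    report = describe_hdf5(f)
print("\n".join(report))
```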

This outputs the following:

```
HDF5 file object name 'train_group'
   root group name: /
   group member 'list_classes' is a dataset
      shape  : (6,)
      dtype  : int64
      size   : 6 elements in dataset
      ndim   : 1 dimensions
   group member 'train_set_x' is a dataset
      shape  : (1080, 64, 64, 3)
      dtype  : uint8
      size   : 13271040 elements in dataset
      ndim   : 4 dimensions
   group member 'train_set_y' is a dataset
      shape  : (1080,)
      dtype  : int64
      size   : 1080 elements in dataset
      ndim   : 1 dimensions
HDF5 file object name 'test_group'
   root group name: /
   group member 'list_classes' is a dataset
      shape  : (6,)
      dtype  : int64
      size   : 6 elements in dataset
      ndim   : 1 dimensions
   group member 'test_set_x' is a dataset
      shape  : (120, 64, 64, 3)
      dtype  : uint8
      size   : 1474560 elements in dataset
      ndim   : 4 dimensions
   group member 'test_set_y' is a dataset
      shape  : (120,)
      dtype  : int64
      size   : 120 elements in dataset
      ndim   : 1 dimensions
```
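The `size` values in this listing are the product of the `shape` entries, and multiplying by the bytes per element (1 for `uint8`, 8 for `int64`) gives the raw data size. A quick check in plain Python; the byte counts are derived here, not reported by the printout:

```python
import math

# (shape, bytes per element) taken from the listing above
shapes = {
    "train_set_x": ((1080, 64, 64, 3), 1),  # uint8 -> 1 byte
    "test_set_x": ((120, 64, 64, 3), 1),    # uint8 -> 1 byte
    "train_set_y": ((1080,), 8),            # int64 -> 8 bytes
}

results = {}
for key, (shape, itemsize) in shapes.items():
    size = math.prod(shape)  # element count, same meaning as h5py's Dataset.size
    results[key] = (size, size * itemsize)
    print(f"{key}: size={size}, bytes={size * itemsize}")
```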