Record vs. Dataset

Can someone explain the difference between the tensorflow dataset and record? Is one a subset (or parent) of the other?

Hi @wmartin

Here is my take on your question:

TFDS:

tensorflow_datasets (tfds) defines a collection of datasets ready-to-use with TensorFlow.

TF record:

The TFRecord format is a simple format for storing a sequence of binary records.
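To make "a sequence of binary records" concrete, here is a minimal pure-Python sketch of the record framing as I understand it from the TFRecord documentation: each record is a little-endian 64-bit length, a masked CRC-32C of that length, the payload bytes, and a masked CRC-32C of the payload. The helper names (crc32c, masked_crc, write_record, read_records) are made up for this illustration and are not TensorFlow functions; in practice you would use TensorFlow's own reader and writer.

```python
import io
import struct

# Lookup table for CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
_TABLE = []
for _i in range(256):
    _crc = _i
    for _ in range(8):
        _crc = (_crc >> 1) ^ 0x82F63B78 if _crc & 1 else _crc >> 1
    _TABLE.append(_crc)

def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

def masked_crc(data: bytes) -> int:
    # The checksum is rotated and offset before being stored.
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF

def write_record(stream, payload: bytes) -> None:
    header = struct.pack("<Q", len(payload))
    stream.write(header)
    stream.write(struct.pack("<I", masked_crc(header)))
    stream.write(payload)
    stream.write(struct.pack("<I", masked_crc(payload)))

def read_records(stream):
    while True:
        header = stream.read(8)
        if not header:
            return
        (length,) = struct.unpack("<Q", header)
        (length_crc,) = struct.unpack("<I", stream.read(4))
        payload = stream.read(length)
        (payload_crc,) = struct.unpack("<I", stream.read(4))
        if length_crc != masked_crc(header) or payload_crc != masked_crc(payload):
            raise ValueError("corrupt record")
        yield payload

# Round trip through an in-memory buffer instead of a real file.
buffer = io.BytesIO()
for item in [b"first", b"second record"]:
    write_record(buffer, item)
buffer.seek(0)
recovered = list(read_records(buffer))
```

The point is that a TFRecord file is only a container of opaque byte strings; it knows nothing about features, labels, or splits.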

Both help you work with data, but they have a different focus, and I do not see a parent/child relationship here.

Feel free to study the tutorials and the documentation sections in the sources linked above.

Hope that helps!

Best regards
Christian

Thanks for the response, but I think there might be some hierarchical relationship between Dataset and TFRecord objects. I asked this question because of the content in the first video in the TFRecord section, “File Structures in Tensorflow Datasets.”

In this video, the lecturer states the following: “So far in this course you’ve explored TFDS and the built-in datasets that you can use to make training your models much easier codewise. With maybe just one or two lines of code, you can access full datasets. You built on this using the splits API, so you could override the base splits to create your own if you wanted to. All of this works because of the standard underlying format of TFRecord that’s used for all data. …”

From an object-oriented perspective, the paragraph above seems to imply that TFRecord is a parent (base) class of the Dataset class.

Or maybe the lecturer is stating that the underlying format is similar between Dataset and TFRecord objects. This point is not clear from that video. I’ll have to do some research as you suggested. What I would like to see is a list of use cases for Dataset objects and a similar list for TFRecord objects. A list of advantages and disadvantages (pros and cons) for each type would also be nice.

Here is a screenshot from that first video, which shows a tensorflow_datasets directory with a subdirectory containing multiple TFRecord files. This seems to further suggest some type of hierarchical relationship between TFRecords and TensorFlow Dataset objects.

Hello @wmartin,
As @Christian_Simonis mentioned, both can help you deal with data, and there is no direct parent/child relationship, but I will try to explain a bit more.

TF Datasets is a high-level API that simplifies loading and preprocessing data. The tf.data.Dataset objects it returns support transformations, shuffling, batching, and other preprocessing steps. Such datasets can be built from different sources, including CSV files, NumPy arrays, TensorFlow tensors, and TFRecord files.
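As a rough illustration of the shuffling and batching mentioned above, here is a minimal sketch that builds a dataset from in-memory NumPy arrays. It assumes the tf.data.Dataset class is what is meant by a "Dataset" here; the array contents and variable names are invented for the example.

```python
import numpy as np
import tensorflow as tf

# Toy data: 5 examples with 2 features each, plus one label per example.
features = np.arange(10, dtype=np.float32).reshape(5, 2)
labels = np.array([0, 1, 0, 1, 0], dtype=np.int64)

# Build a pipeline: one element per example, shuffled, then grouped in batches of 2.
dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(buffer_size=5, seed=0)
    .batch(2)
)

# Iterating the dataset yields (feature_batch, label_batch) pairs.
batch_shapes = [x.numpy().shape for x, _ in dataset]
```

No TFRecord file is involved at any point, which shows that a Dataset does not have to be backed by one.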

The first thing to note is that TFRecord is a data format, so it can be seen as one of the data formats that TF Datasets can handle. It offers advantages in data storage efficiency and faster data access during training.
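To show how the two fit together, here is a minimal sketch that writes a few records to a TFRecord file and then reads them back as a dataset. The file name, the "value" feature key, and the parse helper are assumptions made up for this example, not anything from the course material.

```python
import os
import tempfile

import tensorflow as tf

path = os.path.join(tempfile.mkdtemp(), "example.tfrecord")

# Storage side: write three serialized tf.train.Example protos to one file.
with tf.io.TFRecordWriter(path) as writer:
    for value in [1, 2, 3]:
        example = tf.train.Example(features=tf.train.Features(feature={
            "value": tf.train.Feature(
                int64_list=tf.train.Int64List(value=[value])),
        }))
        writer.write(example.SerializeToString())

# Pipeline side: a Dataset that uses the TFRecord file as its source.
feature_spec = {"value": tf.io.FixedLenFeature([], tf.int64)}

def parse(serialized):
    # Decode one serialized Example back into a dict of tensors.
    return tf.io.parse_single_example(serialized, feature_spec)

dataset = tf.data.TFRecordDataset(path).map(parse)
values = [int(record["value"].numpy()) for record in dataset]
```

Here the TFRecord file is where the bytes live on disk and the Dataset is the object that reads and transforms them, so the relationship is "reads from" rather than parent/child.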

You may have gotten the impression of a parent/child relationship because the two are so commonly used together: in most cases you deal with large datasets, and TFRecord is a suitable and efficient format for storing them. But as I mentioned, they don’t have a direct parent/child relationship.

I hope it makes sense now.
Best Regards,
Jamal
