Record vs. Dataset

wmartin · July 16, 2023, 5:32pm

Can someone explain the difference between the tensorflow dataset and record? Is one a subset (or parent) of the other?

Christian_Simonis · July 16, 2023, 6:22pm

Hi @wmartin

Here is my take on your question:

TFDS:

tensorflow_datasets (tfds) defines a collection of datasets ready-to-use with TensorFlow.

Module: tfds | TensorFlow Datasets

TFRecord:

The TFRecord format is a simple format for storing a sequence of binary records.

TFRecord and tf.train.Example | TensorFlow Core
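
To make "a sequence of binary records" concrete, here is a minimal pure-Python sketch of the on-disk framing as I understand it from the TFRecord documentation: each record is an 8-byte little-endian length, a 4-byte masked CRC-32C of that length, the payload bytes, and a 4-byte masked CRC-32C of the payload. The helper names (`crc32c`, `masked_crc`, `write_records`, `read_records`) are my own and not part of TensorFlow; in practice you would use `tf.io.TFRecordWriter` and `tf.data.TFRecordDataset` rather than anything like this.

```python
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli polynomial); slow but dependency-free."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """Rotate the CRC right by 15 bits and add a constant, modulo 2**32."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_records(path, payloads):
    """Write each bytes payload as one framed record."""
    with open(path, "wb") as f:
        for payload in payloads:
            header = struct.pack("<Q", len(payload))  # uint64 length
            f.write(header)
            f.write(struct.pack("<I", masked_crc(header)))
            f.write(payload)
            f.write(struct.pack("<I", masked_crc(payload)))


def read_records(path):
    """Yield the payloads back, checking both checksums per record."""
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                return  # clean end of file
            (length,) = struct.unpack("<Q", header)
            (length_crc,) = struct.unpack("<I", f.read(4))
            if length_crc != masked_crc(header):
                raise ValueError("corrupt record length")
            payload = f.read(length)
            (data_crc,) = struct.unpack("<I", f.read(4))
            if data_crc != masked_crc(payload):
                raise ValueError("corrupt record payload")
            yield payload
```

The payloads are opaque bytes as far as the file format is concerned. In typical TensorFlow usage each payload is a serialized `tf.train.Example`, but nothing in the framing requires that.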

So both help you work with data, but they have a different focus, and I do not see a parent/child relationship here.

Feel free to study the tutorials (or the documentation sections) linked above.

Hope that helps!

Best regards
Christian

wmartin · July 22, 2023, 4:44pm

Thanks for the response, but I think there might be some hierarchical relationship between Dataset and TFRecord objects. I asked this question because of the content in the first video in the TFRecord section, “File Structures in Tensorflow Datasets.”

In this video, the lecturer states the following: “So far in this course you’ve explored TFDS and the built-in datasets that you can use to make training your models much easier codewise. With maybe just one or two lines of code, you can access full datasets. You built on this using the splits API, so you could override the base splits to create your own if you wanted to. All of this works because of the standard underlying format of TFRecord that’s used for all data. …”

From an object-oriented perspective, the paragraph above seems to imply that TFRecord is a parent (base) class for the Dataset class.

Or maybe the lecturer is stating that the underlying format is similar between Dataset and TFRecord objects. This point is not clear from that video. I’ll have to do some research as you suggested. What I would like to see is a list of use cases for Dataset objects and a similar list for TFRecord objects. A list of advantages and disadvantages (pros and cons) for each type would also be nice.

wmartin · July 23, 2023, 3:18pm

Here is a screenshot from that first video, which shows a tensorflow_datasets directory with a subdirectory containing multiple TFRecord files. This seems to further suggest some type of hierarchical relationship between TFRecords and TensorFlow Dataset objects.

Jamal022 · July 24, 2023, 1:50pm

Hello @wmartin,
As @Christian_Simonis mentioned, both can help you deal with data and there is no direct parent/child relationship, but I will try to explain a bit more.

TF Datasets is a high-level API that simplifies the process of loading and preprocessing data and supports data transformations, shuffling, batching, and other preprocessing steps. TF Datasets can handle datasets in different formats, including CSV files, NumPy arrays, TensorFlow tensors, and TFRecords.

First, TFRecord is a data format, so it can be seen as one of the formats that TF Datasets can handle. It offers advantages in storage efficiency and faster data access during training.

You may have gotten the impression of a parent/child relationship because the two are so commonly used together: in most cases you deal with large datasets, and TFRecord is a suitable and efficient format for storing them. But as I mentioned, they don’t have a direct parent/child relationship.
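
Here is a small sketch of that point, assuming TensorFlow 2.x is installed. It builds one `tf.data.Dataset` from a TFRecord file and another from plain in-memory values, so you can see that the Dataset is the pipeline object and the TFRecord file is just one possible source for it. The feature name `"x"` and the file name `demo.tfrecord` are made up for the example.

```python
import os
import tempfile

import tensorflow as tf

path = os.path.join(tempfile.mkdtemp(), "demo.tfrecord")

# Write three serialized tf.train.Example records to a TFRecord file.
with tf.io.TFRecordWriter(path) as writer:
    for value in [1, 2, 3]:
        example = tf.train.Example(features=tf.train.Features(feature={
            "x": tf.train.Feature(int64_list=tf.train.Int64List(value=[value])),
        }))
        writer.write(example.SerializeToString())

# A Dataset whose source is the TFRecord file: read raw records, then parse.
feature_spec = {"x": tf.io.FixedLenFeature([], tf.int64)}
from_file = tf.data.TFRecordDataset(path).map(
    lambda raw: tf.io.parse_single_example(raw, feature_spec)["x"])

# A Dataset built from in-memory values, with no TFRecord involved at all.
from_memory = tf.data.Dataset.from_tensor_slices([1, 2, 3])

file_values = [int(v) for v in from_file.as_numpy_iterator()]
memory_values = [int(v) for v in from_memory.as_numpy_iterator()]
print(file_values, memory_values)
```

Both objects are `tf.data.Dataset` instances and support the same `map`, `shuffle`, and `batch` calls. If there is any class hierarchy here, it runs the other way from what you guessed: `tf.data.TFRecordDataset` is a kind of Dataset that happens to read TFRecord files, while TFRecord itself is a file format and not a class.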

I hope it makes sense now
Best Regards,
Jamal
