Explain: TensorFlow / TFX / Metadata Store / data model definitions

GOAL

  1. Explain the following definitions in plain, simple English.
  2. Give many practical examples of what each of them can contain.
  3. Explain what each of them does.

ORIGINAL DOCUMENTATION

This is the original documentation (ML Metadata | TFX | TensorFlow) that I have issues with. I will refer back to it in my description below. It says: “The Metadata Store uses the following data model to record and retrieve metadata from the storage backend.”:

  • ArtifactType describes an artifact’s type and its properties that are stored in the metadata store. You can register these types on-the-fly with the metadata store in code, or you can load them in the store from a serialized format. Once you register a type, its definition is available throughout the lifetime of the store.
  • An Artifact describes a specific instance of an ArtifactType, and its properties that are written to the metadata store.
  • An ExecutionType describes a type of component or step in a workflow, and its runtime parameters.
  • An Execution is a record of a component run or a step in an ML workflow and the runtime parameters. An execution can be thought of as an instance of an ExecutionType. Executions are recorded when you run an ML pipeline or step.
  • An Event is a record of the relationship between artifacts and executions. When an execution happens, events record every artifact that was used by the execution, and every artifact that was produced. These records allow for lineage tracking throughout a workflow. By looking at all events, MLMD knows what executions happened and what artifacts were created as a result. MLMD can then recurse back from any artifact to all of its upstream inputs.
  • A ContextType describes a type of conceptual group of artifacts and executions in a workflow, and its structural properties. For example: projects, pipeline runs, experiments, owners etc.
  • A Context is an instance of a ContextType. It captures the shared information within the group. For example: project name, changelist commit id, experiment annotations etc. It has a user-defined unique name within its ContextType.
  • An Attribution is a record of the relationship between artifacts and contexts.
  • An Association is a record of the relationship between executions and contexts.
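
To make the definitions above concrete, here is a minimal sketch in plain Python. It is not the ml-metadata API: the dataclasses only borrow the names from the documentation, and every concrete value (the "DataSet", "Model" and "Trainer" types, the property names, the file paths) is a made-up assumption for illustration. It shows a type as a schema, an artifact or execution as one record of that type, and events as the links that make lineage tracking possible.

```python
from dataclasses import dataclass, field

# Toy stand-ins for the documented concepts; NOT the ml-metadata API.

@dataclass
class ArtifactType:
    name: str          # e.g. "DataSet" or "Model" (hypothetical names)
    properties: dict   # property name -> expected Python type

@dataclass
class Artifact:
    id: int
    type: ArtifactType
    uri: str           # where the real file lives (hypothetical path)
    properties: dict = field(default_factory=dict)

@dataclass
class ExecutionType:
    name: str
    properties: dict   # runtime parameter name -> expected Python type

@dataclass
class Execution:
    id: int
    type: ExecutionType
    properties: dict = field(default_factory=dict)

@dataclass
class Event:
    artifact_id: int
    execution_id: int
    kind: str          # "INPUT" or "OUTPUT"

# Types are schemas: they say what kind of thing exists and which fields it has.
dataset_type = ArtifactType("DataSet", {"split": str})
model_type = ArtifactType("Model", {"accuracy": float})
trainer_type = ExecutionType("Trainer", {"epochs": int})

# Instances are concrete records: one file, one trained model, one run.
train_data = Artifact(1, dataset_type, "/tmp/data/train.csv", {"split": "train"})
model = Artifact(2, model_type, "/tmp/models/v1", {"accuracy": 0.93})
run = Execution(1, trainer_type, {"epochs": 5})

# Events link the two: the run read artifact 1 and produced artifact 2.
events = [Event(1, 1, "INPUT"), Event(2, 1, "OUTPUT")]

def upstream_artifact_ids(artifact_id, events):
    """Walk events backwards from an artifact to every artifact it was derived from."""
    found = set()
    frontier = [artifact_id]
    while frontier:
        current = frontier.pop()
        # Executions that produced the current artifact.
        producers = {e.execution_id for e in events
                     if e.artifact_id == current and e.kind == "OUTPUT"}
        # Inputs of those executions are upstream artifacts.
        for e in events:
            if (e.execution_id in producers and e.kind == "INPUT"
                    and e.artifact_id not in found):
                found.add(e.artifact_id)
                frontier.append(e.artifact_id)
    return found

print(upstream_artifact_ids(2, events))  # {1}
```

In this sketch the Execution record only knows its own parameters; without the Event rows there would be no way to tell which dataset the model was trained on.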

How I see them
(Please correct my descriptions or let me know if they are right)

  • ArtifactType definition (as far as I can deduce from the documentation):
    • Contains the base data
    • Contains many iterations and multiple modified versions of the base data
    • Defines the data types
    • Has properties
    • Stores data as metadata in the metadata storage, e.g. a database or RAM.
  • 1 Artifact (as far as I can deduce from the documentation):
    • 1 version of the modified data
    • 1 version of the modified data’s properties
    • 1 version of the modified data’s data types
    • 1 version of a specific instance of an ArtifactType, and its properties that are written to the metadata store.
    • !! BUT THEN: “List all Artifacts of a specific type. Example: all Models that have been trained.” (ML Metadata | TFX | TensorFlow) → So saved models can also be Artifacts. This documentation is just terrible: its ArtifactType and Artifact definitions point at each other without explaining what either of them is. It doesn’t make any sense.
  • ExecutionType:
  • Execution:
    • What is a record here?
    • What is a component?
    • What is a component run?
    • What runtime parameters are we talking about?
    • What is Execution overall?
    • 1 version of a specific instance of ExecutionType.
    • Executions are saved to the metadata storage (e.g. RAM or a database) when you run an ML pipeline or step.
  • Event:
    • Is a record of the relationship between artifacts and executions.
    • To me it is not clear why this step is even necessary, because Event and Execution sound like they fulfill the exact same purpose.
    • To me it seems like an Execution saves itself, so why do we need an Event to save it again?
    • This is the only understandable statement in this definition: “By looking at all events, MLMD knows what executions happened and what artifacts were created as a result.”
  • ContextType:
    • What is a “conceptual group of artifacts and executions”? In particular, what is “conceptual” about them?
    • Perfect examples; this is what all the other descriptions should look like.
  • Context:
    • 1 version of the ContextType.
    • Again, what is this “conceptual group of artifacts and executions”? In particular, what is “conceptual” about them?
    • Again, GREAT examples.
  • Attribution:
    • Simple and understandable description.
    • If all the elements already have properties and describe themselves, why is this necessary?
  • Association:
    • Simple and understandable description.
    • If all the elements already have properties and describe themselves, why is this necessary?
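
One way to picture the Context, Attribution and Association questions above is as a small set of link tables. The following is a minimal sketch in plain Python, not the ml-metadata API; the "Experiment" type, the context names and all ids are made-up assumptions. It shows that a Context is just a named group, and that the Attribution and Association rows are what place artifacts and executions into that group.

```python
from dataclasses import dataclass

# Toy stand-ins for the documented concepts; NOT the ml-metadata API.

@dataclass(frozen=True)
class Context:
    id: int
    type_name: str   # the ContextType, e.g. "Experiment" (hypothetical)
    name: str        # unique within its ContextType

@dataclass(frozen=True)
class Attribution:   # links one artifact to one context
    artifact_id: int
    context_id: int

@dataclass(frozen=True)
class Association:   # links one execution to one context
    execution_id: int
    context_id: int

# Two groups of the same type; the ids below are arbitrary.
exp_a = Context(1, "Experiment", "baseline")
exp_b = Context(2, "Experiment", "bigger-model")

attributions = [Attribution(10, 1), Attribution(11, 1), Attribution(12, 2)]
associations = [Association(100, 1), Association(101, 2)]

def artifacts_in(context, attributions):
    """Ids of all artifacts that were attributed to the given context."""
    return sorted(a.artifact_id for a in attributions
                  if a.context_id == context.id)

def executions_in(context, associations):
    """Ids of all executions that were associated with the given context."""
    return sorted(a.execution_id for a in associations
                  if a.context_id == context.id)

print(artifacts_in(exp_a, attributions))   # [10, 11]
print(executions_in(exp_b, associations))  # [101]
```

In this sketch an artifact carries no field saying which experiment it belongs to; that membership lives only in the link rows, which is also what would let one artifact belong to several groups at once.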

Previous recommendations