C3w3quiz1 - Question2 'Worker' feedback comment

I was puzzled by the feedback comment shown after selecting one of the correct answers for Question 2:

That’s right! The term worker is very common and is defined as the accelerator on which it performs some calculations that are performed in this replica.

My understanding of the TensorFlow distributed computing terminology is the following (please feel free to correct/comment on this):

“machine” = PC/laptop (not sure how this applies to virtual machines and cloud computing concepts of a machine).

“device” = processor (CPU/GPU/TPU)

“OneDeviceStrategy” = restricts distributed operations to the cores of a single device.

“accelerator” = software that enables speedups in computations on multicore processors (i.e. on hardware that is capable of doing so) via parallel processing.

“replica” = seems to be used in the TensorFlow documentation (and in the lectures here) without any definition/description of what it actually is. What exactly is a replica?
I stumbled across the following in this 2019 paper by Buchlovsky et al. (DeepMind): https://arxiv.org/pdf/1902.00465.pdf in section 2.2. Graph Replicas:

“To capture common use cases, we shall use the key concept of a replica: a computation designed to be run in parallel across many devices (e.g. one step of SGD), with different input to each device, and some shared state accessible to all replicas (e.g. model parameters).”

… so a replica is a “computation”. …
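To make that definition concrete for myself, here is a minimal pure-Python sketch of how I read it, assuming a toy model y = w * x trained with synchronous SGD. It is not TensorFlow code, and the names `replica_step`, `train_step` and `shards` are my own illustrative choices: the same computation runs once per shard of input, and the single parameter `w` is the shared state.

```python
# Pure-Python sketch of a replicated SGD step (illustrative names, not TensorFlow API).

def replica_step(w, shard):
    """The replicated computation: gradient of the mean squared error for y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def train_step(w, shards, lr=0.05):
    # Each replica runs the same computation on a different input shard.
    grads = [replica_step(w, shard) for shard in shards]
    # Stand-in for an all-reduce: average the per-replica gradients.
    mean_grad = sum(grads) / len(grads)
    # Update the shared state (the model parameter) once for all replicas.
    return w - lr * mean_grad

data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]  # samples of y = 3x
shards = [data[:2], data[2:]]  # two replicas, one shard each

w = 0.0
for _ in range(200):
    w = train_step(w, shards)
print(round(w, 3))
```

In a real setup each call to `replica_step` would presumably run on its own device in parallel; here they simply run one after another in a list comprehension.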

So, armed with my definitions, I return to my original post:
I had assumed “worker” refers to a running process, but I guess not. I find it referred to in this quiz feedback as the “accelerator” but at the same time as the thing that performs calculations on this accelerator … (which is all happening inside a replica… (which I now know is a computation)).

I stumbled across the following description at: Run Your First Multi-Worker TensorFlow Training Job With GCP AI Platform — The TensorFlow Blog

“In MultiWorkerMirroredStrategy, all machines are designated as workers, which are the physical machines on which the replicated computation is executed.”
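This machine-level reading seems consistent with the `TF_CONFIG` environment variable that multi-worker training relies on: it is a JSON string with a `cluster` entry listing one address per worker and a `task` entry saying which of those workers the current process is. Below is a small sketch using only the standard library; the host names and the helper `tf_config_for` are hypothetical, made up for illustration.

```python
import json
import os

# Hypothetical cluster: two machines, each running one worker task.
cluster = {"worker": ["host-a.example.com:2222", "host-b.example.com:2222"]}

def tf_config_for(index):
    """Build the TF_CONFIG JSON string for the worker with the given index."""
    return json.dumps({"cluster": cluster, "task": {"type": "worker", "index": index}})

# On the first machine this would be set before the training script starts.
os.environ["TF_CONFIG"] = tf_config_for(0)

# Read it back the way any process on that machine could.
config = json.loads(os.environ["TF_CONFIG"])
num_workers = len(config["cluster"]["worker"])
print(num_workers, config["task"])
```

Each machine would get the same `cluster` entry but a different `task` index, which at least suggests that a worker is identified by an address (host and port) rather than by a single accelerator.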

“worker” = … ?

Hi Shahin,

Thanks for raising this topic. I have to admit that it is also a bit confusing to me, but the way I understand it is that the worker is the software running a replica (a copy of the model) on a certain device.
In the following video from the TensorFlow: Advanced Techniques Specialization you can find an explanation of a few concepts you highlighted, starting at 02:40

Hopefully this helps.



Thanks @mjsmid,

Actually I completed that specialisation in March. I need to go back over it all 😅

But indeed it helps to know that I’m not the only one who finds the terminology confusing.

(It seems that the various sources of the definitions I’ve included in my post, as well as those more carefully laid out in the TensorFlow: Advanced Techniques Specialization, are sometimes quite different from one another.)