C3w3 OneDeviceStrategy

shahin · July 8, 2021, 5:07pm

It's not clear from the TF documentation whether this means that training would be distributed across multiple cores of a single CPU/GPU (i.e. assuming “device” refers not to a computer/server but to a processing unit such as a CPU/GPU).

… Or does using tf.distribute.OneDeviceStrategy mean the training is not actually distributed at all, but lets you write the same code that you would then go on to try with one of the real distributed strategies, like tf.distribute.MirroredStrategy? … if you know what I mean.

tranvinhcuong · July 9, 2021, 9:44am

I think this interpretation is right, from what I read in the docstrings.
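A minimal sketch of that reading, assuming a CPU is available as "/cpu:0". The helpers make_strategy and scaled_input are hypothetical names made up for illustration; only the tf.distribute calls are TensorFlow's own. The point is that the computation is written once against the strategy interface, and only the line that picks the strategy would change when moving to tf.distribute.MirroredStrategy.

```python
import tensorflow as tf


def make_strategy(name):
    """Hypothetical helper: map a short name to a tf.distribute strategy."""
    if name == "one_device":
        return tf.distribute.OneDeviceStrategy(device="/cpu:0")
    if name == "mirrored":
        return tf.distribute.MirroredStrategy()
    raise ValueError(f"unknown strategy: {name}")


def scaled_input(strategy, x):
    """Compute w * x on every replica of the strategy and sum the results."""
    with strategy.scope():
        # Variables created in the scope are owned by the strategy.
        w = tf.Variable(2.0)

    @tf.function
    def step(inp):
        return w * inp

    per_replica = strategy.run(step, args=(x,))
    return strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica, axis=None)


# With OneDeviceStrategy there is exactly one replica, so nothing is split.
result = scaled_input(make_strategy("one_device"), tf.constant(3.0))
print(float(result))
```

Passing "mirrored" instead would run the same step on each visible GPU (or on the CPU if there are none), so the summed result would scale with the number of replicas.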

On my machine, the list of devices is the following:

tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

So I think “device” here refers to a processing unit rather than a computer/server.
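One way to check this, as a rough sketch assuming the CPU is addressed as "/cpu:0": create a variable inside the strategy scope and look at where it was placed and how many replicas the strategy reports.

```python
import tensorflow as tf

# A single replica, pinned to the CPU device.
strategy = tf.distribute.OneDeviceStrategy(device="/cpu:0")

with strategy.scope():
    # Variables created inside the scope are placed on the strategy's device.
    v = tf.Variable(1.0)

num_replicas = strategy.num_replicas_in_sync
variable_device = v.device

print(num_replicas)
print(variable_device)
```

The device string names a whole processing unit (CPU:0), and the replica count is 1, so the strategy itself does not split the work.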

For reference, the following code pairs the string one_device with the device description device:CPU:0.

shahin · July 9, 2021, 1:24pm

So, does this mean the code is distributed across the cores of one multicore processor?

tranvinhcuong · July 10, 2021, 3:20am

Let me run some tests to check the behavior of the code before giving you a validated answer.

tranvinhcuong · July 10, 2021, 8:00am

Hi @shahin, I copied sample code from the TensorFlow code base

"""Benchmark tests for CPU performance of Keras models."""

import numpy as np

import tensorflow as tf

from tensorflow.python.keras.benchmarks import benchmark_util
from tensorflow.python.platform import benchmark   # pylint: disable=unused-import

# Loss function and optimizer.
_LOSS = 'binary_crossentropy'
_OPTIMIZER = 'rmsprop'


class KerasModelCPUBenchmark(  # pylint: disable=undefined-variable
    tf.test.Benchmark, metaclass=benchmark.ParameterizedBenchmark):
  """Required Arguments for measure_performance.

      x: Input data; it can be NumPy arrays or loaded from tfds.
      y: Target data. If `x` is a dataset or generator instance,
         `y` should not be specified.
      loss: Loss function for model.
      optimizer: Optimizer for model.
      For other details, see the `measure_performance()` method of
      benchmark_util.
  """
  # The parameters of each benchmark are a tuple:
  # (benchmark_name_suffix, batch_size, run_iters).
  # benchmark_name_suffix: The suffix of the benchmark test name with
  # convention `{bs}_{batch_size}`.
  # batch_size: Integer. Number of samples per gradient update.
  # run_iters: Integer. Number of iterations to run the
  # performance measurement.

  _benchmark_parameters = [
      ('bs_32', 32, 3), ('bs_64', 64, 2), ('bs_128', 128, 2),
      ('bs_256', 256, 1), ('bs_512', 512, 1)]

  def _mnist_mlp(self):
    """Simple MLP model."""
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(512, activation='relu', input_shape=(784,)))
    model.add(tf.keras.layers.Dropout(0.2))
    model.add(tf.keras.layers.Dense(512, activation='relu'))
    model.add(tf.keras.layers.Dropout(0.2))
    model.add(tf.keras.layers.Dense(10, activation='softmax'))

    return model

  def _mnist_convnet(self):
    """Simple Convnet model."""
    model = tf.keras.Sequential()
    model.add(
        tf.keras.layers.Conv2D(
            32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
    model.add(tf.keras.layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(tf.keras.layers.Dropout(0.25))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(128, activation='relu'))
    model.add(tf.keras.layers.Dropout(0.5))
    model.add(tf.keras.layers.Dense(10, activation='softmax'))

    return model

  def _imdb_lstm(self):
    """Simple LSTM model."""
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Embedding(20000, 128))
    model.add(tf.keras.layers.LSTM(128, dropout=0.2, recurrent_dropout=0.2))
    model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

    return model

  def benchmark_mnist_mlp(self, batch_size, run_iters):
    """Benchmark for MLP model on synthetic mnist data."""
    mlp_x = np.random.random((5000, 784))
    mlp_y = np.random.random((5000, 10))
    metrics, wall_time, extras = benchmark_util.measure_performance(
        self._mnist_mlp,
        x=mlp_x,
        y=mlp_y,
        batch_size=batch_size,
        epochs=2000,
        run_iters=run_iters,
        optimizer=_OPTIMIZER,
        loss=_LOSS,
        distribution_strategy='one_device')
    self.report_benchmark(
        iters=run_iters, wall_time=wall_time, metrics=metrics, extras=extras)

  def benchmark_mnist_convnet(self, batch_size, run_iters):
    """Benchmark for Convnet model on synthetic mnist data."""
    convnet_x = np.random.random((5000, 28, 28, 1))
    convnet_y = np.random.random((5000, 10))
    metrics, wall_time, extras = benchmark_util.measure_performance(
        self._mnist_convnet,
        x=convnet_x,
        y=convnet_y,
        batch_size=batch_size,
        run_iters=run_iters,
        optimizer=_OPTIMIZER,
        loss=_LOSS)
    self.report_benchmark(
        iters=run_iters, wall_time=wall_time, metrics=metrics, extras=extras)

  def benchmark_imdb_lstm(self, batch_size, run_iters):
    """Benchmark for LSTM model on synthetic imdb review dataset."""
    lstm_x = np.random.randint(0, 1999, size=(2500, 100))
    lstm_y = np.random.random((2500, 1))
    metrics, wall_time, extras = benchmark_util.measure_performance(
        self._imdb_lstm,
        x=lstm_x,
        y=lstm_y,
        batch_size=batch_size,
        run_iters=run_iters,
        optimizer=_OPTIMIZER,
        loss=_LOSS)
    self.report_benchmark(
        iters=run_iters, wall_time=wall_time, metrics=metrics, extras=extras)
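

# Run the MLP benchmark with batch size 128. The instance is named
# `cpu_benchmark` so it does not shadow the imported `benchmark` module.
cpu_benchmark = KerasModelCPUBenchmark()
cpu_benchmark.benchmark_mnist_mlp__bs_128()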

and modified it to use one_device running on the CPU, with 2000 epochs to make it easy to check CPU utilization.

I ran it on a machine with a 24-core CPU. The result is that all the cores were utilized during the benchmark run, as shown in the following image:

shahin · July 10, 2021, 3:06pm

Wow! That’s cool @tranvinhcuong ! (I need to learn how to use this benchmark thingy.)

Please may I also see your code for OneDeviceStrategy? (Before I saw your message, I was starting to think that the number at the end of device:CPU: … specified a single core of the CPU, as in tf.distribute.OneDeviceStrategy("device:CPU:0"), but I guess not…?) I wonder if there’s a way to make the code execute on only one core… (although I’m not sure what benefit there would be in doing that…)

tranvinhcuong · July 11, 2021, 12:14am

Hi @shahin, actually I was confused in the deleted message about the way TF indexes the cores of a CPU, but looking at the result of tf.config.list_physical_devices(), I see that the (multi-core) CPU is addressed as one device:

[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
 PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
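A short sketch for checking the same thing from the logical side (the device names that strategies and tf.device strings actually use). It assumes a default configuration with no virtual devices set up; the variable names are only for illustration.

```python
import tensorflow as tf

# Physical devices are the hardware TensorFlow can see; logical devices are
# what strategies and tf.device() strings address.
physical_cpus = tf.config.list_physical_devices("CPU")
logical_cpus = tf.config.list_logical_devices("CPU")

for device in logical_cpus:
    print(device.name, device.device_type)
```

On a default setup this lists a single CPU device ending in CPU:0, regardless of how many cores the processor has.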

The code in the previous message is for one device; this is done by passing the parameter distribution_strategy='one_device' to the measure_performance method.
And the class is for CPU benchmarks, so this way we can check the one-device strategy on a machine with one (multi-core) CPU. The result is that it tries to utilize all the cores, and I think we cannot use just a specific core inside the CPU.
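On limiting the number of cores: as far as I know, the device string cannot select a core, but TensorFlow does let you cap the size of its thread pools through tf.config.threading. The sketch below is only that, a sketch: it assumes it runs at the very start of the program, before TensorFlow has executed anything, and it was not part of the benchmark run above.

```python
import tensorflow as tf

# These setters must be called before TensorFlow initializes its runtime
# (i.e. before any op runs); afterwards they raise RuntimeError.
tf.config.threading.set_intra_op_parallelism_threads(1)  # threads inside one op
tf.config.threading.set_inter_op_parallelism_threads(1)  # ops run concurrently

intra_threads = tf.config.threading.get_intra_op_parallelism_threads()
inter_threads = tf.config.threading.get_inter_op_parallelism_threads()

print(intra_threads, inter_threads)
```

This caps how many threads TensorFlow uses for computation, which should reduce how many cores are busy, but it does not pin the work to one specific core. Pinning would have to be done at the operating-system level (for example with CPU affinity settings), outside TensorFlow.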
