Cache / caches on Week 4 Assignment 1

I am wondering whether there is some logic to the structure of the ‘cache’ variable in the code. Wouldn’t it be better style to create separate dicts for each of the W’s, A’s, b’s, and so on?

In addition, it is confusing that in some places ‘cache’ refers to a tuple holding both the linear cache (itself a tuple) and the activation cache, while in other places ‘cache’ refers simply to the linear cache.
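To illustrate what I mean, here is roughly how I read the structure (a simplified sketch of my own, not the actual notebook code):

import numpy as np

def linear_forward(A_prev, W, b):
    Z = np.dot(W, A_prev) + b
    linear_cache = (A_prev, W, b)             # here 'cache' means just the linear cache
    return Z, linear_cache

def linear_activation_forward(A_prev, W, b, activation):
    Z, linear_cache = linear_forward(A_prev, W, b)
    A = activation(Z)
    activation_cache = Z                      # the activation cache is just Z
    cache = (linear_cache, activation_cache)  # here 'cache' means the nested tuple
    return A, cache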

The treatment of the ‘cache’ variable substantially increases the difficulty of the assignment for no particular reason.


Hi @Nathaniel_Light ,

Thanks much for the feedback. I was left with the very same impression when I took the specialization. An advantage of using tuples is their immutability, which may prevent some headaches in an educational context, particularly in a Jupyter notebook. The choice might be different in a development context. I am not a developer.

Maybe someone with a longer organizational history will chime in on this. I am interested as well.

Thanks again,
@kenb

The original creators of much of this content also were not professional developers, and in my opinion as a long-time (if now former) developer there are places in the code that reflect this. Stuffing things into a generic untyped container object and indexing by dynamically concatenating strings inside for loops is not best practice. Nor is using the same generic name to mean different things in different places. If nothing else, it’s a recipe for a maintenance-programming disaster. I think the code is a rather literal transcription of the whiteboards from the lecture videos into Python. Maybe that was chosen for pedagogical benefit, or maybe it was just a hack.
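To be concrete about the string-concatenation pattern, I mean something along these lines (a paraphrase of the exercise style, not the actual graded code):

import numpy as np

# Paraphrased pattern: one untyped dict, keys built by concatenating strings in a loop
parameters = {}
layer_dims = [5, 4, 3]   # example layer sizes
for l in range(1, len(layer_dims)):
    parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))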

Thanks, @ai_curious. I am guessing that the notebooks reflect the way in which quantitative scientists of various stripes work. Since we are not teaching software development, that’s a fair choice. For my part, it served as a good introduction to the Python scientific stack following a lifelong Matlab habit! FWIW, pedagogy should be leading the charge here. The question on the table is whether dicts serve that purpose better than tuples.

I think I agree.

Maybe it is too much to require that learners enrolled in these courses tackle encapsulation and complex custom data types, a.k.a. classes. But I don’t know how to choose between options like dict vs. tuple that I find equally distasteful. I think of the world in terms of this sample code fragment I found in the Keras base_layer source on GitHub…


import tensorflow as tf

class SimpleDense(tf.keras.layers.Layer):
    def __init__(self, units=32):
        super().__init__()
        self.units = units

    def build(self, input_shape):  # Create the state of the layer (weights)
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(
            initial_value=w_init(shape=(input_shape[-1], self.units),
                                 dtype='float32'),
            trainable=True)
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(
            initial_value=b_init(shape=(self.units,), dtype='float32'),
            trainable=True)

    def call(self, inputs):  # Defines the computation from inputs to outputs
        return tf.matmul(inputs, self.w) + self.b

w and b are internal attributes of a Layer instance, initialized in build() and subsequently accessed through methods such as get_weights() and set_weights() on the Layer rather than passed around by hand. If it’s a question of style, this to me is the clear preference.

https://github.com/keras-team/keras/blob/master/keras/engine/base_layer.py
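Usage looks roughly like this (my own sketch, assuming the SimpleDense layer above):

x = tf.ones((2, 3))
layer = SimpleDense(units=4)
y = layer(x)                  # build() runs on the first call, creating w and b internally
w, b = layer.get_weights()    # the weights are reached through the Layer, not passed around
layer.set_weights([w * 0.5, b])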

Thanks for that. My issue concerned the name and nature of the ‘cache’ variable more than its tuple data type.

As I mentioned, ‘cache’ confusingly bundles together several variables from each layer, whereas in my mind a more logical organization would be separate dicts for each variable. The inclusion of the activation_cache in some places but not others was an additional complication.
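Something like this is what I had in mind, i.e. one dict per quantity keyed by layer index (just a sketch of the alternative, not a claim about how the notebook should be written):

import numpy as np

layer_dims = [5, 4, 3]   # example sizes
W = {l: np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01 for l in (1, 2)}
b = {l: np.zeros((layer_dims[l], 1)) for l in (1, 2)}
X = np.random.randn(layer_dims[0], 10)

A = {0: X}   # activations, with the input counted as 'layer 0'
Z = {}       # pre-activations, filled in during forward prop
for l in (1, 2):
    Z[l] = np.dot(W[l], A[l - 1]) + b[l]
    A[l] = np.maximum(0, Z[l])   # ReLU, just for the sake of the example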

So really I was just wondering whether there will be some payoff down the road, perhaps with more complicated models, of structuring the cache by layer rather than by variable? …

The use of cache is a hack that appears only in these early exercises, to let you move the weights, biases, and layer output variables between the helper functions as a single unit. In a few weeks, or certainly as you move forward in this Specialization or into the TensorFlow Advanced Techniques courses, explicit use of cache goes away entirely. As my code fragment illustrates, in a TensorFlow model each layer carries its own weights and biases in separate internal variables. I wouldn’t spend any time thinking about options for the cache data type. Instead, as @kenb suggests, focus on what W and b represent and why you are messing around with them in the first place.
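As a rough illustration of what “the cache goes away” looks like in practice (a sketch of my own, not anything from the course):

import tensorflow as tf

# The framework, not the user, keeps whatever forward-pass values backprop needs
model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation='relu'),
    tf.keras.layers.Dense(1),
])
x = tf.random.normal((8, 3))
y = tf.random.normal((8, 1))

with tf.GradientTape() as tape:
    y_hat = model(x)                             # each layer holds its own W and b
    loss = tf.reduce_mean(tf.square(y_hat - y))
grads = tape.gradient(loss, model.trainable_variables)   # no hand-rolled cache anywhere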

One last thought/rant before I leave this subject. I say above that I find dict and tuple equally distasteful. The same goes for more than one dict, or any other structure that separates W and b from all the meaningful context about them: the inputs, whether the weights in the layer are trainable or not, the outputs, the transformation that turned the inputs into the outputs, the gradients, and which b goes with which W. You need all of that context. If you don’t create a data structure that manages those relationships, such as the TensorFlow Layer base class, then you end up doing it yourself in procedural code, using ugly things like parallel data structures. Or caches.

So +1 @Nathaniel_Light for recognizing the problematic data structure. Keep asking yourself (and the community) whether the things you’re seeing make sense. The rest of us will share our (sometimes strong) opinions. Peace out.

This is helpful. I haven’t reached this point in my DLS journey yet, but it’s good to have some insight into what to focus on in future topics. Thank you @ai_curious, @kenb for your responses and @Nathaniel_Light for starting the discussion.
