It would be a good idea to read through the instructions in this section, and maybe the previous section about the function linear_forward as well. The point here is that linear_activation_forward performs the two steps that are required for forward propagation at a given layer:
- The “linear” step (Z = W dot A + b). It doesn’t need to do that “by hand”, because you already wrote a function that implements this: linear_forward. Just call it and use the values it returns.
- Then it invokes the non-linear “activation” function, which is either sigmoid or relu. Here again, it looks like you are re-implementing those functions inline, but you should just be calling them.
Then it returns the values it generated, which include the “cache” entry. It looks like you have the interpretation of the two parts of the cache backwards: the “linear cache” is (A, W, b) and the “activation cache” is Z. But here again, you don’t have to construct those cache values manually: they are returned to you by linear_forward and by the activation functions (if you call the real definitions instead of redoing everything by hand here).
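To make the structure concrete, here is a minimal sketch of that pattern, assuming linear_forward, sigmoid, and relu are the helpers from the earlier sections of the notebook, each returning its output plus its own cache. This is just the general shape being described above, not a line-for-line solution:

```python
def linear_activation_forward(A_prev, W, b, activation):
    # Step 1: the linear part, delegated to the helper you already wrote.
    # linear_cache comes back as (A_prev, W, b).
    Z, linear_cache = linear_forward(A_prev, W, b)

    # Step 2: the non-linear activation, delegated to the provided helpers.
    # activation_cache comes back as Z.
    if activation == "sigmoid":
        A, activation_cache = sigmoid(Z)
    elif activation == "relu":
        A, activation_cache = relu(Z)

    # Package both caches together; back propagation needs both pieces.
    cache = (linear_cache, activation_cache)
    return A, cache
```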
The other point to make here is that, while it certainly makes it easier to debug someone’s code when you can just see it, the rules for the course are that we aren’t supposed to be sharing code publicly. The problem is that as soon as one person publishes a solution, everyone can copy it. So please do me a favor and edit your post to remove the solution source, even if it doesn’t work as written. It’s the principle of the thing …