As for what went wrong: your code does not follow the Functional API pattern, in which every layer is called on the output of the layer before it.
For example, your Z1 takes the input image, but your later layers (A1, P1, Z2, etc.) are not called on the output of the previous layer, so they never receive an input. For details, read section 4.3, especially the bolded “outputs” equation.
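
To make the pattern concrete, here is a minimal sketch of how the chaining looks in tf.keras. The function name `build_model`, the input shape, the filter counts, the pool sizes, and the 6 output classes are all assumptions chosen for illustration, not values taken from your code or from the assignment. The point is only the trailing `(previous_tensor)` call on every layer.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_model(input_shape=(64, 64, 3)):
    # Hypothetical sizes throughout; only the chaining pattern matters.
    input_img = tf.keras.Input(shape=input_shape)

    # Z1 is called on the input image.
    Z1 = layers.Conv2D(filters=8, kernel_size=4, padding="same")(input_img)
    # Each later layer is called on the tensor produced just before it.
    A1 = layers.ReLU()(Z1)
    P1 = layers.MaxPool2D(pool_size=8, strides=8, padding="same")(A1)
    Z2 = layers.Conv2D(filters=16, kernel_size=2, padding="same")(P1)
    A2 = layers.ReLU()(Z2)
    P2 = layers.MaxPool2D(pool_size=4, strides=4, padding="same")(A2)
    F = layers.Flatten()(P2)
    outputs = layers.Dense(units=6, activation="softmax")(F)

    # The model is defined by its first input and its last output.
    return tf.keras.Model(inputs=input_img, outputs=outputs)


model = build_model()
model.summary()
```

If you write something like `A1 = layers.ReLU()` without the trailing `(Z1)`, then `A1` is just a layer object that is not connected to anything, which matches the problem described above.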