In one of the first lectures on Logistic Regression, Prof Andrew Ng says that the cross-entropy cost function is convex (unlike the squared-error cost).
Does that still hold once we get to multi-layer feedforward networks (with ReLU activations in the hidden layers) in Week 4? That is, does the cross-entropy cost remain convex with respect to all of the W^{[l]} and b^{[l]}?
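
To make the question concrete, here is a small numerical probe I put together (my own sketch, not from the course materials; the network shapes, the `loss` and `random_params` helpers, and the toy labels are all illustrative choices). Convexity would require L(t·p1 + (1−t)·p2) ≤ t·L(p1) + (1−t)·L(p2) for every pair of parameter settings p1, p2 and every t in [0, 1], so one can at least check that inequality along a random line through parameter space for a tiny one-hidden-layer ReLU network with a sigmoid output and binary cross-entropy:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 100))                      # 2 features, 100 examples
Y = (X[0] * X[1] > 0).astype(float)[None, :]       # toy binary labels, shape (1, 100)

def loss(params):
    """Binary cross-entropy of a 1-hidden-layer ReLU network."""
    W1, b1, W2, b2 = params
    A1 = np.maximum(0, W1 @ X + b1)                # ReLU hidden layer
    A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))         # sigmoid output
    eps = 1e-12                                    # avoid log(0)
    return -np.mean(Y * np.log(A2 + eps) + (1 - Y) * np.log(1 - A2 + eps))

def random_params():
    """One random setting of (W1, b1, W2, b2) for a 2-3-1 network."""
    return [rng.normal(size=(3, 2)), rng.normal(size=(3, 1)),
            rng.normal(size=(1, 3)), rng.normal(size=(1, 1))]

p1, p2 = random_params(), random_params()
for t in np.linspace(0, 1, 11):
    mix = [t * a + (1 - t) * b for a, b in zip(p1, p2)]
    lhs = loss(mix)                                # loss at the interpolated point
    rhs = t * loss(p1) + (1 - t) * loss(p2)        # chord between the endpoints
    print(f"t={t:.1f}  L(mix)={lhs:.4f}  chord={rhs:.4f}  convex_ok={lhs <= rhs + 1e-9}")
```

A single line that satisfies the inequality doesn't prove convexity, but any violation along any line would disprove it, so this kind of check at least shows how the claim could be tested empirically.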