Week 4: Triplet loss

No, it is just plain math. For example, consider a simplified loss with the same \max(0, \cdot) structure, with a scalar parameter w and training inputs x_1, \dots, x_m:

\newcommand{\dv}[2]{\frac{\mathrm{d} #1}{\mathrm{d} #2}}
J = \sum_{i = 1}^m \max(0, w x_i)
\max(0, w x_i) = \begin{cases} w x_i & \text{ if } w x_i > 0 \\ 0 & \text{ if } w x_i \le 0 \end{cases}
\dv{\max(0, w x_i)}{w} = \begin{cases} x_i & \text{ if } w x_i > 0 \\ 0 & \text{ if } w x_i \le 0 \end{cases}
\dv{J}{w} = \sum_{i = 1}^m \dv{\max(0, w x_i)}{w} = \sum_{i = 1}^m \begin{cases} x_i & \text{ if } w x_i > 0 \\ 0 & \text{ if } w x_i \le 0 \end{cases}

(Strictly, \max(0, w x_i) is not differentiable at w x_i = 0; the cases above take the subgradient 0 there, which is the usual convention in practice.)
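As a sanity check, here is a minimal NumPy sketch of this gradient. The scalar w, the random inputs, and the finite-difference comparison are illustrative choices, not part of the derivation above:

```python
import numpy as np

def loss(w, x):
    # J = sum_i max(0, w * x_i)
    return np.maximum(0.0, w * x).sum()

def grad(w, x):
    # dJ/dw = sum_i x_i * [w * x_i > 0], taking subgradient 0 at the kink
    return (x * (w * x > 0)).sum()

rng = np.random.default_rng(0)
x = rng.normal(size=100)
w = 0.5

# A central finite difference should agree closely with the piecewise formula.
eps = 1e-6
numeric = (loss(w + eps, x) - loss(w - eps, x)) / (2 * eps)
print(grad(w, x), numeric)
```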

No signal will affect the parameter w for training examples where we already have w x_i \le 0; the training algorithm focuses only on the more difficult examples where we still have w x_i > 0. We can think of the other training examples as discarded during this weight update: their loss is already zero, so there is nothing left to improve on.
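To connect this back to the triplet loss, here is a hedged sketch of the same masking idea: only triplets with a positive hinge term produce a gradient. The margin value, batch shapes, and function names are illustrative assumptions, not a prescribed implementation:

```python
import numpy as np

def triplet_hinge(anchor, positive, negative, alpha=0.2):
    # Per-triplet loss: max(0, ||a - p||^2 - ||a - n||^2 + alpha)
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return np.maximum(0.0, d_pos - d_neg + alpha)

rng = np.random.default_rng(1)
a, p, n = (rng.normal(size=(8, 16)) for _ in range(3))  # batch of 8 embeddings

losses = triplet_hinge(a, p, n)
active = losses > 0  # only these "hard" triplets contribute gradient signal
print(f"{active.sum()} of {losses.size} triplets drive this weight update")
```

Easy triplets, those already satisfying the margin, land in the zero branch of the max, exactly like the examples with w x_i \le 0 above.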