Isn’t the cost with He initialization higher than with random initialization? Why isn’t that listed as one of its downsides?
What’s the cost of random initialization? Can you post the graph to compare? It’s something between 0.35 and 0.40.
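For anyone comparing the two, here is a minimal sketch of how the schemes differ in the scale of the starting weights. It assumes "random initialization" means standard-normal samples multiplied by a fixed factor; the factor of 10 and the function names are hypothetical, not taken from this thread. He initialization scales the same samples by sqrt(2 / n_in), where n_in is the number of inputs to the layer.

```python
import math
import random


def init_random(n_in, n_out, scale=10.0, seed=0):
    """Plain random init: N(0, 1) samples times a fixed scale.

    The default scale of 10.0 is an assumed, illustrative value.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_in)]
            for _ in range(n_out)]


def init_he(n_in, n_out, seed=0):
    """He init: N(0, 1) samples times sqrt(2 / n_in)."""
    rng = random.Random(seed)
    std = math.sqrt(2.0 / n_in)
    return [[rng.gauss(0.0, 1.0) * std for _ in range(n_in)]
            for _ in range(n_out)]


def sample_std(weights):
    """Population standard deviation over all entries of a weight matrix."""
    flat = [w for row in weights for w in row]
    mean = sum(flat) / len(flat)
    return math.sqrt(sum((w - mean) ** 2 for w in flat) / len(flat))


if __name__ == "__main__":
    w_rand = init_random(n_in=100, n_out=50)
    w_he = init_he(n_in=100, n_out=50)
    print("random init std:", sample_std(w_rand))
    print("He init std:    ", sample_std(w_he))
```

This only shows the spread of the initial weights. It says nothing about the final cost, which also depends on the learning rate, the number of iterations, and the data, so posting both cost curves is still the way to compare.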