The discovery that efficient use of GPUs, and of computing power in general, mattered so much made people re-examine long-held assumptions and ask a question that perhaps should have been asked long ago: why exactly does backpropagation not work well?

The insight to ask why the old approaches did not work, rather than why the new approaches did, led Xavier Glorot and Yoshua Bengio to write "Understanding the difficulty of training deep feedforward neural networks" in 2010. They found that the particular non-linear activation function chosen for the neurons in a neural net has a big impact on performance, and that the one often used by default is not a good choice.

They also found that choosing random weights was not problematic in itself; the problem was choosing them without regard to which layer the weights belong to.
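
To make the layer-dependence concrete, here is a minimal sketch of the normalized initialization proposed in that paper, now commonly called Xavier or Glorot initialization. Weights between two layers are drawn uniformly from a range whose bound is sqrt(6 / (fan_in + fan_out)), so wider layers get smaller weights. The function names, the network shape in `layer_sizes`, and the fixed seed are illustrative assumptions, not something taken from the paper or from any library.

```python
import math
import random


def glorot_limit(fan_in, fan_out):
    # Bound of the uniform range; it depends on the sizes of the
    # two layers that the weights connect.
    return math.sqrt(6.0 / (fan_in + fan_out))


def glorot_uniform(fan_in, fan_out, rng):
    # Returns a fan_in x fan_out matrix (a list of rows) with every
    # entry drawn uniformly between -limit and +limit.
    limit = glorot_limit(fan_in, fan_out)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


rng = random.Random(0)            # fixed seed so the sketch is repeatable
layer_sizes = [784, 256, 64, 10]  # hypothetical network shape
weights = [glorot_uniform(n_in, n_out, rng)
           for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

# Show how the sampling range changes from layer to layer.
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    print(f"{n_in:>4} -> {n_out:<4} limit = {glorot_limit(n_in, n_out):.4f}")
```

A layer-blind scheme would use the same range for every matrix. Here the range for the 784-to-256 weights is several times narrower than the range for the 64-to-10 weights, which is the kind of per-layer consideration the finding above refers to.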

