Artificial neural networks, inspired by biological neurons, have proven remarkably successful across a wide variety of tasks. The conventional and widely accepted mechanism through which these networks learn is backpropagation: a loss or error is computed at the output, gradient descent is used to minimize that loss, and the chain rule is employed to compute how each weight in the network should be updated. However, backpropagation is not biologically plausible, and it often fails to capture natural features of human-like learning.
Here, inspired by the structure and activity of the brain (specifically, the cerebellum), I introduce a novel, biologically plausible learning rule that not only changes how learning occurs in machines, making it significantly faster and more efficient than backpropagation, but also gives rise to several useful characteristics during learning. In particular, I present a mechanism that combines local learning rules with gating. Local learning by itself cannot capture non-linear functions; however, this lost expressive power can be regained through an input-dependent gating mechanism.
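To make the idea concrete, the following is a minimal illustrative sketch (not the paper's exact construction) of how input-dependent gating can restore non-linear expressivity to a purely local rule. The gate here (thresholding the input sum) and the per-context delta rule are my own illustrative choices: each gating context owns a linear readout trained with only a local error signal, yet together the gated units solve XOR, which no single linear unit can.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])  # XOR: not learnable by one linear unit

def gate(x):
    # Input-dependent gate: selects which local linear unit is active.
    return int(x.sum() > 1.5)

# One linear readout (weights, bias) per gating context.
W = {0: (np.zeros(2), 0.0), 1: (np.zeros(2), 0.0)}
lr = 0.5

for epoch in range(200):
    for x, t in zip(X, y):
        c = gate(x)
        w, b = W[c]
        err = t - (w @ x + b)                    # purely local error signal
        W[c] = (w + lr * err * x, b + lr * err)  # delta rule, no chain rule

preds = [round(W[gate(x)][0] @ x + W[gate(x)][1]) for x in X]
print(preds)  # [0, 1, 1, 0]
```

Each update uses only the active unit's input and error, with no gradient propagated through intermediate layers; the non-linearity lives entirely in the gate's partition of the input space.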
I show that this learning mechanism succeeds in a variety of complex tasks: it predicts chaotic time series (which are inherently nonlinear), it learns behavioral tasks in which the cerebellum is known to be involved, such as the vestibulo-ocular reflex, and it learns machine-learning benchmarks such as permuted MNIST as well as robotic tasks involving joint-movement prediction.
It is significantly more efficient and biologically plausible than learning via backpropagation. Unlike in conventional neural networks, the learned weight trajectories in this network are smooth, suggesting a more intuitive, and perhaps more interpretable, learning mechanism. Additionally, when trained on a sequence of tasks, these networks are significantly better at remembering 'old' tasks, something humans do naturally but networks trained with backpropagation do poorly. Lastly, I discuss the explicit biological plausibility of this model and present experimental support from imaging in rats. This simple but radically different learning mechanism can be adapted to various types of neural networks, including graph neural networks, where it may prevent the common problem of oversmoothing. Thus, it opens questions at the interface of learning theory and explainable AI.