Information Sciences

Deep Learning

1974/1986

Advanced

Backpropagation Chain Rule

\frac{\partial L}{\partial w_{ij}} = \frac{\partial L}{\partial z_j} \cdot \frac{\partial z_j}{\partial w_{ij}}

Backpropagation applies the chain rule layer by layer, from the output back to the input, to obtain the gradient of the loss with respect to every weight in the network.

By Paul Werbos, Geoffrey Hinton et al.

Why it matters: Backpropagation made it practical to train multi-layer neural networks, which is the foundation of the deep learning revolution.

Discoverers: Paul Werbos (1974); David Rumelhart, Geoffrey Hinton, and Ronald Williams (1986)

What does it mean?

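The sensitivity of the loss $L$ to a weight $w_{ij}$ factors into two parts: how the loss changes with the pre-activation $z_j$ of the neuron that receives the weight, and how $z_j$ changes with the weight itself. The first factor is passed backward from layer to layer, so a single backward pass yields the gradient for every weight.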

Why should I care?

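Backpropagation computes all weight gradients at a cost of a small constant multiple of one forward pass. That efficiency is what makes gradient-based training of deep neural networks feasible.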

Derivation
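
A minimal sketch of where the equation comes from, assuming a fully connected layer in which neuron $j$ receives activations $a_i$ from the previous layer through weights $w_{ij}$ plus a bias $b_j$. The symbols $a_i$, $b_j$, $\sigma$, $\delta_j$, and the learning rate $\eta$ do not appear elsewhere in this entry and are introduced here only for illustration.

```latex
% Assumed layer model (not stated in the entry):
%   z_j = \sum_i w_{ij} a_i + b_j,   a_j = \sigma(z_j)
\begin{align*}
z_j &= \sum_i w_{ij}\, a_i + b_j
  &&\Rightarrow\quad \frac{\partial z_j}{\partial w_{ij}} = a_i \\
\delta_j &\equiv \frac{\partial L}{\partial z_j}
  &&\Rightarrow\quad \frac{\partial L}{\partial w_{ij}}
     = \frac{\partial L}{\partial z_j} \cdot \frac{\partial z_j}{\partial w_{ij}}
     = \delta_j\, a_i \\
\delta_j &= \sigma'(z_j) \sum_k w_{jk}\, \delta_k
  &&\text{(backward recursion, $k$ ranging over the next layer)} \\
w_{ij} &\leftarrow w_{ij} - \eta\, \frac{\partial L}{\partial w_{ij}}
  &&\text{(gradient descent step)}
\end{align*}
```

Under these assumptions the second factor of the chain rule is just the incoming activation $a_i$, and the first factor $\delta_j$ is the quantity that is propagated backward from one layer to the previous one.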

Variables & Units

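Symbol	Name	Unit	Meaning
$L$	Loss	—	Scalar loss computed at the network output
$w_{ij}$	Weight	—	Weight of the connection from neuron $i$ to neuron $j$
$z_j$	Pre-activation	—	Weighted input to neuron $j$, before the activation function is applied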

Worked Example

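In a 4-layer network, the gradient that reaches the first layer's weights is a product of four Jacobian terms, one per layer, accumulated from the output back to the input.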
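
The worked example can be sketched in NumPy as follows. This is an illustrative setup and not a reference implementation: the layer sizes, the `tanh` activation, the squared-error loss, the absence of biases, and the helper names `forward`, `loss`, `backprop`, and `numerical_grad` are all assumptions chosen to keep the sketch short. It runs the backward pass through four layers and compares every analytic gradient with a central finite difference.

```python
import numpy as np


def forward(weights, x):
    """Return the pre-activations z and activations a of every layer."""
    activations, pre_activations = [x], []
    for W in weights:
        z = activations[-1] @ W            # z_j = sum_i a_i * w_ij
        pre_activations.append(z)
        activations.append(np.tanh(z))     # assumed activation: tanh
    return pre_activations, activations


def loss(weights, x, y):
    """Assumed loss: half the squared error at the output layer."""
    _, activations = forward(weights, x)
    return 0.5 * np.sum((activations[-1] - y) ** 2)


def backprop(weights, x, y):
    """Return dL/dW for every layer using the chain rule."""
    zs, acts = forward(weights, x)
    # delta_j = dL/dz_j at the output layer (tanh'(z) = 1 - tanh(z)^2)
    delta = (acts[-1] - y) * (1.0 - np.tanh(zs[-1]) ** 2)
    grads = [None] * len(weights)
    for layer in reversed(range(len(weights))):
        # dL/dw_ij = (dL/dz_j) * (dz_j/dw_ij) = delta_j * a_i
        grads[layer] = np.outer(acts[layer], delta)
        if layer > 0:
            # Push delta one layer back: one more Jacobian factor per layer.
            delta = (weights[layer] @ delta) * (1.0 - np.tanh(zs[layer - 1]) ** 2)
    return grads


def numerical_grad(weights, x, y, layer, i, j, eps=1e-6):
    """Central finite-difference estimate of dL/dw_ij for one weight."""
    w_plus = [W.copy() for W in weights]
    w_minus = [W.copy() for W in weights]
    w_plus[layer][i, j] += eps
    w_minus[layer][i, j] -= eps
    return (loss(w_plus, x, y) - loss(w_minus, x, y)) / (2 * eps)


rng = np.random.default_rng(0)
sizes = [3, 4, 4, 4, 2]                    # input width plus four layers
weights = [rng.normal(scale=0.5, size=(m, n))
           for m, n in zip(sizes[:-1], sizes[1:])]
x = rng.normal(size=sizes[0])
y = rng.normal(size=sizes[-1])

grads = backprop(weights, x, y)

# Largest gap between the analytic and numerical gradient over all weights.
max_error = max(
    abs(grads[l][i, j] - numerical_grad(weights, x, y, l, i, j))
    for l, W in enumerate(weights)
    for i in range(W.shape[0])
    for j in range(W.shape[1])
)
print(f"largest analytic-vs-numerical gap: {max_error:.2e}")
```

Each pass through the loop in `backprop` reuses the `delta` computed for the layer above it, which is why the whole gradient costs about as much as one extra sweep through the network instead of one sweep per weight.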

Equation Universe