Information Sciences

Machine Learning

1847/modern

Intermediate

Gradient Descent Update Rule

\theta_{t+1} = \theta_t - \eta \nabla_\theta L(\theta_t)

Update the parameters by stepping opposite to the gradient of the loss: the gradient points uphill, so each step moves downhill on the loss surface.
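
A minimal sketch of the update rule in Python. The helper names `gradient_descent` and `grad_loss`, the quadratic loss L(θ) = (θ − 3)², and the values η = 0.1 and 100 steps are illustrative assumptions, not part of the catalog entry.

```python
def gradient_descent(grad, theta0, eta, steps):
    """Repeat theta <- theta - eta * grad(theta) for a fixed number of steps."""
    theta = theta0
    for _ in range(steps):
        theta = theta - eta * grad(theta)
    return theta


def grad_loss(theta):
    # Gradient of the assumed loss L(theta) = (theta - 3)^2.
    return 2.0 * (theta - 3.0)


theta_final = gradient_descent(grad_loss, theta0=0.0, eta=0.1, steps=100)
print(theta_final)  # approaches the minimizer theta = 3
```

For this loss each step shrinks the distance to the minimizer by a factor of 1 − 2η = 0.8, so the iterate settles near θ = 3.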

By Augustin-Louis Cauchy, Various

Why it matters: The engine behind virtually all deep learning training.

Variables & Units

Symbol	Name	Unit	Meaning
$θ$	Parameters	—	Model weights
$η$	Learning rate	—	Step size
$L$	Loss	—	Objective function
$∇$	Gradient	—	Direction of steepest ascent

Worked Example

If η is too large, each step overshoots the minimum and the iterates diverge; if η is too small, the iterates converge, but slowly.
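
To make this concrete, here is a small sketch under an assumed loss L(θ) = θ², whose gradient is 2θ. The update then reduces to θ_{t+1} = (1 − 2η)θ_t, so the iterates shrink only when |1 − 2η| < 1, that is, 0 < η < 1. The function name `run` and the three learning rates are illustrative choices.

```python
def run(eta, theta0=1.0, steps=50):
    """Gradient descent on the assumed loss L(theta) = theta^2 (gradient 2 * theta)."""
    theta = theta0
    for _ in range(steps):
        theta = theta - eta * 2.0 * theta
    return theta


# Too small, reasonable, and too large learning rates for this loss.
results = {eta: run(eta) for eta in (0.001, 0.4, 1.1)}
for eta, theta in results.items():
    print(f"eta={eta}: |theta| after 50 steps = {abs(theta):.3e}")
```

With η = 0.001 the parameter has barely moved from its starting value after 50 steps, with η = 0.4 it is essentially at the minimum, and with η = 1.1 its magnitude grows at every step.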
