Information Sciences
Machine Learning
20th century
Beginner

Softmax Function

\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}

Converts a vector of logits into a probability distribution over classes: every output lies between 0 and 1, and the outputs sum to 1.

By Various

Why it matters: Standard output layer for multi-class neural networks and transformers.

Discoverers: Various (20th century)

What does it mean?

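Softmax exponentiates each logit and divides by the sum of all the exponentials. The result is a probability distribution over classes: every output is positive, larger logits get larger probabilities, and the outputs sum to 1.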
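A short derivation sketch of the "sums to 1" property, using the same symbols as the formula above. The constant c is an illustrative symbol that does not appear in the original entry; it shows that adding the same value to every logit leaves the output unchanged.

```latex
\sum_i \text{softmax}(z_i)
  = \sum_i \frac{e^{z_i}}{\sum_j e^{z_j}}
  = \frac{\sum_i e^{z_i}}{\sum_j e^{z_j}}
  = 1

\frac{e^{z_i + c}}{\sum_j e^{z_j + c}}
  = \frac{e^{c}\, e^{z_i}}{e^{c} \sum_j e^{z_j}}
  = \text{softmax}(z_i)
```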

Why should I care?

It is the standard output layer for multi-class neural network classifiers, and transformers also use it to turn attention scores into weights.

Variables & Units

Symbol        | Name        | Unit | Meaning
z_i           | Logit       | -    | Unnormalized score for class i
softmax(z_i)  | Probability | -    | Class probability

Worked Example

z = [2, 1, 0.1] → softmax(z) ≈ [0.659, 0.242, 0.099], so the highest-scoring class gets about 66% probability.
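The worked example can be checked with a few lines of Python. This is a minimal sketch using only the standard library. The function name `softmax` and the max-subtraction step are illustrative choices, not part of the original entry; subtracting the largest logit does not change the result but keeps `math.exp` from overflowing on large inputs.

```python
import math

def softmax(logits):
    """Return softmax probabilities for a list of logits (illustrative helper)."""
    # Shifting by the max logit leaves the output unchanged
    # and avoids overflow in exp for large values.
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(sum(probs))                    # 1.0, up to floating-point rounding
```

In practice a library routine would normally be used instead of a hand-written loop; the sketch is only meant to make the arithmetic visible.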


Real-world impact

Intelligent systems

Mathematics trains models that reshape work and creativity.
