Information Sciences
Machine Learning
20th century
Beginner
Softmax Function
Converts logits into a probability distribution over classes; the outputs sum to 1.
By Various
Why it matters: Standard output layer for multi-class neural networks and transformers.
Discoverers: Various (20th century)
What does it mean?
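Softmax takes a vector of logits (unnormalized scores, one per class) and converts it into a probability distribution over the classes: every output is positive, larger logits get larger probabilities, and the outputs sum to 1.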
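The page does not show the formula itself, so the standard definition is sketched here. The symbols are assumed notation rather than taken from the entry: z_i is the logit for class i, p_i is the resulting probability, and K is the number of classes.

```latex
p_i = \operatorname{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K
```

Because each exponential is positive and the denominator is the sum of all numerators, every p_i lies strictly between 0 and 1 and the p_i sum to 1.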
Why should I care?
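Softmax is the standard output layer for multi-class neural network classifiers. Transformers also use it to normalize attention weights and to turn output logits into next-token probabilities.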
Variables & Units
| Symbol | Name | Unit | Meaning |
|---|---|---|---|
| z_i | Logit | dimensionless | Unnormalized score for class i |
| p_i | Probability | dimensionless | Probability assigned to class i |
Worked Example
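z = [2, 1, 0.1] → exponentials ≈ [7.389, 2.718, 1.105], which sum to ≈ 11.212. Dividing each by the sum gives probabilities ≈ [0.659, 0.242, 0.099], so the highest-scoring class gets about 66% probability.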
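The worked example can be checked numerically. The following is a minimal sketch in plain Python, not an implementation from the catalog. The function name `softmax` and the max-subtraction step are choices made here: subtracting the largest logit is a common trick that leaves the result unchanged while avoiding overflow in `exp`.

```python
import math


def softmax(logits):
    """Return softmax probabilities for a list of real-valued logits."""
    # Shifting by the max logit does not change the result,
    # but keeps exp() from overflowing on large inputs.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

The shift matters in practice: `softmax([1000.0, 1000.0])` returns `[0.5, 0.5]` here, whereas exponentiating 1000 directly would overflow.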
Real-world impact
Intelligent systems
Mathematics trains models that reshape work and creativity.