Andromeda
Note

Gradient Descent

Definition

Gradient Descent is a first-order iterative optimization algorithm used to find a local minimum of a differentiable function. The algorithm takes steps proportional to the negative of the gradient of the function at the current point.

The update rule is defined as: xn+1=xnγf(xn)x_{n+1} = x_n - \gamma \nabla f(x_n)

  • How to read: “The value x n plus one is equal to x n minus gamma times the gradient of f at x n.”
  • Meaning: Take a step opposite to the steepest uphill direction — move downhill toward a minimum.

Why It Matters

Gradient descent is the ‘engine of learning’ for modern AI; it is the mathematical process that allow neural networks to iteratively ‘correct’ their own errors, moving down the landscape of cost functions toward the optimal weights that enable intelligence.

Core Concepts

  • Cost Function: The function f(x)f(x) being minimized, which in machine learning typically represents the error between predictions and actual targets (e.g., Mean Squared Error).
  • Learning Rate (γ\gamma): A hyperparameter that controls the step size. If γ\gamma is too small, convergence is slow; if too large, the algorithm may overshoot the minimum or diverge.
    • How to read: “The parameter gamma.”
    • Meaning: Too small γ\gamma = slow crawl; too large = bouncing past or diverging from the minimum.
  • Convexity: In convex functions, gradient descent is guaranteed to find the global minimum. In non-convex functions (like deep neural networks), it may settle in a local minimum or a saddle point.
  • Stochastic Gradient Descent (SGD): A variant where the gradient is calculated using a single random data point (or a small “mini-batch”) rather than the entire dataset, which accelerates computation and helps escape local minima.
  • Momentum: A technique that adds a fraction of the previous update to the current one, helping the algorithm “roll” through flat regions and noise.

Connected Concepts