Optimisation in Machine Learning: Convex vs Non-Convex Functions

🎯 Goal:

To help you understand the difference between convex and non-convex functions, and why that distinction matters when training machine learning models.


🧠 1️⃣ What Is Optimisation in ML?

At its core, machine learning is about optimising.

You have a loss function (how wrong your model is), and you want to find the best parameters (like weights) that minimise it.

Optimisation = finding the lowest point in a landscape of errors.

This landscape is shaped by the loss function.
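To make this concrete, here is a minimal NumPy sketch (the data and learning rate are made up purely for illustration) of gradient descent taking repeated downhill steps on a mean-squared-error loss for a one-parameter model:

```python
import numpy as np

# Toy data: y is roughly 3*x, so the "best" weight is near 3.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 5.9, 9.2, 11.8])

w = 0.0      # model parameter (weight), started far from the optimum
lr = 0.01    # learning rate (step size)

for step in range(200):
    pred = w * x                         # model prediction
    loss = np.mean((pred - y) ** 2)      # mean squared error: "how wrong" we are
    grad = np.mean(2 * (pred - y) * x)   # derivative of the loss w.r.t. w
    w -= lr * grad                       # step downhill in the loss landscape

print(w, loss)   # w ends up close to 3, and the loss is near its minimum
```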


📈 2️⃣ What’s a Convex Function?

A convex function looks like a nice, smooth bowl.
It has only one global minimum — no traps, no surprises.

✅ Why it’s awesome:

  • Gradient-based methods (like gradient descent) reliably reach the bottom, given a sensible learning rate.
  • Every local minimum is also the global minimum, so you can be confident you’ve found the best possible solution.

Example in ML:
Linear regression has a convex loss surface — which is why it’s easy to optimise.
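As a quick illustration, here is a toy convex “bowl” (the quadratic below is a stand-in for a real convex loss like linear regression’s): gradient descent reaches the same minimum no matter where it starts.

```python
# A convex "loss": a simple bowl with its single global minimum at w = 2.
def loss(w):
    return (w - 2.0) ** 2

def grad(w):
    return 2.0 * (w - 2.0)

lr = 0.1
for start in [-10.0, 0.0, 15.0]:           # wildly different initialisations
    w = start
    for _ in range(100):
        w -= lr * grad(w)                  # plain gradient descent
    print(f"start={start:>6}, end={w:.4f}, loss={loss(w):.6f}")

# All runs finish at w ≈ 2: on a convex surface, where you start doesn't matter.
```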


🌀 3️⃣ What’s a Non-Convex Function?

A non-convex function looks like a mountain range — full of peaks, valleys, ridges, and flat areas.

😅 Why it’s tricky:

  • Multiple local minima: valleys that look optimal locally but may not be the best overall
  • Gradient descent might get stuck in the wrong valley
  • Optimisers need extra tricks (momentum, Adam, etc.) to navigate

Example in ML:
Deep neural networks have highly non-convex loss surfaces — which is why training them is hard and requires powerful strategies.
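To see the contrast, here is a toy non-convex loss (a double well chosen purely for illustration): the same plain gradient descent ends up in different valleys depending on where it starts.

```python
# A non-convex "loss": a double well with one good (global) valley and one
# not-so-good (local) valley. The 0.3*w term makes the left valley deeper.
def loss(w):
    return (w ** 2 - 1.0) ** 2 + 0.3 * w

def grad(w):
    return 4.0 * w * (w ** 2 - 1.0) + 0.3

lr = 0.01
for start in [-2.0, 2.0]:                  # two different initialisations
    w = start
    for _ in range(500):
        w -= lr * grad(w)                  # same plain gradient descent as before
    print(f"start={start:>5}, end={w:.3f}, loss={loss(w):.3f}")

# start=-2.0 ends near w ≈ -1.0 (the global minimum),
# start= 2.0 ends near w ≈ +1.0 (a higher local minimum):
# on a non-convex surface, where you start decides which valley you land in.
```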


⚖️ 4️⃣ Why This Matters in ML

| Concept | Convex | Non-Convex |
| --- | --- | --- |
| Shape | Simple bowl | Bumpy, with multiple valleys |
| Minima | Only one (global) | Many (local and global) |
| Optimisation difficulty | Easy (guaranteed solution) | Hard (can get stuck or lost) |
| Example | Linear/logistic regression | Deep neural networks, CNNs, LSTMs |

🔁 5️⃣ So What Do We Do with Non-Convexity?

We use:

  • Good initialisation (e.g., Xavier, He)
  • Smart optimisers (Adam, RMSProp)
  • Stochasticity (like in mini-batch SGD)
  • Regularisation to smooth the loss surface

These help us find “good enough” minima that work well in practice — even if they’re not perfect.
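Here is a minimal PyTorch sketch pulling these tricks together (the network, data, and hyperparameters are placeholders, not recommendations): He initialisation for the ReLU layers, the Adam optimiser, shuffled mini-batches for stochasticity, and weight decay as a simple form of regularisation.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: 1,000 random samples with 20 features and a scalar target.
X = torch.randn(1000, 20)
y = torch.randn(1000, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)  # stochasticity via mini-batches

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))

# Good initialisation: He (Kaiming) init suits ReLU layers.
for layer in model:
    if isinstance(layer, nn.Linear):
        nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")
        nn.init.zeros_(layer.bias)

# Smart optimiser + regularisation: Adam with weight decay (an L2-style penalty).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.MSELoss()

for epoch in range(5):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()        # gradients on a non-convex surface
        optimizer.step()       # adaptive steps help navigate it
```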


✨ Final Thought:

Convex optimisation is like hiking in a bowl.
Non-convex optimisation is like hiking blindfolded through the Alps. 🏔️

Understanding this helps you appreciate why some models train easily, while others require careful tuning, patience, and the right tools.
