Solution to Mathematics for Machine Learning Exercise 7.2

If you find any mistakes, please leave a comment! Thank you.

Linearity
Mathematics for Machine Learning
Consider the update equation for stochastic gradient descent (Equation (7.15)). Write down the update when we use a mini-batch size of one.

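Solution: Let $a$ be any function from $\mathbb Z^+$ to $\{1,\dots,n\}$, so that $a(i)$ is the index of the single example used at step $i$ (for instance, an index drawn at random). Then $$\theta_{i+1}= \theta_{i}-\gamma_i(\nabla L_{a(i)}(\theta_{i}))^\top.$$The point is that each step uses the gradient of a single $L_{a(i)}$ instead of the sum over all of $L_{1},\dots,L_n$.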
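
To make the update concrete, here is a minimal Python sketch of it. Everything in it is an assumption made for illustration and is not taken from the book: the function name `sgd_batch_size_one`, the choice of $a(i)$ as a uniformly random index, the constant step size, and the toy losses $L_j(\theta)=\tfrac12(x_j\theta-y_j)^2$ with scalar $\theta$ (so the transpose plays no role).

```python
import random


def sgd_batch_size_one(grads, theta0, step_size, num_steps, seed=0):
    """Iterate theta_{i+1} = theta_i - gamma_i * grad L_{a(i)}(theta_i).

    grads:     list of functions; grads[j](theta) is the gradient of L_j.
    step_size: function i -> gamma_i.
    The index a(i) is drawn uniformly at random (an assumption; any map
    from the step number to {0, ..., n-1} would fit the formula).
    """
    rng = random.Random(seed)
    theta = theta0
    for i in range(num_steps):
        j = rng.randrange(len(grads))  # a(i): one index per step
        theta = theta - step_size(i) * grads[j](theta)  # only L_j is used
    return theta


# Hypothetical toy data with y_j = 3 * x_j, so every L_j is minimized at theta = 3.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

# Gradient of L_j(theta) = 0.5 * (x_j * theta - y_j)**2 with respect to theta.
grads = [lambda t, x=x, y=y: x * (x * t - y) for x, y in data]

theta = sgd_batch_size_one(grads, theta0=0.0,
                           step_size=lambda i: 0.1, num_steps=200)
print(theta)
```

On this toy problem every single-example step shrinks the distance to $\theta=3$, so the iterates approach $3$ even though no step ever looks at more than one $L_j$. In general, a step of this kind only follows the full gradient on average.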

Tags: Gradient Descent

This website is meant to help you study linear algebra. Please read these solutions only after thinking about the problems carefully. Do not just copy them.