Consider the update equation for stochastic gradient descent (Equation (7.15)). Write down the update when we use a mini-batch size of one.

Solution: Let $a\colon \mathbb Z^+ \to \{1,\dots,n\}$ select which term of the loss to use at each step; in practice, $a(i)$ is drawn uniformly at random from $\{1,\dots,n\}$. The update becomes $$\theta_{i+1}= \theta_{i}-\gamma_i\left(\nabla L_{a(i)}(\theta_{i})\right)^\top.$$ The point is that each step uses the gradient of a single term $L_{a(i)}$ instead of the gradient of the full sum $L_1+\dots+L_n$.
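The update above can be sketched in code. The following is a minimal illustration, not from the text: it assumes a least-squares loss $L(\theta)=\sum_n (x_n^\top\theta - y_n)^2$, a constant step size $\gamma_i = \gamma$, and the function name `sgd_single_sample` is invented for this example.

```python
import numpy as np

def sgd_single_sample(X, y, gamma=0.05, steps=5000, seed=0):
    """SGD with mini-batch size one for the (assumed) least-squares loss
    L(theta) = sum_n (x_n^T theta - y_n)^2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for i in range(steps):
        a = rng.integers(n)                       # a(i): random index in {1,...,n}
        grad = 2.0 * (X[a] @ theta - y[a]) * X[a]  # gradient of the single term L_{a(i)}
        theta = theta - gamma * grad               # theta_{i+1} = theta_i - gamma * grad
    return theta

# Usage: recover theta* = (1, -2) from noiseless synthetic data.
X = np.random.default_rng(1).normal(size=(100, 2))
y = X @ np.array([1.0, -2.0])
theta = sgd_single_sample(X, y)
```

Each iteration touches only one data point, so the per-step cost is independent of $n$; this is exactly the trade-off that motivates stochastic (rather than batch) gradient descent.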