Deepak's blog
Gradient Descent

There are three variants of gradient descent, which differ in how much data we use to compute the gradient of the objective function. Depending on the amount of data, we make a trade-off between the accuracy of the parameter update and the time it takes to perform an update.

Batch gradient descent: parameters are updated after computing the gradient of the error with respect to the entire training set. In code, batch gradient descent looks like this:

    for i in range(nb_epochs):
        params_grad = evaluate_gradient(loss_function, data, params)
        params = params - learning_rate * params_grad

Stochastic gradient descent: parameters are updated after computing the gradient of the error with respect to a single training example.

Mini-batch gradient descent: parameters are updated after computing the gradient of the error with respect to a subset (a mini-batch) of the training set. In code, this looks like the batch version above, with an inner loop over mini-batches (a sketch, assuming a helper get_batches that yields mini-batches of, say, 50 examples):

    for i in range(nb_epochs):
        np.random.shuffle(data)
        for batch in get_batches(data, batch_size=50):
            params_grad = evaluate_gradient(loss_function, batch, params)
            params = params - learning_rate * params_grad
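To make the pseudocode above concrete, here is a small self-contained sketch that fits a linear model with mini-batch gradient descent on synthetic data. The toy data, the linear model, and helper names such as get_batches and batch_size are illustrative choices only, not part of the discussion above. Setting batch_size to the full dataset size recovers batch gradient descent, and setting it to 1 recovers stochastic gradient descent.

    import numpy as np

    # Synthetic regression data: y = 3*x + 2 plus noise (illustrative only).
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(200, 1))
    y = 3.0 * X[:, 0] + 2.0 + 0.1 * rng.normal(size=200)

    def evaluate_gradient(loss_function, batch, params):
        """Gradient of the mean squared error of a linear model w*x + b over
        one batch; loss_function is kept only to mirror the pseudocode above."""
        X_b, y_b = batch
        w, b = params
        err = w * X_b[:, 0] + b - y_b
        grad_w = 2.0 * np.mean(err * X_b[:, 0])
        grad_b = 2.0 * np.mean(err)
        return np.array([grad_w, grad_b])

    def get_batches(X, y, batch_size):
        """Shuffle the data and yield (X_batch, y_batch) mini-batches."""
        idx = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            sel = idx[start:start + batch_size]
            yield X[sel], y[sel]

    def mse(params, X, y):
        w, b = params
        return float(np.mean((w * X[:, 0] + b - y) ** 2))

    params = np.zeros(2)          # [w, b]
    learning_rate = 0.1
    nb_epochs = 50
    batch_size = 50               # len(X) -> batch GD, 1 -> SGD

    for i in range(nb_epochs):
        for batch in get_batches(X, y, batch_size):
            params_grad = evaluate_gradient(mse, batch, params)
            params = params - learning_rate * params_grad

    print("learned params:", params)   # should end up close to [3.0, 2.0]
    print("final MSE:", mse(params, X, y))

Note that the only thing that changes between the three variants is how much data goes into each call to evaluate_gradient; the update rule params = params - learning_rate * params_grad stays the same.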