The choice of optimizer in a deep neural network model impacts the accuracy of the model. Deep learning comes under the umbrella of parametric approaches; however, it tries to relax as many assumptions as possible. The parameters are obtained from the data through gradient descent, which is the optimizer of choice in neural networks and in many other machine learning algorithms.
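As a brief illustration (standard notation, not taken from the paper: $\theta$ denotes the model parameters, $\eta$ the learning rate, and $L$ the loss), the basic gradient descent update is
\[
\theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} L(\theta_t),
\]
with stochastic variants estimating the gradient on a single example or mini-batch rather than on the full training set.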
The classical stochastic gradient descent (SGD) and SGD with momentum, which were long used in deep neural networks, posed several challenges that adaptive learning-rate optimizers attempted to resolve. Adaptive algorithms such as RMSprop, Adagrad, and Adam, in which a learning rate is computed for each parameter, were further developments toward a better optimizer. Adam has recently been observed as a frequent default choice in deep neural networks; it is essentially a combination of RMSprop and momentum.
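To make that combination concrete (standard notation, not taken from the paper: $g_t$ is the gradient at step $t$, $\eta$ the learning rate, and $\beta_1$, $\beta_2$, $\epsilon$ hyperparameters), momentum keeps a decaying average of past gradients, RMSprop scales each parameter's step by a decaying average of squared gradients, and Adam maintains both, with bias correction:
\[
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2,
\]
\[
\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t,
\qquad \hat{m}_t = \frac{m_t}{1-\beta_1^t}, \quad \hat{v}_t = \frac{v_t}{1-\beta_2^t}.
\]
Adagrad, by contrast, accumulates the sum of squared gradients rather than a decaying average, so its effective learning rate keeps shrinking over training.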
Although Adam has gained popularity since its introduction, there are claims reporting convergence problems with it, and it has also been argued that SGD with momentum gives better performance than Adam. This paper presents a comparative analysis of SGD, SGD with momentum, RMSprop, Adagrad, and Adam on the Seattle weather dataset. The dataset was processed with the expectation that Adam, preferred as a default choice by many, would prove to be the better optimizer; however, SGD with momentum proved to be the unsurpassed optimizer for this particular dataset.
Keywords: Gradient Descent, SGD with momentum, RMSprop, Adagrad and Adam.