Stochastic Gradient Descent (SGD)
Characteristics:
Iteratively updates the model parameters using the gradients of the loss function with respect to the parameters. Randomly selects a subset of the training data (a mini-batch) for each iteration.
Advantages:
Simplicity and ease of implementation. Can perform well on large-scale datasets.
Drawbacks:
Prone to getting stuck in local minima. Can have slow convergence, especially in the presence of noisy gradients.
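As a concrete illustration of the update rule described above, here is a minimal sketch of a single SGD step, assuming a user-supplied grad_fn that returns the gradients of the loss on a given mini-batch (grad_fn, X, and y are hypothetical names used only for this sketch):

import numpy as np

def sgd_step(params, grad_fn, X, y, lr=0.01, batch_size=32):
    # Randomly select a mini-batch of the training data
    idx = np.random.choice(len(X), size=batch_size, replace=False)
    # Gradients of the loss with respect to the parameters, on this batch only
    grads = grad_fn(params, X[idx], y[idx])
    # Move the parameters against the gradient, scaled by the learning rate
    return params - lr * grads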
RMSprop
Characteristics:
Adapts the learning rate of each parameter individually. Divides the learning rate by the root mean square of an exponentially weighted moving average of squared gradients.
Advantages:
Effective in dealing with sparse data and non-stationary objectives. Helps overcome some of the issues with constant learning rates in SGD.
Drawbacks:
May suffer from vanishing or exploding learning rates. Requires tuning of additional hyperparameters.
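A minimal sketch of the RMSprop update just described; the decay rate and epsilon values below are conventional defaults, assumed here for illustration:

import numpy as np

def rmsprop_step(params, grads, sq_avg, lr=0.001, decay=0.9, eps=1e-8):
    # Exponentially weighted moving average of squared gradients
    sq_avg = decay * sq_avg + (1 - decay) * grads ** 2
    # Divide the learning rate by the root mean square; eps avoids division by zero
    params = params - lr * grads / (np.sqrt(sq_avg) + eps)
    return params, sq_avg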
Adam
Characteristics:
Combines the advantages of both RMSprop and momentum. Maintains a separate learning rate for each parameter, along with exponentially decaying averages of past gradients and past squared gradients.
Advantages:
Fast convergence and robustness to noisy gradients. Automatic adjustment of the learning rate for each parameter.
Drawbacks:
May exhibit erratic behavior on some non-convex optimization problems. Introduces additional hyperparameters that need tuning.
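A minimal sketch of an Adam-style step, combining the momentum-like average of past gradients with the RMSprop-like average of squared gradients; beta1, beta2, and eps are the commonly used defaults, assumed here for illustration:

import numpy as np

def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponentially decaying average of past gradients (momentum term)
    m = beta1 * m + (1 - beta1) * grads
    # Exponentially decaying average of past squared gradients (RMSprop term)
    v = beta2 * v + (1 - beta2) * grads ** 2
    # Bias correction for the zero-initialized averages (t counts steps from 1)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter update: each parameter gets its own effective step size
    params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, m, v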
The three optimizers can then be compared visually; the scores below are illustrative placeholders, not measured results:

import matplotlib.pyplot as plt

# Hypothetical, illustrative scores for each optimizer
optimizers = ['SGD', 'RMSprop', 'Adam']
scores = [0.85, 0.90, 0.92]

fig, ax = plt.subplots()
bar_width = 0.25
index = range(len(optimizers))
ax.bar(index, scores, bar_width, label='Score')  # draw one bar per optimizer
ax.set_xlabel('Optimizers')
ax.set_ylabel('Score')
ax.set_title('Comparison of Optimization Algorithms')
ax.set_xticks(list(index))  # ticks centered under the single bar series
ax.set_xticklabels(optimizers)
ax.legend()
plt.show()