
Assignment 02

By

Zeeshan Asghar

B210317003

Reg. No. 2021-UOK-04803

Session: 2021-2025

Subject: Artificial Neural Network

Submitted To: Ma'am Umaira Khurshid

Date of Submission: April 19th, 2024

DEPARTMENT OF ARTIFICIAL INTELLIGENCE

FACULTY OF COMPUTING AND ENGINEERING

UNIVERSITY OF KOTLI AZAD JAMMU AND KASHMIR

Table of Contents
1. Hyperparameters

2. Common Hyperparameters

3. Role of Hyperparameters

4. Advantages and Disadvantages

1. Hyperparameters
Hyperparameters are parameters whose values control the learning process and determine the
values of the model parameters that a learning algorithm ends up learning. The prefix ‘hyper’
suggests that they are ‘top-level’ parameters that govern the learning process and the model
parameters that result from it.

The values of the hyperparameters are selected and set by the machine learning engineer before
the learning algorithm begins training the model. Hence, they are external to the model, and
their values cannot be changed during the training process.
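To make the distinction concrete, the following minimal NumPy sketch fits a one-parameter model with plain gradient descent. The learning rate and the number of epochs are hyperparameters fixed before training, while the weight w is a model parameter learned during training; the toy data and values are illustrative, not taken from the text.

import numpy as np

# Hyperparameters: chosen by the engineer before training begins
learning_rate = 0.1
n_epochs = 50

# Toy data for y = 2x; the model has to learn the slope
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x

# Model parameter: learned by the algorithm during training
w = 0.0

for _ in range(n_epochs):
    y_pred = w * x
    grad = np.mean(2.0 * (y_pred - y) * x)   # gradient of the mean squared error w.r.t. w
    w -= learning_rate * grad                # step size controlled by the learning rate

print(round(w, 3))   # approaches 2.0; the hyperparameters themselves never change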

2. Common Hyperparameters

Here are some of the most commonly used hyperparameters; a short code sketch after this list shows where each of them is typically set:

• Learning Rate: This hyperparameter controls the step size at which the model parameters
are updated during training. A higher learning rate can lead to faster convergence, but it
may also cause the model to overshoot the optimal solution or result in instability.
• Batch Size: Batch size determines the number of samples used in each iteration of training.
A larger batch size can lead to faster training, especially on hardware optimized for parallel
processing, but it may also require more memory.
• Number of Epochs: An epoch is one complete pass through the entire training dataset.
The number of epochs specifies how many times the algorithm will iterate over the entire
dataset during training.
• Optimizer Choice: The optimizer determines the update rule for adjusting the model's
parameters during training. Popular optimizers include stochastic gradient descent (SGD),
which updates parameters based on the gradient of the loss function with respect to the
parameters, and more sophisticated optimizers like Adam, RMSprop, or AdaGrad, which
adapt the learning rate for each parameter based on past gradients.
• Dropout Rate: Dropout is a regularization technique commonly used in deep learning to
prevent overfitting. It randomly sets a fraction of input units to zero during training, which
helps prevent the model from relying too heavily on specific features or neurons.

• Network Architecture Parameters: These hyperparameters define the structure and
complexity of the neural network, including the number of layers, the number of neurons
in each layer, the activation functions used, and any other architectural choices.
• Regularization Strength: Regularization techniques such as L1 or L2 regularization add
penalty terms to the loss function to discourage overly complex models.
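As a rough illustration, the sketch below shows where each of the hyperparameters listed above would typically be specified, assuming a TensorFlow/Keras workflow. The specific values (layer sizes, a dropout rate of 0.5, a learning rate of 1e-3, and so on) are placeholder choices rather than recommendations from the text.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(20,)),                                       # illustrative input size
    layers.Dense(64, activation="relu",                             # network architecture: layer width, activation
                 kernel_regularizer=keras.regularizers.l2(0.01)),   # L2 regularization strength
    layers.Dropout(0.5),                                            # dropout rate
    layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),            # optimizer choice and learning rate
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Batch size and number of epochs are given to fit(); x_train and y_train
# stand in for the user's own dataset.
# model.fit(x_train, y_train, batch_size=32, epochs=10)

Note that the batch size and the number of epochs appear only at the fit() call, because they describe the training loop rather than the model itself.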

3. Role of Hyperparameters
Here are the roles of these hyperparameters (a small numerical sketch after this list illustrates the learning-rate update, the dropout mask, and the L2 penalty):
• Learning Rate:
Controls the magnitude of parameter updates during training. High learning rates can lead
to faster convergence but risk overshooting the optimal solution or causing instability. Low
learning rates may result in slower convergence but often provide more stable and accurate
models.
• Batch Size:
Determines the number of samples processed in each training iteration. Larger batch sizes
can accelerate training on hardware optimized for parallel processing but may require more
memory. Smaller batch sizes can result in slower training but may offer better
generalization and more frequent parameter updates.
• Number of Epochs:
Dictates how many times the algorithm iterates over the entire dataset during training. Too
few epochs might result in underfitting, while too many epochs can lead to overfitting.
Finding the right number of epochs involves balancing model performance and
computational resources.
• Optimizer Choice:
Defines the update rule for adjusting model parameters during training. Different
optimizers, such as SGD, Adam, RMSprop, etc., have distinct update strategies and may
perform differently based on the task and dataset. The choice of optimizer can influence
training speed, convergence behavior, and final model performance.
• Dropout Rate:
Dropout is a regularization technique that randomly drops a fraction of neurons during
training to prevent overfitting. The dropout rate determines the probability of dropping
neurons, with higher rates imposing stronger regularization. Balancing dropout strength is
crucial, as too high a rate may hinder learning, while too low a rate may not provide
sufficient regularization.
• Network Architecture Parameters:
Define the structure and complexity of the neural network, including the number of layers,
neurons per layer, and activation functions. Properly configuring the network architecture is
essential for enabling the model to learn complex relationships within the data while
avoiding overfitting.
• Regularization Strength:
Specifies the weight of penalty terms in the loss function to discourage overly complex
models. Regularization techniques like L1 or L2 regularization help prevent overfitting by
penalizing large parameter values. The regularization strength hyperparameter controls the
tradeoff between fitting the training data well and maintaining model simplicity.
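The small NumPy sketch below illustrates three of the mechanisms described above: the learning-rate-scaled parameter update, a dropout mask applied to activations, and an L2 penalty added to the data loss. All numbers are placeholder values chosen for illustration.

import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)            # model parameters
grad = rng.normal(size=5)         # placeholder gradient of the loss w.r.t. w

# Learning rate: scales the size of each parameter update
learning_rate = 0.01
w = w - learning_rate * grad

# Dropout: zero out each activation with probability `rate` during training
rate = 0.5
activations = rng.normal(size=5)
mask = rng.random(5) >= rate
dropped = activations * mask / (1.0 - rate)   # "inverted" dropout keeps the expected scale

# Regularization strength: weight of the L2 penalty added to the data loss
data_loss = 0.3                               # placeholder value of the unregularized loss
l2_strength = 0.01
total_loss = data_loss + l2_strength * np.sum(w ** 2)
print(total_loss)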

4. Advantages and Disadvantages

These are the advantages and disadvantages of some of the hyperparameters:

• Learning Rate:
➢ Advantages:
Crucial for controlling the speed and stability of convergence during training.
Allows for fine-tuning of the learning process, balancing between fast convergence
and stable optimization.
➢ Disadvantages:
Selecting an inappropriate learning rate can lead to suboptimal convergence, causing
training instability or slow progress. Finding the right learning rate often requires
manual tuning or the use of learning rate schedulers, which can be time-consuming.
• Batch Size:
➢ Advantages:
Influences the speed of training and memory usage, allowing for optimization based on
available computational resources. Larger batch sizes can exploit parallel processing
capabilities for faster training on suitable hardware.
➢ Disadvantages:
Large batch sizes may lead to poor generalization and hinder the ability of the model
to escape local minima. Smaller batch sizes can slow down training and require more
frequent updates, potentially increasing training time.
• Number of Epochs:
➢ Advantages:
Allows for fine-tuning the amount of exposure the model has to the training data,
balancing between underfitting and overfitting. Enables monitoring of training
progress and convergence behavior over time.
➢ Disadvantages:
Choosing an inappropriate number of epochs can result in underfitting or overfitting,
requiring careful validation and tuning. Training for too many epochs can lead to
overfitting, wasting computational resources and time.
Each of these hyperparameters plays a critical role in the training process, and finding the right
balance is essential for achieving optimal model performance. Experimentation and careful
tuning are often necessary to determine the best values for each hyperparameter based on the
specific task, dataset, and computational constraints.
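One common way to carry out such experimentation is a simple grid search over candidate values, keeping the configuration with the best validation score. The sketch below assumes a TensorFlow/Keras setup; the synthetic data, the build_model helper, and the candidate values are hypothetical placeholders for the reader's own code and dataset.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic placeholder data; in practice these come from the real dataset.
rng = np.random.default_rng(0)
x_train, y_train = rng.normal(size=(200, 20)), rng.integers(0, 2, size=200)
x_val, y_val = rng.normal(size=(50, 20)), rng.integers(0, 2, size=50)

def build_model(learning_rate):
    # Hypothetical helper: a small fixed architecture; only the learning rate varies here.
    model = keras.Sequential([
        keras.Input(shape=(20,)),
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

best_score, best_config = float("-inf"), None
for learning_rate in [1e-2, 1e-3]:
    for batch_size in [16, 64]:
        model = build_model(learning_rate)
        model.fit(x_train, y_train, batch_size=batch_size, epochs=5, verbose=0)
        _, val_accuracy = model.evaluate(x_val, y_val, verbose=0)
        if val_accuracy > best_score:
            best_score, best_config = val_accuracy, (learning_rate, batch_size)

print("best (learning rate, batch size):", best_config, "validation accuracy:", best_score)

When many hyperparameters are tuned at once, randomized or Bayesian search is often preferred over an exhaustive grid, since the grid grows exponentially with the number of hyperparameters.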
