

Image Classification using Custom Convolutional Neural Networks

Cleon Anderson

Colorado State University Global

CSC580: Applying Machine Learning and Neural Networks - Capstone

Dr. Lori Farr

July 23, 2023


Image Classification using Custom Convolutional Neural Networks

In computer vision, image classification involves assigning a label from a set of predefined categories to an input image. Convolutional Neural Networks (CNNs) provide state-of-the-art performance on image classification problems. In this assignment, we explore the implementation of a custom CNN for image classification on the CIFAR-10 dataset and evaluate the efficacy of the resulting model.

Model Architecture and Hyperparameters

Our CNN model consists of two convolutional layers and two fully connected layers. The model applies the Rectified Linear Unit (ReLU) activation function after each convolutional operation. The hyperparameters for this model are as follows:

1. Filters: The number of filters in each convolutional layer determines the depth of its feature maps. The first convolutional layer has 32 filters, and the second has 64. A larger number of filters allows the model to learn more complex and abstract features but comes at a higher computational cost.

2. Kernel Size: Each convolutional operation has a receptive field determined by its kernel size. Both convolutional layers use 5x5 kernels. Larger kernels introduce more parameters but also capture more contextual information.


3. Strides: In convolution, the stride defines the step size for moving the kernel over the input. The first convolutional layer uses a stride of (1, 1), and the second uses (2, 2). A stride of (2, 2) halves the spatial dimensions, enabling the model to capture larger spatial patterns efficiently.

4. Padding: We use 'SAME' padding, which pads the input so that the output spatial size equals the input size divided by the stride (rounded up); with a stride of (1, 1), the output dimensions match the input dimensions exactly.

5. Fully Connected Layers: The first fully connected layer has 1024 units, and the second has 10 units, one for each class in the CIFAR-10 dataset.

6. Dropout: We use a dropout probability of 0.8 on the first fully connected layer to improve generalization and reduce overfitting. During training, dropout sets a random fraction of the layer's output units to zero, effectively training an ensemble of submodels. This prevents the model from becoming too reliant on specific features and encourages it to learn more robust representations.

7. Learning Rate Tuning: The learning rate plays a vital role in training the model effectively. A poorly chosen fixed rate can lead to slow convergence or unstable training. To address slow convergence, we implement a Step Decay Learning Rate Scheduler, which reduces the learning rate at fixed epoch intervals.
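The interaction of 'SAME' padding and strides described in items 3 and 4 can be verified with a short calculation. This is an illustrative sketch rather than the assignment's code: with 'SAME' padding, the output spatial size depends only on the stride, not on the kernel size.

```python
import math

def conv_output_size(input_size, stride):
    """Spatial output size of a convolution with 'SAME' padding.

    With 'SAME' padding, out = ceil(in / stride), regardless of
    kernel size.
    """
    return math.ceil(input_size / stride)

# CIFAR-10 images are 32x32. Trace the two convolutional layers
# described above (strides (1, 1) and (2, 2)).
after_conv1 = conv_output_size(32, 1)           # stride 1 keeps 32x32
after_conv2 = conv_output_size(after_conv1, 2)  # stride 2 halves it to 16x16

print(after_conv1, after_conv2)
```

This confirms that the first layer preserves the 32x32 input while the second reduces it to 16x16 feature maps.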
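The dropout mechanism from item 6 can be sketched in a few lines of plain Python. This is a minimal illustration of inverted dropout, not the assignment's implementation; note that whether 0.8 denotes the drop probability or the keep probability depends on the framework's convention.

```python
import random

def dropout(activations, rate, training=True):
    """Inverted dropout: during training, zero each unit with
    probability `rate` and scale survivors by 1/(1 - rate) so the
    expected activation is unchanged. At inference, pass through.
    """
    if not training:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if random.random() < keep else 0.0
            for a in activations]
```

Because each training pass zeroes a different random subset of units, the network cannot rely on any single feature, which is the regularization effect described above.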
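A step decay schedule like the one in item 7 can be expressed as a single function. The initial rate, drop factor, and step interval below are illustrative assumptions, not the values used in the assignment.

```python
import math

def step_decay(epoch, initial_lr=0.01, drop=0.5, epochs_per_drop=10):
    """Step-decay schedule: multiply the learning rate by `drop`
    once every `epochs_per_drop` epochs.
    """
    return initial_lr * drop ** math.floor(epoch / epochs_per_drop)

# With these assumed values, the rate halves every 10 epochs:
for epoch in (0, 10, 20):
    print(epoch, step_decay(epoch))
```

Lowering the rate in discrete steps lets training take large steps early, then settle into finer updates as the loss plateaus.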

Results

Fig. 1 Training and Validation Loss and Accuracy Plots. Fig. 2 Algorithm Run with Accuracy Results.
The model's efficacy does not improve significantly throughout training.

1. Validation Accuracy and Loss: The validation accuracy remains roughly constant at approximately 0.22, and the validation loss remains high at 272.35 in the final epoch, suggesting that the model's performance on the validation set is not improving and that it cannot make accurate predictions on unseen data.

2. Test Accuracy: The test accuracy of 0.1 is significantly lower than the training accuracy. This discrepancy suggests that the model has overfit the training data, as it performs poorly on the unseen test data.

3. Average Training Accuracy and Loss: The average training accuracy across all epochs is 0.7475, while the average training loss is 1.02. The relatively high average accuracy indicates that the model performs well on the training data; however, the still-elevated average loss suggests that the model's predictions are not very confident.

4. Average Validation Accuracy and Loss: The average validation accuracy is 0.22, while the average validation loss is 276.92. These averages match the final-epoch values, indicating that the model's performance on the validation set does not improve over the course of training.

Conclusion

The model's efficacy remains low, and it does not generalize well to unseen data. The model appears to be overfitting to the training data, as evidenced by the large discrepancy between training and test accuracy. Despite using dropout and L2 regularization, the model struggles to improve its performance on the validation and test sets. We implemented hyperparameter tuning, but after many bug fixes, the process took longer than expected.
