CSC580_CTA4 _Option_1_Anderson_Cleon

Toxicology Testing
Cleon Anderson
Colorado State University Global
CSC580: Applying Machine Learning and Neural Networks - Capstone
Dr. Lori Farr
July 09, 2023

Toxicology Testing
The field of toxicology plays a crucial role in assessing the potential risks and hazards
associated with various substances. This assignment implements a toxicology classification model
using TensorFlow and DeepChem packages. The primary objective of this model is to classify
toxicology tests based on the toxicology dataset in the DeepChem package. However, a significant
challenge arises from the imbalanced nature of the dataset. A class imbalance occurs when there is an
unequal distribution of classes within a dataset. The imbalance poses difficulties for machine learning
models, as they may favor the majority class and struggle to classify the minority class effectively.
We address the imbalance by incorporating dropout regularization and weighted accuracy.
Dropout regularization randomly sets a fraction of input units to zero during the training process, which
reduces the network's reliance on specific features and encourages more robust representations. By
incorporating dropout regularization, the toxicology classification model aims to enhance its
generalization capability and mitigate the potential bias towards the majority class.
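The dropout mechanism described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the assignment's TensorFlow code: the function name `dropout`, the `keep_prob` parameter, and the toy activations are assumptions made for the example, and the rescaling by `1 / keep_prob` (so-called inverted dropout) is assumed so that the expected activation stays the same.

```python
import random

def dropout(activations, keep_prob, rng):
    """Inverted dropout on a list of activations (illustrative sketch).

    Each unit is kept with probability keep_prob and scaled by 1/keep_prob;
    every other unit is set to 0.0.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    if keep_prob == 1.0:
        return list(activations)  # evaluation setting: nothing is dropped
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)            # fixed seed so the sketch is repeatable
hidden = [0.2, 1.5, 0.7, 3.0]     # hypothetical hidden-layer activations
train_out = dropout(hidden, keep_prob=0.5, rng=rng)  # training: units dropped at random
eval_out = dropout(hidden, keep_prob=1.0, rng=rng)   # evaluation: unchanged
```

With a keep probability of 1 the function returns the activations unchanged, which is why that value is the natural setting for prediction and evaluation.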
We use weighted accuracy to account for the class imbalance when evaluating the model's
performance. Weighted accuracy reflects the importance of each class by weighting each sample's
contribution to the score, rather than counting every sample equally. This approach ensures that
the model's performance metrics accurately
reflect its ability to correctly classify both the majority and minority classes.
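To make the effect of weighting concrete, the sketch below computes weighted accuracy as the share of total sample weight that falls on correct predictions. The labels, predictions, and weights are hypothetical values chosen for illustration, and `weighted_accuracy` is a helper written for this example, not a function from the assignment's code.

```python
def weighted_accuracy(y_true, y_pred, sample_weight):
    """Fraction of the total sample weight that lies on correct predictions."""
    if not (len(y_true) == len(y_pred) == len(sample_weight)):
        raise ValueError("inputs must have the same length")
    total = sum(sample_weight)
    if total == 0:
        raise ValueError("sample weights must not sum to zero")
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / total

# Hypothetical imbalanced labels: nine negatives and one positive.
y_true = [0] * 9 + [1]
y_pred = [0] * 10                 # a classifier that always predicts the majority class
# Illustrative weights chosen so that both classes carry the same total weight.
weights = [1.0] * 9 + [9.0]

plain = weighted_accuracy(y_true, y_pred, [1.0] * 10)   # unweighted accuracy
balanced = weighted_accuracy(y_true, y_pred, weights)   # class-balanced accuracy
```

In this toy case the unweighted score is 0.9 even though the positive sample is never detected, while the balanced score is 0.5, which is what a majority-only classifier earns when both classes count equally.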
Code Analysis
We use TensorFlow for building the toxicology classification model, and the DeepChem library
assists in loading and preprocessing the toxicology dataset. We created a custom Toxicology class that
initializes crucial parameters such as the minibatch size, number of hidden units, learning rate, training
epochs, and batch size. The class loads and prepares the toxicology dataset, including the division into
training, validation, and test sets.
The model architecture consists of an input layer, a hidden layer with rectified linear unit
(ReLU) activation, and an output layer. ReLU activation allows the model to capture non-linear
relationships in the data effectively. Dropout regularization is applied to the hidden layer, promoting
robustness and reducing overfitting. The model's parameters, including weights and biases, are defined
using TensorFlow operations.
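The architecture described above can be sketched as a forward pass in plain Python. This is only an illustration under stated assumptions: the helper names (`dense`, `forward`), the tiny layer sizes, and the hand-picked weights are hypothetical, whereas the assignment defines its parameters with TensorFlow operations and uses a larger hidden layer. Dropout is left out here for brevity.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def dense(inputs, weights, biases):
    """Affine layer: weights holds one row of input weights per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(x, W1, b1, W2, b2):
    """Input -> hidden layer with ReLU -> single output logit and probability."""
    hidden = [relu(h) for h in dense(x, W1, b1)]
    logit = dense(hidden, W2, b2)[0]
    return logit, sigmoid(logit)

# Toy parameters (hypothetical): 2 input features, 2 hidden units, 1 output.
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, 0.0]
W2 = [[1.0, -1.0]]
b2 = [0.5]

logit, prob = forward([2.0, 1.0], W1, b1, W2, b2)
```

For this input the hidden activations are 1.0 and 1.5, so the logit is 1.0 - 1.5 + 0.5 = 0.0 and the predicted probability is 0.5.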
We train the model on mini-batches of data over a specified number of epochs. The loss
function, calculated as the sigmoid cross-entropy with logits, measures the discrepancy between
predicted and actual labels. The Adam optimizer updates the model's parameters to minimize the loss.
The code prints the loss values throughout the training process to monitor the model's progress.
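The loss and the optimizer update can be illustrated without TensorFlow. The sketch below uses the numerically stable form of sigmoid cross-entropy on logits, max(x, 0) - x*z + log(1 + exp(-|x|)), together with a single-parameter Adam step. The toy labels, the function names, and the Adam constants (beta1 = 0.9, beta2 = 0.999, eps = 1e-8, commonly used defaults) are assumptions for the example, and the whole toy data set is used as one batch.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sigmoid_cross_entropy_with_logits(label, logit):
    """Stable form of -z*log(s(x)) - (1-z)*log(1-s(x)) for label z and logit x."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1 ** t)      # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)      # bias-corrected second moment
    return param - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

labels = [1.0, 1.0, 1.0, 0.0]           # hypothetical toy labels
bias, m, v = 0.0, 0.0, 0.0              # the only trainable parameter is one logit

def mean_loss(b):
    return sum(sigmoid_cross_entropy_with_logits(z, b) for z in labels) / len(labels)

initial_loss = mean_loss(bias)
for t in range(1, 1001):
    # Gradient of the mean loss with respect to the logit: sigmoid(x) - z.
    grad = sum(sigmoid(bias) - z for z in labels) / len(labels)
    bias, m, v = adam_step(bias, grad, m, v, t)
final_loss = mean_loss(bias)
```

Printing `initial_loss` and `final_loss` shows the kind of downward trend one would monitor during training.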
We evaluate the model's performance by applying the trained model to the training and
validation datasets to generate predictions. We account for the class imbalance by using the
sample weights when calculating the weighted accuracy, which we compute with the scikit-learn
library's accuracy_score function. The resulting scores provide insights into the model's ability to
classify toxicology tests correctly.

Experiment
The hyperparameters for this experiment are as follows:
• Mini-batch size: 1024
• Hidden units: 50
• Learning rate: 0.001
• Epochs: 1000
• Batch size: 100
• Dropout probability: 0.5
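As a rough illustration of how the batch size of 100 listed above shapes one epoch, the sketch below splits a data set into consecutive mini-batches. The data set size of 250 samples and the helper `iterate_minibatches` are hypothetical and chosen only to keep the example small.

```python
def iterate_minibatches(n_samples, batch_size):
    """Yield (start, end) index pairs that cover range(n_samples) in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

BATCH_SIZE = 100     # from the hyperparameter list
N_SAMPLES = 250      # hypothetical data set size, for illustration only

batches = list(iterate_minibatches(N_SAMPLES, BATCH_SIZE))
steps_per_epoch = len(batches)
```

Here an epoch consists of three parameter updates, with the last mini-batch holding the remaining 50 samples.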
Figure 1 shows a graphical representation of the loss in TensorBoard. In both
validation and training, the accuracy is around 50%. When training the model, we set
the dropout keep probability to 0.5, but when predicting or evaluating, we set it to 1 so that
no units are dropped.
Fig. 1
Toxicology classification with TensorBoard visualization
Fig. 2 shows the graph of the model from TensorBoard, and Fig. 3 depicts the loss plot from
TensorBoard.
Fig. 2
Main Graph
Fig. 3
Loss Plot

Conclusion
We present a toxicology classification model that addresses the class imbalance challenge using
dropout regularization and weighted accuracy. By leveraging TensorFlow and DeepChem, the model
demonstrates its potential to classify toxicological outcomes. The model illustrates the utility of
dropout for reducing overfitting. The loss trends downward over the training epochs.
The main graph helps visualize the different components of the model and
how they interrelate.
