Toxicology Testing

Cleon Anderson

Colorado State University Global

CSC580: Applying Machine Learning and Neural Networks - Capstone

Dr. Lori Farr

July 09, 2023


Toxicology Testing

The field of toxicology plays a crucial role in assessing the potential risks and hazards

associated with various substances. This assignment implements a toxicology classification model

using TensorFlow and DeepChem packages. The primary objective of this model is to classify

toxicology tests based on the toxicology dataset in the DeepChem package. However, a significant

challenge arises from the imbalanced nature of the dataset. A class imbalance occurs when there is an

unequal distribution of classes within a dataset. The imbalance poses difficulties for machine learning

models, as they may favor the majority class and struggle to classify the minority class effectively.

We address the imbalance by incorporating dropout regularization and weighted accuracy.

Dropout regularization randomly sets a fraction of input units to zero during the training process, which

reduces the network's reliance on specific features and encourages more robust representations. By

incorporating dropout regularization, the toxicology classification model aims to enhance its

generalization capability and mitigate the potential bias towards the majority class.
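
The dropout behavior described above can be sketched in a few lines of plain Python. This is a minimal illustration of "inverted" dropout, not the paper's TensorFlow code; the function name `dropout`, the `keep_prob` parameter, and the sample activations are assumptions made for the example.

```python
import random

def dropout(values, keep_prob, rng):
    """Inverted dropout: keep each unit with probability keep_prob.

    Kept units are scaled by 1 / keep_prob so the expected activation
    stays the same; dropped units become 0.0.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)           # fixed seed so the sketch is repeatable
hidden = [0.5, 1.0, 1.5, 2.0]    # hypothetical hidden-layer activations

train_out = dropout(hidden, keep_prob=0.5, rng=rng)  # training: units dropped at random
eval_out = dropout(hidden, keep_prob=1.0, rng=rng)   # evaluation: nothing is dropped
print(train_out)
print(eval_out)
```

With a keep probability of 1.0 the layer passes its input through unchanged, which is why dropout is effectively switched off when predicting.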

We use weighted accuracy to account for the class imbalance when evaluating the model's

performance. Weighted accuracy weights each sample's contribution to the score, so that correct

predictions on an under-represented class count for more. This approach helps the performance metric

reflect the model's ability to correctly classify both the majority and minority classes.
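
As a rough illustration of why the weighting matters, the sketch below computes weighted accuracy by hand on a made-up, imbalanced label set. The function `weighted_accuracy`, the labels, and the weights are all hypothetical; scikit-learn's `accuracy_score` accepts a `sample_weight` argument that computes the same quantity.

```python
def weighted_accuracy(y_true, y_pred, weights):
    """Fraction of the total sample weight that was classified correctly."""
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("inputs must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    correct = sum(w for t, p, w in zip(y_true, y_pred, weights) if t == p)
    return correct / total

# Hypothetical imbalanced data: 8 negatives (0) and 2 positives (1).
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10                      # a model that always predicts the majority class

uniform = [1.0] * 10                   # every sample counts equally
balanced = [1.0 if t == 0 else 4.0 for t in y_true]  # each class carries total weight 8

plain_acc = weighted_accuracy(y_true, y_pred, uniform)
balanced_acc = weighted_accuracy(y_true, y_pred, balanced)
print(plain_acc, balanced_acc)         # 0.8 0.5
```

The always-majority predictor looks good under plain accuracy but scores only 0.5 once the two classes are weighted equally.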

Code Analysis

We use TensorFlow for building the toxicology classification model, and the DeepChem library

assists in loading and preprocessing the toxicology dataset. We created a custom Toxicology class that

initializes crucial parameters such as the minibatch size, number of hidden units, learning rate, training

epochs, and batch size. The class loads and prepares the toxicology dataset, including the division into

training, validation, and test sets.

The model architecture consists of an input layer, a hidden layer with rectified linear unit

(ReLU) activation, and an output layer. ReLU activation allows the model to capture non-linear

relationships in the data effectively. Dropout regularization is applied to the hidden layer, promoting

robustness and reducing overfitting. The model's parameters, including weights and biases, are defined

using TensorFlow operations.
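
The architecture described above can be sketched as a forward pass in plain Python. This is an illustration under stated assumptions, not the paper's TensorFlow graph: the feature count, the random initialization, the single sigmoid output, and all function names are hypothetical, and dropout is left out (as it would be at evaluation time).

```python
import math
import random

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def init_params(n_features, n_hidden, rng):
    """Small random weights and zero biases (an illustrative initialization)."""
    return {
        "W1": [[rng.gauss(0.0, 0.1) for _ in range(n_features)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]],
        "b2": [0.0],
    }

def forward(x, params):
    """Input -> hidden layer with ReLU -> single output logit -> probability."""
    hidden = [relu(h) for h in dense(x, params["W1"], params["b1"])]
    logit = dense(hidden, params["W2"], params["b2"])[0]
    return hidden, logit, sigmoid(logit)

rng = random.Random(42)
N_FEATURES = 8    # hypothetical; the real input width depends on the featurizer
N_HIDDEN = 50     # the value 50 reported in the experiment

params = init_params(N_FEATURES, N_HIDDEN, rng)
x = [rng.random() for _ in range(N_FEATURES)]   # one made-up input example
hidden, logit, prob = forward(x, params)
print(len(hidden), prob)
```

The hidden activations are non-negative because of ReLU, and the sigmoid maps the output logit to a probability between 0 and 1.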

We train the model on mini-batches of data over a specified number of epochs. The loss

function, calculated as the sigmoid cross-entropy with logits, measures the discrepancy between

predicted and actual labels. The Adam optimizer updates the model's parameters to minimize the loss.

The code prints the loss values throughout the training process to monitor the model's progress.
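
For reference, the loss named above can be written out directly. The sketch below uses the numerically stable form max(x, 0) - x*z + log(1 + exp(-|x|)) for a logit x and a 0/1 label z, which is the formulation documented for TensorFlow's `tf.nn.sigmoid_cross_entropy_with_logits`. The helper names and sample values are hypothetical.

```python
import math

def sigmoid_xent(logit, label):
    """Stable sigmoid cross-entropy for one logit and one 0/1 label."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def mean_loss(logits, labels):
    """Average the per-example losses over a mini-batch."""
    if len(logits) != len(labels) or not logits:
        raise ValueError("need equally sized, non-empty inputs")
    return sum(sigmoid_xent(x, z) for x, z in zip(logits, labels)) / len(logits)

# Made-up mini-batch of three examples.
logits = [2.0, -1.0, 0.0]
labels = [1.0, 0.0, 1.0]
batch_loss = mean_loss(logits, labels)
print(batch_loss)
```

Working from logits rather than probabilities avoids taking the logarithm of a value that has been rounded to 0 or 1, so very large logits do not overflow.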

We evaluate the model's performance by applying the trained model to the training and

validation datasets, generating predictions. We account for the sample weights, and thus the class

imbalance, when calculating the weighted accuracy. The scikit-learn library's accuracy_score function

computes the weighted accuracy from the labels, predictions, and weights. The resulting scores provide insights into the model's ability to

classify toxicology tests correctly.


Experiment

The hyperparameters for this experiment are as follows:

• Mini-batch size: 1024

• Hidden units: 50

• Learning rate: 0.001

• Epochs: 1000

• Batch size: 100

• Dropout probability: 0.5

Figure 1 shows a graphical representation of the loss in TensorBoard. In both

validation and training, the accuracy is around 50%. When training the model, we set the

dropout keep probability to 0.5; when predicting or evaluating, we set it to 1, which disables dropout.

Fig. 1

Toxicology classification with TensorBoard visualization

Fig. 2 shows the graph of the model from TensorBoard, and Fig. 3 depicts the loss plot from

TensorBoard.

Fig. 2 Main Graph Fig. 3 Loss Plot


Conclusion

We present a toxicology classification model that addresses the class imbalance challenge using

dropout regularization and weighted accuracy. By leveraging TensorFlow and DeepChem, the model

demonstrates its potential to classify toxicological outcomes. The model illustrates the utility of

dropout for reducing overfitting. The loss trends downward across the epochs

during training. The main graph is beneficial in visualizing the different components of the model and

how they interrelate.
