CSC580_CTA4 _Option_1_Anderson_Cleon
Cleon Anderson
The field of toxicology plays a crucial role in assessing the potential risks and hazards
associated with various substances. This assignment implements a toxicology classification model
using TensorFlow and DeepChem packages. The primary objective of this model is to classify
toxicology tests based on the toxicology dataset in the DeepChem package. However, a significant
challenge arises from the imbalanced nature of the dataset. A class imbalance occurs when there is an
unequal distribution of classes within a dataset. The imbalance poses difficulties for machine learning
models, as they may favor the majority class and struggle to classify the minority class effectively.
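To see why plain accuracy can mislead on an imbalanced dataset, consider the minimal sketch below. The labels are made-up illustrative values, not drawn from the toxicology dataset: a classifier that always predicts the majority class still scores high accuracy while finding none of the minority samples.

```python
# Hypothetical labels: 95 non-toxic (0) and 5 toxic (1) samples.
labels = [0] * 95 + [1] * 5

# A degenerate "model" that always predicts the majority class.
predictions = [0] * len(labels)

# Plain accuracy: fraction of samples predicted correctly.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall on the minority class: fraction of toxic samples that were found.
minority_preds = [p for p, y in zip(predictions, labels) if y == 1]
minority_recall = sum(minority_preds) / len(minority_preds)

print(f"accuracy = {accuracy:.2f}, minority recall = {minority_recall:.2f}")
```

The high accuracy here says nothing about the minority class, which is the class of interest in a toxicity screen.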
Dropout regularization randomly sets a fraction of input units to zero during the training process, which
reduces the network's reliance on specific features and encourages more robust representations. By
incorporating dropout regularization, the toxicology classification model aims to enhance its
generalization capability and mitigate the potential bias towards the majority class.
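The mechanism can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the assignment's TensorFlow code: the `dropout` helper and its `keep_prob` parameter are hypothetical names, and the sketch uses the common "inverted dropout" convention, in which surviving units are rescaled by 1/keep_prob so that the expected activation stays the same.

```python
import random

def dropout(activations, keep_prob, rng):
    """Inverted dropout: keep each unit with probability keep_prob and
    rescale the survivors by 1/keep_prob so the expected value is unchanged."""
    if keep_prob >= 1.0:
        return list(activations)  # evaluation: every unit passes through
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)  # fixed seed, chosen only for repeatability
hidden = [1.0] * 10     # stand-in for hidden-layer activations

train_out = dropout(hidden, keep_prob=0.5, rng=rng)  # units are zeroed or doubled
eval_out = dropout(hidden, keep_prob=1.0, rng=rng)   # unchanged

print(train_out)
print(eval_out)
```

With a keep probability of 1 the function returns the activations unchanged, which is the behavior wanted at prediction and evaluation time.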
We use weighted accuracy to account for the class imbalance when evaluating the model's
performance. Weighted accuracy scales each sample's contribution to the score by the weight of its
class, so that errors on the minority class are not drowned out by the much larger majority class.
This approach ensures that the performance metric reflects the model's ability to classify both the
majority and minority classes correctly.
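As a rough illustration of the metric, the sketch below computes weighted accuracy by hand. The labels and weights are hypothetical: the weights are chosen so that each class carries the same total weight, which is one common way to balance classes and is assumed here rather than taken from the dataset. The helper `weighted_accuracy` is an illustrative name; scikit-learn's `accuracy_score` computes the same quantity when given a `sample_weight` argument.

```python
def weighted_accuracy(y_true, y_pred, weights):
    """Fraction of the total sample weight that falls on correct predictions."""
    correct = sum(w for t, p, w in zip(y_true, y_pred, weights) if t == p)
    return correct / sum(weights)

# Hypothetical data: four negatives, one positive, and a model that
# always predicts the majority (negative) class.
y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0]

# Assumed weights: each class has a total weight of 1.0.
balanced_weights = [0.25, 0.25, 0.25, 0.25, 1.0]
uniform_weights = [1.0] * len(y_true)

plain = weighted_accuracy(y_true, y_pred, uniform_weights)      # 0.8
balanced = weighted_accuracy(y_true, y_pred, balanced_weights)  # 0.5

print(f"plain = {plain}, balanced = {balanced}")
```

Under balanced weights, a model that ignores the minority class scores only 0.5, the same as chance, instead of the flattering 0.8 it earns under plain accuracy.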
Code Analysis
We use TensorFlow for building the toxicology classification model, and the DeepChem library
assists in loading and preprocessing the toxicology dataset. We created a custom Toxicology class that
initializes crucial parameters such as the minibatch size, number of hidden units, learning rate, training
epochs, and batch size. The class loads and prepares the toxicology dataset, including the division into
The model architecture consists of an input layer, a hidden layer with rectified linear unit
(ReLU) activation, and an output layer. ReLU activation allows the model to capture non-linear
relationships in the data effectively. Dropout regularization is applied to the hidden layer, promoting
robustness and reducing overfitting. The model's parameters, including weights and biases, are defined
We train the model on mini-batches of data over a specified number of epochs. The loss
function, calculated as the sigmoid cross-entropy with logits, measures the discrepancy between
predicted and actual labels. The Adam optimizer updates the model's parameters to minimize the loss.
The code prints the loss values throughout the training process to monitor the model's progress.
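The loss named above can be written out for a single example. The sketch below is plain Python rather than the assignment's TensorFlow code, and the function names are illustrative. It uses the numerically stable form max(x, 0) - x*z + log(1 + exp(-|x|)), where x is the logit and z is the label, and compares it with the textbook definition on a moderate logit.

```python
import math

def sigmoid_cross_entropy_with_logits(logit, label):
    """Stable form of -[z*log(s(x)) + (1-z)*log(1-s(x))], s = sigmoid."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def naive_cross_entropy(logit, label):
    """Textbook definition; breaks down for large |logit| because the
    sigmoid rounds to exactly 0 or 1 and log(0) is undefined."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))

def mean_loss(logits, labels):
    """Average the per-example losses over a mini-batch."""
    losses = [sigmoid_cross_entropy_with_logits(x, z)
              for x, z in zip(logits, labels)]
    return sum(losses) / len(losses)

# The two forms agree on a moderate logit.
print(sigmoid_cross_entropy_with_logits(2.0, 1.0), naive_cross_entropy(2.0, 1.0))

# The stable form also handles an extreme logit without error.
print(sigmoid_cross_entropy_with_logits(1000.0, 0.0))

# Mini-batch average over hypothetical logits and labels.
print(mean_loss([2.0, -1.0, 0.0], [1.0, 0.0, 1.0]))
```

Working from logits rather than probabilities is what allows the stable form: the sigmoid is never evaluated on its own, so confident predictions do not produce log(0).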
We evaluate the model's performance by applying the trained model to the training and
validation datasets, generating predictions. We take the sample weights, and through them the class
imbalance, into account when calculating the weighted accuracy. The scikit-learn library's
accuracy_score function facilitates the
computation of weighted accuracy. The resulting scores provide insights into the model's ability to
Hidden units: 50
Epochs: 1000
validation and training, the accuracy is around 50%. When training the model, we set the
dropout keep probability to 0.5, but when predicting or evaluating, we set it to 1.
Fig. 1
Fig. 2 shows the graph of the model from TensorBoard, and Fig. 3 depicts the loss plot from
TensorBoard.
We present a toxicology classification model that addresses the class imbalance challenge using
dropout regularization and weighted accuracy. By leveraging TensorFlow and DeepChem, the model
demonstrates its potential to classify toxicological outcomes. The model illustrates the utility of
dropout for reducing overfitting. The loss trends downward over the epochs
during training. The main graph is beneficial in visualizing the different components of the model and