COMP 652 Project Final Paper
A. Architecture
Fig. 2. This figure shows how the masked sentences are combined by adding the [CLS] and [SEP] tokens. The same is done for the masked labels (zeros are added at all [CLS] and [SEP] indices) and the attention mask (ones are added at all [CLS] and [SEP] indices).
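As a rough sketch of the operation Fig. 2 depicts (the token IDs and function below are illustrative assumptions, not the paper's actual code):

CLS_ID, SEP_ID = 101, 102  # assumed vocabulary IDs for [CLS] and [SEP]

def combine_pair(sent_a, sent_b, labels_a, labels_b, mask_a, mask_b):
    # Join the two masked sentences as [CLS] A [SEP] B [SEP]
    tokens = [CLS_ID] + sent_a + [SEP_ID] + sent_b + [SEP_ID]
    # Zeros at every [CLS]/[SEP] index: special tokens carry no MLM label
    labels = [0] + labels_a + [0] + labels_b + [0]
    # Ones at every [CLS]/[SEP] index: special tokens are attended to
    attention = [1] + mask_a + [1] + mask_b + [1]
    return tokens, labels, attention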
Fig. 7. This figure summarizes the training of the pretrained Micro BERT on the disaster dataset classification task.
the other two partition losses, but the accuracies were all close, leaving very little evidence of over-fitting the dataset. The total train time for this model was 238 minutes, with the kept epoch being 141.

Finally, a non-pretrained Micro BERT (the weights of the Micro BERT model were not initialized from the training on the Google Books corpus) was also fit using the exact same MLP head (weights were again not initialized from previous training) on the disaster classification dataset (embedding dimension 128, number of attention heads 8, number of stacked encoder blocks 3, feed-forward hidden dimension 128, dropout .2, patience of early stopper 10, learning rate 1e-5) to see if pretraining on the books corpus made a difference. The results from this version of Micro BERT can be seen in figure 8.

The final losses of the non-pretrained Micro BERT were train .8477, val .9335, and test .9446. The accuracies were .7044 train, .6926 validation, and .6947 test. The F1 scores for the validation and test partitions were .677 and .675. The same trend of small differences between train and val/test losses continued, with small differences in accuracy present across partitions (the validation and test partitions are about half a percent less accurate). These results show larger signs of overfitting than the training of the pretrained Micro BERT. The train time of the non-pretrained Micro BERT was 120 minutes with a kept epoch of 77.
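The "kept epoch" is the epoch whose weights the early stopper retains. The paper's own training loop is not shown in this excerpt; a minimal Keras sketch of such a stopper, assuming the stated patience of 10, would be:

import tensorflow as tf

# Stop training once validation loss has not improved for 10 epochs,
# then roll back to the weights of the best ("kept") epoch.
early_stopper = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=10, restore_best_weights=True)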
B. Dataset and Pre-processing
First, the dataset is loaded from a CSV file, skipping problematic lines, and rows with missing values are dropped to preserve data integrity. The 'class_int' column is cast to an integer type, and the 'tweet_text' rows are converted to strings. One-hot encoding is applied to the categorical labels. Text preprocessing removes URLs, placeholders, hashtags, and non-word characters, and the text is also tokenized, stemmed, and lemmatized using NLTK, whose resources have been downloaded previously. The preprocessed tweets are stored in new columns, and boolean values are replaced with numbers, completing the preprocessing step. The data can then be used for further analytical or predictive problem-solving.
import pandas as pd

# Load dataset while ignoring errors in specific lines
df = pd.read_csv("dataset_final.csv", encoding='latin1', on_bad_lines='skip')

df.isnull().sum()

event                   0
tweet_text              0
class_int               0
tweet_text_tokenize     0
tweet_stem              0
tweet_lemma            69
dtype: int64

# Drop the rows with missing values
df = df.dropna(axis=0)
df.isnull().sum()

event                  0
tweet_text             0
class_int              0
tweet_text_tokenize    0
tweet_stem             0
tweet_lemma            0
dtype: int64

df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 74141 entries, 0 to 74209
Data columns (total 6 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   event                74141 non-null  object
 1   tweet_text           74141 non-null  object
 2   class_int            74141 non-null  object
 3   tweet_text_tokenize  74141 non-null  object
 4   tweet_stem           74141 non-null  object
 5   tweet_lemma          74141 non-null  object
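The tokenization, stemming, and lemmatization described above are not reproduced in this excerpt. A minimal sketch of how they might be implemented with NLTK (the regex patterns and the preprocess function are illustrative assumptions; the column names come from the dataset):

import re
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download('punkt')
nltk.download('wordnet')

stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()

def preprocess(text):
    text = re.sub(r'http\S+', '', text)    # remove URLs
    text = re.sub(r'#\w+', '', text)       # remove hashtags
    text = re.sub(r'[^\w\s]', ' ', text)   # remove non-word characters
    tokens = word_tokenize(text.lower())
    stems = [stemmer.stem(t) for t in tokens]
    lemmas = [lemmatizer.lemmatize(t) for t in tokens]
    return tokens, stems, lemmas

df['tweet_text_tokenize'], df['tweet_stem'], df['tweet_lemma'] = zip(
    *df['tweet_text'].astype(str).map(preprocess))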
C. Disaster BERT Model

This part of the code develops a disaster text classification model using a compressed version of BERT, specifically the TinyBERT model that was distilled from BERT. The model is trained on the dataset of disaster-related tweets and is assessed through various metrics, including accuracy, F1 score, recall, precision, ROC curves, and AUC scores. The main task is the training and fine-tuning of the disaster BERT model, which is assigned to categorize tweets into different disaster-related categories. The code processes the dataset of tweets by splitting it into training and testing sets with a 20% test size for model evaluation. It uses a BERT tokenizer to prepare the data for input into the TinyBERT model, adapting the model's final layer to output ten classes for multi-class classification. The model is compiled using the Adam optimizer and categorical cross-entropy loss. Finally, both the input encodings and labels are converted into TensorFlow tensors, setting the stage for model training and evaluation.
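The paper's listing for this step is not reproduced in this excerpt. A condensed sketch of the described setup, assuming the Hugging Face transformers library, a generic TinyBERT checkpoint, and one-hot labels in labels_onehot (all names are illustrative):

import tensorflow as tf
from sklearn.model_selection import train_test_split
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

ckpt = "huawei-noah/TinyBERT_General_4L_312D"  # assumed checkpoint

# 80/20 train/test split for model evaluation
X_train, X_test, y_train, y_test = train_test_split(
    df['tweet_text'].tolist(), labels_onehot, test_size=0.2)

tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = TFAutoModelForSequenceClassification.from_pretrained(
    ckpt, num_labels=10, from_pt=True)  # final layer outputs ten classes

# Convert the input encodings to TensorFlow tensors
enc_train = tokenizer(X_train, truncation=True, padding=True, return_tensors="tf")
enc_test = tokenizer(X_test, truncation=True, padding=True, return_tensors="tf")

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),  # lr is an assumption
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])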
D. DeBERTa

The code as provided does not address the use of the DeBERTa model. DeBERTa is a BERT descendant that introduces improvements such as disentangled attention and better parameter-sharing strategies. If you decide to use the DeBERTa model instead of the TinyBERT model, you will have to modify the code to load the DeBERTa model and use it for fine-tuning.
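For illustration only, such a swap might look like the following with the Hugging Face transformers library (the checkpoint name is an assumption, not the paper's choice):

from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")
model = TFAutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-small", num_labels=10,
    from_pt=True)  # convert from PyTorch weights if no TF weights exist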
E. ALBERT

The supplied code is likewise independent of the ALBERT model, the same as with DeBERTa. ALBERT is essentially a second version of the BERT architecture, featuring faster training and better parameter sharing. If you plan to employ ALBERT rather than TinyBERT, the code should similarly be adjusted to incorporate the ALBERT model and tune it.
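The analogous, purely illustrative swap for ALBERT (checkpoint name assumed):

from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
model = TFAutoModelForSequenceClassification.from_pretrained(
    "albert-base-v2", num_labels=10)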
F. TinyBERT

This code relies heavily on the TinyBERT model. TinyBERT is a lightweight version of the BERT model, built through knowledge distillation techniques with the purpose of providing a compact model that retains the same levels of performance on downstream tasks as the original BERT model. The code loads the pre-trained TinyBERT model, which is then trained on the disaster tweet dataset; after training, the model is evaluated for text classification. The code executes the training and evaluation of the TinyBERT model, timing the entire training process using Python's time.time() to capture start and end times. Training occurs over 10 epochs with a batch size of 32, using both training and validation datasets. After training, the script prints a model summary and calculates the total number of trainable parameters. It then evaluates the model on the test set to measure loss and accuracy, also timing this process for performance analysis. The output includes training and testing times, the number of epochs, and the total trainable parameters, offering a comprehensive view of the model's training performance and efficiency.
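A minimal sketch of that training-and-timing flow, continuing the illustrative names above and assuming a held-out validation split (enc_val, y_val):

import time
import tensorflow as tf

start = time.time()
history = model.fit(dict(enc_train), y_train,
                    validation_data=(dict(enc_val), y_val),
                    epochs=10, batch_size=32)
train_time = time.time() - start

model.summary()
trainable = sum(tf.keras.backend.count_params(w)
                for w in model.trainable_weights)
print(f"Trainable parameters: {trainable}")

# Time the test-set evaluation as well
start = time.time()
test_loss, test_acc = model.evaluate(dict(enc_test), y_test)
print(f"Train time: {train_time:.1f}s, test time: {time.time() - start:.1f}s")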
G. Evaluation

This code demonstrates comprehensive evaluation and visualization processes for the trained TinyBERT model on the text classification task. Initially, it plots the training and validation loss and accuracy over epochs, providing visual insight into the model's learning and generalization across training sessions. It then computes predicted probabilities, applying a softmax function to transform logits into actual probabilities. The script calculates the ROC curve and the Area Under the Curve (AUC) for each class, offering a graphical representation of model performance across different thresholds. Predicted probabilities are further used to derive predictions, which are then compared against the true labels to compute accuracy, F1 score, recall, and precision, offering a holistic view of the model's predictive accuracy. A confusion matrix is plotted to visually assess the model's performance in distinguishing between different classes. Finally, a detailed classification report is generated, providing precision, recall, F1-score, and support for each class, thus summarizing the model's performance across various metrics. This series of evaluations and visualizations helps in understanding the model's strengths and weaknesses in classifying text into specific categories.
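The paper's evaluation listing is not reproduced in this excerpt. A self-contained sketch of the described steps, assuming y_test is a one-hot NumPy array and history is the Keras History object from training:

import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.metrics import (roc_curve, auc, accuracy_score, f1_score,
                             recall_score, precision_score,
                             confusion_matrix, classification_report)

# Training vs. validation loss curves over epochs
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='val loss')
plt.xlabel('epoch'); plt.legend(); plt.show()

# Softmax turns the model's raw logits into class probabilities
logits = model.predict(dict(enc_test)).logits
probs = tf.nn.softmax(logits, axis=-1).numpy()

# One-vs-rest ROC curve and AUC for each of the ten classes
for c in range(probs.shape[1]):
    fpr, tpr, _ = roc_curve(y_test[:, c], probs[:, c])
    plt.plot(fpr, tpr, label=f'class {c} (AUC = {auc(fpr, tpr):.2f})')
plt.xlabel('FPR'); plt.ylabel('TPR'); plt.legend(); plt.show()

# Derive hard predictions and compare against the true labels
y_pred = probs.argmax(axis=1)
y_true = y_test.argmax(axis=1)
print('accuracy :', accuracy_score(y_true, y_pred))
print('F1       :', f1_score(y_true, y_pred, average='weighted'))
print('recall   :', recall_score(y_true, y_pred, average='weighted'))
print('precision:', precision_score(y_true, y_pred, average='weighted'))
print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred))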
Fig. 9. This figure shows the confusion matrix for the model.
IV. DISCUSSION

This kind of model is capable of satisfactorily handling the short-text classification tasks that arise in disaster incidents, which has become a real need today. The model's ability to represent contextual clues and the sense of the sentences turned out to be key to correctly recognizing the disaster-related categories of the texts.