SMD
The dataset used for this project consists of text messages drawn from several
sources, including Yelp, Amazon, and IMDb. It has two columns: "sentence," which
holds the text of each message, and "label," which marks the message as spam (1)
or not spam (0). The data is stored as one file per source, and the code
provided reads these files and combines them into a single dataframe.
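
The read-and-combine step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name load_sources is hypothetical, and it assumes each source file is tab-separated with one message and its label per line and no header row. If the real files use a different delimiter or include a header, the sep and header arguments would need to change.

```python
import csv

import pandas as pd


def load_sources(paths, sep="\t"):
    """Read one file per source and stack them into a single dataframe.

    Assumption (not stated in the source): each file has no header row and
    holds two fields per line, the message text and its 0/1 label.
    """
    frames = []
    for path in paths:
        frame = pd.read_csv(
            path,
            sep=sep,
            header=None,                   # assumed: files carry no header row
            names=["sentence", "label"],   # column names from the description
            quoting=csv.QUOTE_NONE,        # keep quote characters inside messages
        )
        frames.append(frame)
    # ignore_index gives the combined dataframe one continuous row index
    return pd.concat(frames, ignore_index=True)
```

A call might look like load_sources(["yelp.txt", "amazon.txt", "imdb.txt"]), where the three file names are placeholders for wherever the per-source files actually live. Passing ignore_index=True avoids repeated row indices after the files are stacked, which keeps later positional lookups unambiguous.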