Readme
ORIGIN
The Amazon reviews dataset consists of reviews from Amazon. The data span a period
of 18 years, including ~35 million reviews up to March 2013. Reviews include
product and user information, ratings, and a plaintext review. For more
information, please refer to the following paper: J. McAuley and J. Leskovec.
Hidden factors and hidden topics: understanding rating dimensions with review text.
RecSys, 2013.
DESCRIPTION
The Amazon reviews full score dataset is constructed by randomly taking 600,000
training samples and 130,000 testing samples for each review score from 1 to 5. In
total there are 3,000,000 training samples and 650,000 testing samples.
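
The totals above follow from the per-score counts multiplied by the five scores. The sketch below reproduces that arithmetic and adds a small balance check for a list of class indices. The names SCORES, TRAIN_PER_SCORE, TEST_PER_SCORE and is_balanced are illustrative only and are not part of the dataset.

```python
from collections import Counter

SCORES = range(1, 6)        # review scores 1 to 5
TRAIN_PER_SCORE = 600_000   # training samples per score, as stated above
TEST_PER_SCORE = 130_000    # testing samples per score, as stated above

train_total = TRAIN_PER_SCORE * len(SCORES)
test_total = TEST_PER_SCORE * len(SCORES)


def is_balanced(labels, per_score):
    """Return True if each score in SCORES occurs exactly per_score times
    and no label outside SCORES is present."""
    counts = Counter(labels)
    only_known = set(counts) <= set(SCORES)
    return only_known and all(counts[s] == per_score for s in SCORES)


print(train_total, test_total)
```

Passing the class indices read from train.csv with per_score=TRAIN_PER_SCORE would be one way to confirm that a downloaded copy matches the description.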
The files train.csv and test.csv contain the training and testing samples, respectively, as comma-separated
values. There are 3 columns in them, corresponding to class index (1 to 5), review
title and review text. The review title and text are escaped using double quotes
("), and any internal double quote is escaped by 2 double quotes (""). New lines
are escaped by a backslash followed by an "n" character, that is "\n".
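
As a minimal sketch of how the format described above might be read, Python's standard csv module handles the doubled double quotes directly, and the "\n" escape can be turned back into a real line break afterwards. The function name read_reviews and the sample row are made up for illustration. The sketch assumes every backslash followed by "n" stands for a line break, which the description implies but does not guarantee.

```python
import csv
import io


def read_reviews(lines):
    """Yield (class_index, title, text) from an iterable of CSV lines."""
    # doublequote=True makes the reader treat "" inside a quoted field as ".
    reader = csv.reader(lines, quotechar='"', doublequote=True)
    for label, title, text in reader:
        # Assumption: a literal backslash + "n" always encodes a line break.
        yield (
            int(label),
            title.replace("\\n", "\n"),
            text.replace("\\n", "\n"),
        )


# A made-up row in the described format (not taken from the dataset).
sample = io.StringIO('3,"A ""good"" buy","Line one\\nLine two"\n')
rows = list(read_reviews(sample))
print(rows)
```

To read an actual file, an open file object can be passed instead of the sample, for example open("train.csv", newline="", encoding="utf-8"). The UTF-8 encoding is an assumption, since the description does not state one.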