
Goodreads Book Dataset

Amy Fullington
Block #4 Computer Science
★ Goodreads is a social platform utilized by book readers to rate and discuss works of
literature with friends and like-minded followers
★ Over 20,000 books located in the database
★ Allows members to update book progress, rate and review books, join virtual book
clubs, etc.
★ Throughout history, gender discrimination has skewed the predominant gender of
authors in the male direction
★ Genre also has a large impact on a book's traits
★ The way a book presents itself determines the audience attracted and reviews
★ Purpose of study was to determine the gender disparities in book writing as well as
the effect certain attributes of books have on the audience and on each other
★ Dataset downloaded to Excel from Kaggle (“goodreads books/author data” by Ben
★ 22,892 books recorded x 20 categories
★ 6 megabytes
★ Genres in the “genre_1” column converted to numbers 0-43
★ Genders in “author_gender” column converted into 0’s (female) and 1’s (male)
★ All other necessary data numerical, so no conversion necessary
★ Save as CSV (Comma Delimited) file in the correct Python project
★ I was drawn to this dataset because I have always been an avid reader and was first
introduced to Goodreads by my English teacher in eighth grade.
Genre      Number
Fiction    1
Mystery    2
Romance    3
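The encoding step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows are made up, the raw gender labels "female" and "male" are assumed, and genre codes are assigned in order of first appearance starting at 0, which may not match the numbering actually used in the study.

```python
# Hypothetical sample rows; the real dataset has 22,892 books.
rows = [
    {"genre_1": "Fiction", "author_gender": "female"},
    {"genre_1": "Romance", "author_gender": "female"},
    {"genre_1": "Mystery", "author_gender": "male"},
    {"genre_1": "Fiction", "author_gender": "male"},
]

# Mapping stated in the slides: 0 = female, 1 = male.
# The raw label spellings are an assumption.
GENDER_CODES = {"female": 0, "male": 1}


def build_genre_codes(rows):
    """Give each distinct genre an integer, in order of first appearance."""
    codes = {}
    for row in rows:
        codes.setdefault(row["genre_1"], len(codes))
    return codes


def encode(rows):
    """Return copies of the rows with genre and gender replaced by numbers."""
    genre_codes = build_genre_codes(rows)
    return [
        {
            "genre_1": genre_codes[row["genre_1"]],
            "author_gender": GENDER_CODES[row["author_gender"]],
        }
        for row in rows
    ]


encoded = encode(rows)
print(encoded)
```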

What genre is most commonly written
by each gender?
Female -
Fantasy, Young

Male - Fiction,
★ Female top genre was romance
○ Writers tend to write about what interests them
○ Female authors may be more drawn to romance than male authors
○ Stereotypes also play a role (women's expression)
○ More comfortable putting their names on romances
★ Male top genre (besides fiction) was fantasy
○ Historical context
○ Gender norms
○ More comfortable writing this genre
○ Money lure
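A count like the one behind this question could be produced as follows. This is a minimal sketch, not the code used in the study: the sample rows are invented, and only the column names "genre_1" and "author_gender" come from the dataset description.

```python
from collections import Counter, defaultdict


def top_genres_by_gender(rows, n=1):
    """Return the n most common genres for each author gender."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["author_gender"]][row["genre_1"]] += 1
    return {
        gender: [genre for genre, _ in counter.most_common(n)]
        for gender, counter in counts.items()
    }


# Hypothetical sample rows for illustration only.
sample = [
    {"author_gender": "female", "genre_1": "Romance"},
    {"author_gender": "female", "genre_1": "Romance"},
    {"author_gender": "female", "genre_1": "Fantasy"},
    {"author_gender": "male", "genre_1": "Fiction"},
    {"author_gender": "male", "genre_1": "Fiction"},
    {"author_gender": "male", "genre_1": "Fantasy"},
]

top = top_genres_by_gender(sample)
print(top)
```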

What book genre tends to receive the
highest ratings?
Animals, and
Religion books
receive the
highest average
★ Highest rated genre overall is poetry
○ Reason lies in the genre's followers
○ Very specific genre that specific people read
○ Kind people = good reviews
★ Backed up by runner up genres - animals and religion
○ Narrow audience
○ People who read about these subjects tend to enjoy them
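Ranking genres by average rating comes down to a per-genre mean. The sketch below assumes a rating column named "average_rating", which is a hypothetical name, and uses invented sample values. The same helper could be pointed at a page-count column to rank genres by length.

```python
from collections import defaultdict


def mean_by_genre(rows, column):
    """Return (genre, mean of column) pairs, highest mean first."""
    totals = defaultdict(lambda: [0.0, 0])  # genre -> [running sum, count]
    for row in rows:
        entry = totals[row["genre_1"]]
        entry[0] += row[column]
        entry[1] += 1
    means = {genre: total / count for genre, (total, count) in totals.items()}
    return sorted(means.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical sample rows; "average_rating" is an assumed column name.
sample = [
    {"genre_1": "Poetry", "average_rating": 4.5},
    {"genre_1": "Poetry", "average_rating": 4.0},
    {"genre_1": "Fiction", "average_rating": 3.5},
    {"genre_1": "Fiction", "average_rating": 4.0},
]

ranked = mean_by_genre(sample, "average_rating")
print(ranked)
```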

What genre's books tend to be the longest?
and Historical
novels average
the most pages
★ Longest books are anthologies
○ Average around 600 pages
○ Collections of literary works chosen by the creators,
meaning one anthology contains many works.
○ Increases size
★ Historical novels and biographies cover intricate subjects
○ History = elaborate when considering individual countries
and the world alike
○ Biographies = someone's entire life in one book

Does book-length correlate with book
rating? How about the number of
average rating
or number of
correlate with a
book’s length
★ No meaningful correlation for either comparison
○ R-values 0.11209898 and 0.04089959
○ 1 is a perfect positive correlation, -1 is a perfect
negative correlation, and values near 0 indicate no linear correlation
○ Books have differing audiences
○ Opinions differ when considering one book
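An r-value of this kind is a Pearson correlation coefficient. The sketch below computes it from scratch with the standard formula. The page counts and ratings are invented for illustration and are not taken from the dataset, so the printed value will not match the r-values reported above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return covariance / (spread_x * spread_y)


# Hypothetical page counts and ratings for a handful of books.
pages = [120, 310, 450, 280, 600]
ratings = [3.9, 4.1, 3.7, 4.3, 4.0]

print(pearson_r(pages, ratings))
```

A result close to 0, like the two values reported in the study, means that knowing one quantity says very little about the other.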

Out of the top 50 rated books, how
many of them were written by male
authors? Female authors?
The split is
even between
female and
male authors
who wrote a
top fifty rated
★ The top fifty rated books have an even split between female and
male authors
○ 24 female (48.0%)
○ 26 male (52.0%)
○ Suggests that author gender has little direct influence on
how highly a book is rated
○ Neither gender is preferred over the other overall
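The top-fifty split can be reproduced by sorting on rating and counting genders among the first fifty rows. This is a sketch only: "average_rating" is an assumed column name, the rows are synthetic, and ties at the cutoff are broken by the original row order, which the study may have handled differently.

```python
from collections import Counter


def gender_split_of_top(rows, n=50):
    """Count author genders among the n highest-rated rows, with percentages."""
    top = sorted(rows, key=lambda row: row["average_rating"], reverse=True)[:n]
    counts = Counter(row["author_gender"] for row in top)
    return {
        gender: (count, round(100 * count / len(top), 1))
        for gender, count in counts.items()
    }


# Synthetic rows built to mirror the 24/26 split reported above
# (0 = female, 1 = male), plus ten lower-rated books that are cut off.
sample = (
    [{"author_gender": 0, "average_rating": 5.0} for _ in range(24)]
    + [{"author_gender": 1, "average_rating": 5.0} for _ in range(26)]
    + [{"author_gender": 1, "average_rating": 3.0} for _ in range(10)]
)

split = gender_split_of_top(sample)
print(split)
```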
