The Tag Genome Dataset for Books

Denis Kotkov, Alan Medlar, Alexandr Maslov, Umesh Raj Satyal, Mats Neovius, Dorota Glowacka

Research output: Chapter in Book/Conference proceeding › Conference contribution › Scientific › peer-review

2 Citations (Scopus)


Attaching tags to items, such as books or movies, is found in many online systems. While a majority of these systems use binary tags, continuous item-tag relevance scores, such as those in tag genome, offer richer descriptions of item content. For example, tag genome for movies assigns the tag “gangster” to the movie “The Godfather (1972)” with a score of 0.93 on a scale of 0 to 1. Tag genome has received considerable attention in recommender systems research and has been used in a wide variety of studies, from investigating the effects of recommender systems on users to generating ideas for movies that appeal to certain user groups.

In this paper, we present tag genome for books, a dataset containing book-tag relevance scores, where a significant number of tags overlap with those from tag genome for movies. To generate our dataset, we designed a survey based on popular books and tags from the Goodreads dataset. In our survey, we asked users to provide ratings for how well tags applied to books. We generated book-tag relevance scores based on user ratings along with features from the Goodreads dataset. In addition to being used to create book recommender systems, tag genome for books can be combined with the tag genome for movies to tackle cross-domain problems, such as recommending books based on movie preferences.
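The cross-domain use case mentioned above can be illustrated with a minimal sketch. It assumes each item is represented as a mapping from tag to relevance score in [0, 1], and it ranks books by cosine similarity to a profile averaged from a user's liked movies, computed over the tags the two domains share. This is an illustrative approach, not the method used in the paper. The book titles, every score other than the 0.93 for "gangster", and the function names `cosine` and `recommend_books` are hypothetical.

```python
import math

# Hypothetical relevance scores in [0, 1]. Only the "gangster" score for
# "The Godfather (1972)" comes from the abstract; all other values are made up.
movie_genome = {
    "The Godfather (1972)": {"gangster": 0.93, "family": 0.80, "romance": 0.10},
}
book_genome = {
    "Hypothetical Crime Saga": {"gangster": 0.90, "family": 0.70, "romance": 0.05},
    "Hypothetical Love Story": {"gangster": 0.02, "family": 0.30, "romance": 0.95},
}


def cosine(u, v):
    """Cosine similarity of two tag-score mappings over their shared tags."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[t] * v[t] for t in shared)
    norm_u = math.sqrt(sum(u[t] ** 2 for t in shared))
    norm_v = math.sqrt(sum(v[t] ** 2 for t in shared))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend_books(liked_movies, movies, books, top_n=1):
    """Rank books by similarity to the average tag profile of liked movies."""
    profile = {}
    for title in liked_movies:
        for tag, score in movies[title].items():
            profile[tag] = profile.get(tag, 0.0) + score / len(liked_movies)
    ranked = sorted(books, key=lambda b: cosine(profile, books[b]), reverse=True)
    return ranked[:top_n]


recommended = recommend_books(["The Godfather (1972)"], movie_genome, book_genome)
print(recommended)
```

Restricting the comparison to shared tags reflects the abstract's point that a significant number of tags overlap between the two datasets; tags that exist in only one domain are simply ignored in this sketch.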

Original language: English
Title of host publication: Proceedings of the 2022 ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR ’22)
Subtitle of host publication: March 14–18, 2022, Regensburg, Germany
Place of Publication: New York
ISBN (Print): 978-1-4503-9186-3
Publication status: Published - 2022
MoE publication type: A4 Article in a conference publication
Event: ACM SIGIR Conference on Human Information Interaction and Retrieval: CHIIR
Duration: 14 Mar 2022 → …


Conference: ACM SIGIR Conference on Human Information Interaction and Retrieval
Period: 14/03/22 → …

