Death of the Novel(ty): Beyond n-Gram Novelty as a Metric for Textual Creativity

Saakyan, Arkadiy; Kim, Najoung; Muresan, Smaranda; Chakrabarty, Tuhin

arXiv:2509.22641 (cs)

[Submitted on 26 Sep 2025 (v1), last revised 3 Mar 2026 (this version, v2)]

Abstract: N-gram novelty is widely used to evaluate language models' ability to generate text outside of their training data. More recently, it has also been adopted as a metric for measuring textual creativity. However, theoretical work on creativity suggests that this approach may be inadequate, as it does not account for creativity's dual nature: novelty (how original the text is) and appropriateness (how sensical and pragmatic it is). We investigate the relationship between this notion of creativity and n-gram novelty through 8,618 expert writer annotations of novelty, pragmaticality, and sensicality via close reading of human- and AI-generated text. We find that while n-gram novelty is positively associated with expert writer-judged creativity, approximately 91% of top-quartile n-gram novel expressions are not judged as creative, cautioning against relying on n-gram novelty alone. Furthermore, unlike in human-written text, higher n-gram novelty in open-source LLMs correlates with lower pragmaticality. In an exploratory study with frontier closed-source models, we additionally confirm that they are less likely to produce creative expressions than humans. Using our dataset, we test whether zero-shot, few-shot, and finetuned models are able to identify expressions perceived as novel by experts (a positive aspect of writing) or non-pragmatic (a negative aspect). Overall, frontier LLMs exhibit performance much higher than random but leave room for improvement, especially struggling to identify non-pragmatic expressions. We further find that LLM-as-a-Judge novelty ratings align with expert writer preferences in an out-of-distribution dataset, more so than an n-gram based metric.
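
For readers unfamiliar with the metric, the following is a minimal sketch of one common way to compute n-gram novelty: the fraction of a text's n-grams that do not occur in a reference corpus. The abstract does not give the authors' exact definition, tokenization, or reference corpus, so the whitespace tokenizer, the lowercasing, and the function names `ngrams`, `build_reference_index`, and `ngram_novelty` are all assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> List[NGram]:
    """Return all contiguous n-grams of a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_reference_index(corpus: Iterable[str], n: int) -> Set[NGram]:
    """Collect every n-gram seen in the reference corpus.

    Assumption: lowercased whitespace tokenization stands in for
    whatever tokenizer a real evaluation would use.
    """
    index: Set[NGram] = set()
    for doc in corpus:
        index.update(ngrams(doc.lower().split(), n))
    return index


def ngram_novelty(text: str, reference: Set[NGram], n: int) -> float:
    """Fraction of the text's n-grams that are absent from the reference."""
    grams = ngrams(text.lower().split(), n)
    if not grams:
        # Text shorter than n tokens: define novelty as 0.0 by convention.
        return 0.0
    unseen = sum(1 for g in grams if g not in reference)
    return unseen / len(grams)


if __name__ == "__main__":
    corpus = ["the cat sat on the mat", "a dog slept on the rug"]
    reference = build_reference_index(corpus, n=2)
    # Bigrams "cat slept" and "the moon" are unseen: 2 of 5.
    print(ngram_novelty("the cat slept on the moon", reference, n=2))  # 0.4
```

A score like this only measures surface overlap with the reference text. As the abstract notes, it says nothing about whether an unseen expression is sensical or pragmatic, which is why high n-gram novelty need not coincide with expert-judged creativity.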

Comments:	ICLR 2026 Camera Ready. 30 pages, 11 figures, 15 tables
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Cite as:	arXiv:2509.22641 [cs.CL]
	(or arXiv:2509.22641v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2509.22641

Submission history

From: Arkadiy Saakyan
[v1] Fri, 26 Sep 2025 17:59:05 UTC (3,328 KB)
[v2] Tue, 3 Mar 2026 04:38:03 UTC (3,580 KB)
