Experimental Modeling of Writing Styles for Authorship Verification via Punctuation Analysis

Dillon, Roberto, Gotelli, Marco, and Bruzzone, Agostino (2025) Experimental Modeling of Writing Styles for Authorship Verification via Punctuation Analysis. In: Procedia Computer Science (274) pp. 1238-1243. From: MAS 2025: 24th International Conference on Modelling and Applied Simulation, 17-19 September 2025, Fes, Morocco.

PDF (Published Version), available under License Creative Commons Attribution Non-commercial No Derivatives.

View at Publisher Website: https://doi.org/10.1016/j.procs.2025.12....
 
1


Abstract

Authorship attribution is a critical task in forensic linguistics, literary studies, and digital forensics, where determining the origin of a text can have significant implications. This paper presents an experimental stylometric tool developed in Python, designed to model writing styles and assist in authorship determination. The tool extracts nine quantitative features from input texts, including metrics such as average words per sentence and the frequency of specific punctuation marks (e.g., commas, semicolons). By comparing these features across texts, the system computes a probability score indicating the likelihood that two samples share the same author. To evaluate the tool’s effectiveness, we conducted experiments using short stories authored by Charles Dickens, Ernest Hemingway, and Edgar Allan Poe. The results demonstrate that the tool can reliably distinguish between authors and identify stylistic consistencies within an author’s body of work. The approach leverages statistical analysis to provide an interpretable and reproducible framework for authorship attribution, complementing more complex machine learning models. This work contributes to the growing field of computational stylometry by offering a transparent, feature-driven method suitable for both forensic and academic applications. Future research will focus on expanding the feature set, testing on larger and more diverse corpora, and integrating the tool with advanced classification algorithms to further enhance accuracy and applicability.
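The abstract describes the pipeline only at a high level: extract nine quantitative features per text, compare them, and turn the comparison into a same-author score. The sketch below illustrates that general idea under stated assumptions and is not the authors' implementation. Only average words per sentence and comma and semicolon frequencies are named in the abstract; the other punctuation marks in `PUNCTUATION`, the per-100-words normalisation, the relative-difference comparison, and the exponential mapping in the hypothetical `same_author_score` function (with its hypothetical `scale` parameter) are all illustrative choices. The resulting score lies in (0, 1] but is a similarity measure, not a calibrated probability.

```python
import math
import re

# ASSUMPTION: the abstract names only words per sentence, commas and
# semicolons; the remaining marks are hypothetical, chosen to make nine
# features in total (1 sentence-length feature + 8 punctuation features).
PUNCTUATION = [",", ";", ":", "!", "?", "-", "(", '"']


def extract_features(text: str) -> dict[str, float]:
    """Return a small stylometric feature vector for one text sample."""
    # Naive sentence split on runs of terminal punctuation; empty pieces dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(len(words), 1)
    features = {"words_per_sentence": len(words) / max(len(sentences), 1)}
    for mark in PUNCTUATION:
        # Occurrences per 100 words, so samples of different length compare fairly.
        features[f"freq_{mark}"] = 100.0 * text.count(mark) / n_words
    return features


def same_author_score(a: str, b: str, scale: float = 1.0) -> float:
    """Hypothetical score: 1.0 for identical feature vectors, lower as they diverge."""
    fa, fb = extract_features(a), extract_features(b)
    diffs = []
    for key in fa:
        # Relative difference in [0, 2]; features absent from both texts count as equal.
        denom = (abs(fa[key]) + abs(fb[key])) / 2
        diffs.append(0.0 if denom == 0 else abs(fa[key] - fb[key]) / denom)
    mean_diff = sum(diffs) / len(diffs)
    # Map the mean difference to (0, 1]; `scale` controls how fast it decays.
    return math.exp(-scale * mean_diff)


if __name__ == "__main__":
    sample = "It was late; the fog, thick and grey, rolled in; we waited."
    other = "He went out. It was good. The sun was up."
    print(extract_features(sample))
    print(same_author_score(sample, sample))  # identical texts score 1.0
    print(same_author_score(sample, other))
```

In practice a tool of this kind would be checked against known-author samples (for example, several stories by the same writer) to choose a decision threshold on the score; how the paper derives its probability score is not described in this record.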

Item ID: 90227
Item Type: Conference Item (Research - E1)
ISSN: 1877-0509
Keywords: punctuation analysis, writing style, attribution
Copyright Information: © 2025 The Authors. Published by ELSEVIER B.V. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Date Deposited: 20 May 2026 02:48
FoR Codes: 46 INFORMATION AND COMPUTING SCIENCES > 4604 Cybersecurity and privacy > 460499 Cybersecurity and privacy not elsewhere classified @ 50%
44 HUMAN SOCIETY > 4402 Criminology > 440299 Criminology not elsewhere classified @ 50%
SEO Codes: 14 DEFENCE > 1401 Defence > 140105 Intelligence, surveillance and space @ 30%
13 CULTURE AND SOCIETY > 1302 Communication > 130202 Languages and linguistics @ 30%
22 INFORMATION AND COMMUNICATION SERVICES > 2204 Information systems, technologies and services > 220402 Applied computing @ 40%
