
Much has been said about the ability of large language models (LLMs) to write convincing, natural-sounding text, and how this might change education. To address the role of LLMs in statistics education, however, we must better understand how they write, how their writing compares to expert writing, and how student writing changes as students develop into statistical writers. We discuss quantitative measures of writing style and apply them to a large corpus of parallel LLM- and human-written text, demonstrating that LLMs have distinct (and unnatural) writing styles. We explore these stylistic differences in statistical writing in detail, and we analyze a corpus of undergraduate student writing collected both before and after the availability of LLMs to show how student writing has changed.
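To make concrete what "quantitative measures of writing style" can mean, the sketch below computes a few common surface-level stylometric features (mean sentence length, mean word length, and type-token ratio). These are illustrative examples only, not the specific measures used in this study:

import re

def style_features(text: str) -> dict:
    """Compute simple stylometric features for a passage of text.

    Illustrative surface features only; the study's actual measures
    may differ.
    """
    # Naive sentence split on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    # Lowercased word tokens (letters and apostrophes).
    words = re.findall(r"[a-zA-Z']+", text.lower())
    if not words or not sentences:
        return {}
    return {
        "mean_sentence_length": len(words) / len(sentences),
        "mean_word_length": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len(set(words)) / len(words),
    }

if __name__ == "__main__":
    sample = ("Much has been said about large language models. "
              "They write fluent, natural-sounding text.")
    print(style_features(sample))

Applied to parallel LLM- and human-written passages, feature vectors like these can be compared directly or fed into a classifier to quantify stylistic separation between the two sources.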

The results suggest that LLMs do not effectively adapt their style to the genre and audience of their writing, and we explore the implications of this for teaching. If students offload the work of writing to a tool that does not write like a data scientist, how can we teach them to communicate like expert data scientists?