Automated Textual Analysis in Revision

Mike Widner

Screen shot of Voyeur

I like to discover and play with digital humanities tools. One I found recently is Voyeur, which creates word clouds and word-frequency graphs for texts you provide. Despite Jacob Harris's warning that we should be wary of word clouds, they can serve as a gentle introduction to automated text analysis for students. While experimenting with Voyeur, I wondered how I could best use it to show my students how the digital humanities are transforming the ways we interact with and study literature. Rather than explore a literary text, however, I decided it might be interesting to see whether textual analysis can help with the process of revision. My hope is that this exercise will help students see the value of such tools in a different way and see their own writing as texts open to (and requiring) interrogation.

Voyeur is easy to use. You simply upload or paste a text and click “Reveal”. You are then given a workbench-like screen that starts with a word cloud and a copy of the text you submitted. You can also upload multiple documents in many common file formats, such as Microsoft Word or PDF, which lets Voyeur build a corpus and compare word frequencies across its texts. Voyeur also provides two pre-defined corpora: one of Shakespeare’s plays (which would be fun for a course on his works) and one drawn from the archive of the Humanist listserv. The initial word cloud includes every word in the document, which is often not very useful because common words like “the” dominate it, but the tool provides a pre-programmed list of “stop words”; applying it makes Voyeur redraw the word cloud with those words omitted. Clicking on any word in the cloud produces a graph of that word’s relative frequency. The tool also reports vocabulary density, distinctive words, document length, and a number of other statistical details worth investigating.
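For readers curious about what happens behind the word cloud, here is a rough Python sketch of the two simplest measures described above: word counts with stop words removed, and vocabulary density. It is not Voyeur’s code. The short stop-word list is a hypothetical stand-in for Voyeur’s much longer one, and I am assuming that vocabulary density means the number of distinct words divided by the total number of words.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stop-word list; Voyeur ships a much longer one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that", "this"}

def tokenize(text):
    """Lowercase the text and pull out word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count every token that is not in the stop-word list."""
    return Counter(t for t in tokenize(text) if t not in stop_words)

def vocabulary_density(text):
    """Assumed definition: distinct words divided by total words (0.0 for empty text)."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0

sample = "The thesis of the paper is that the archive shapes the thesis."
print(word_frequencies(sample).most_common(3))
print(round(vocabulary_density(sample), 2))
```

Dropping the stop words is what lets “thesis” rise to the top of the list instead of “the,” which appears twice as often in the sample sentence.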

Rather than delve into the possibilities of the more advanced statistical measures, I decided to focus on word clouds and word frequencies when I use this tool with students; those two features seem the most accessible to students who probably lack experience with automated textual analysis. I also want to see whether this approach can help address one of the biggest and most prevalent problems I encounter in student writing: difficulty stating a clear thesis and then staying focused on that thesis throughout the paper. I wondered, then, whether a word cloud and graphs of word frequencies might offer a way to visualize the actual (rather than implicit or imagined) topics of a paper and their appearance and disappearance in different sections. This exercise grew out of those ideas. Note: I teach in a classroom that provides a computer to each student. Still, this exercise could, with only a little effort, be done with laptops or outside of class time. Voyeur does allow users to export all data, so it would be fairly simple to share work.
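To make the idea of topics appearing and disappearing more concrete, the sketch below approximates what a word-frequency graph across a paper shows: it cuts the text into equal slices and reports what share of each slice a given word takes up. This is an illustration, not Voyeur’s implementation; the function name `segment_frequencies` and the default of ten slices are my own assumptions.

```python
import re

def tokenize(text):
    """Lowercase the text and pull out word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def segment_frequencies(text, word, segments=10):
    """Relative frequency of `word` (lowercase) in each of `segments` equal slices.

    Integer division keeps slice boundaries exact, so every token lands in
    exactly one slice; slices that end up empty (very short texts) count as 0.
    """
    tokens = tokenize(text)
    n = len(tokens)
    freqs = []
    for i in range(segments):
        chunk = tokens[i * n // segments:(i + 1) * n // segments]
        freqs.append(chunk.count(word) / len(chunk) if chunk else 0.0)
    return freqs

draft = "thesis thesis thesis evidence evidence evidence"
for w in ("thesis", "evidence"):
    print(w, segment_frequencies(draft, w, segments=3))
```

In this toy draft, “thesis” fades out just as “evidence” takes over. In a real paper, a key term that drops to zero for several slices in a row may be worth a second look.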

First, have students upload their papers to Voyeur. If they have multiple revisions of a paper, all the better, since that lets them compare iterations of their writing. After setting the stop words and exploring the word frequencies of their own work, students should trade papers and look at a peer’s work. This switch keeps students from being biased by what they think the paper is about and lets them focus instead on what Voyeur shows. Here are a few questions I have come up with for students to ask:

Use Voyeur to see if you can get an idea of the paper’s thesis and how the argument progresses without reading the paper.

  • What can you determine about the paper's organization?
  • What words are most common? What words would you expect to be most common based on the thesis?
  • What words rise and fall (or do not) in frequency together? Would you expect them to do so?
  • How can you revise your paper so that the most important words to the argument appear more frequently or in more effective combinations?
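The third question, about words that rise and fall together, can also be put in numbers. One common way to do that (my choice here, not necessarily what Voyeur does) is to compute the Pearson correlation between two words’ slice-by-slice frequencies. The helper names below are hypothetical, and the sketch repeats the slicing logic so that it runs on its own.

```python
import re
from math import sqrt

def tokenize(text):
    """Lowercase the text and pull out word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def segment_frequencies(text, word, segments=10):
    """Share of each equal-size slice of the text taken up by `word`."""
    tokens = tokenize(text)
    n = len(tokens)
    freqs = []
    for i in range(segments):
        chunk = tokens[i * n // segments:(i + 1) * n // segments]
        freqs.append(chunk.count(word) / len(chunk) if chunk else 0.0)
    return freqs

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return None  # a word whose frequency never changes has no defined correlation
    return cov / (spread_x * spread_y)

def co_movement(text, word_a, word_b, segments=10):
    """Hypothetical helper: do two words rise and fall together across the text?"""
    return pearson(segment_frequencies(text, word_a, segments),
                   segment_frequencies(text, word_b, segments))

paper = "memory memory archive archive memory memory archive archive"
print(co_movement(paper, "memory", "archive", segments=4))
```

A value near 1 means the two words rise and fall together; a value near -1 means one tends to appear where the other drops out. The function returns None when a word’s frequency stays flat across every slice (for instance, when it never appears), because the correlation is undefined in that case.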

I have tested this technique on some of my own writing and found that it does, in fact, reveal some interesting patterns. For example, in one essay I juggle three major topics, which rise and fall in sequence through the paper. I was surprised to see just how regularly the paper followed that pattern, which confirmed that I had organized it logically and in a way that accords with my argument. What other uses might you suggest for Voyeur? Are there other questions I could pose to students as they use this tool to analyze their own writing? Do you know of other tools that might be useful for this exercise?


