The Lab for Cultural Criticism uses advanced computational techniques to study culture. We develop tools, methods, and theory-informed critiques for the digital humanities. Building on years of work in high-performance scientific computing, our lab takes advanced tools and applies them to cultural texts from literary classics to the code that makes up popular software packages. We continually ask difficult methodological questions in order to help define the type and scope of projects undertaken by those working within the digital humanities.
James Dobson’s recent publication, “Can an Algorithm be Disturbed: Machine Learning, Intrinsic Criticism, and the Digital Humanities” (College Literature, Fall 2015), makes the case that the use of machine learning within the digital humanities is part of a wider movement that nostalgically seeks to return literary criticism to the structuralist era, a moment characterized by belief in systems, structure, and the transparency of language. The essay argues that the scientific criticism of the present attempts to separate methodology from interpretation and, in the process, deemphasizes the degree to which methodology also participates in interpretation. It returns to the deconstructive critique of structuralism in order to highlight the ways in which numerous interpretive decisions are suppressed in the pre-processing of text and in the use of machine learning algorithms. This essay is the first part of a new book manuscript, tentatively titled “How Literature Became a Problem,” that addresses the intersection of science, technology, and literature from the nineteenth century to the present.
Using new tools that take advantage of text mining and machine learning technologies (NeMLA 2016 Talk) in the undergraduate classroom does not have to be difficult. I wrote a simple Python wrapper around the Natural Language Toolkit (NLTK) and the popular sklearn package that helped students start making discoveries within minutes. I have used this tool in a first-year, writing-intensive course at Dartmouth (“Campus Life” in the Fall 2015 term) with some success. Before our class meeting, I had students read a chapter (“Theme”) from Matthew Jockers’s Macroanalysis that provided an overview of several popular methods of text mining for the humanities. During our normal class period, students were then able to work through a simple workflow that mirrored several of the steps found in our reading. We used these tools on an archive of approximately eighteen thousand YikYak posts (or “Yaks”) from Dartmouth. If you are interested in this tool or my data, please contact me!
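As a rough illustration of the kind of first step such a classroom workflow might include, the sketch below tokenizes a handful of short posts, drops common words, and counts what remains. It deliberately uses only the Python standard library rather than NLTK or sklearn, so it should not be read as the wrapper's actual interface. The function names, the stopword list, and the sample posts are all assumptions invented for this example.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption); a real workflow
# would use a much fuller list.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
             "it", "for", "on", "at", "my", "i"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_terms(posts, n=3):
    """Count non-stopword tokens across all posts; return the n most common."""
    counts = Counter()
    for post in posts:
        counts.update(t for t in tokenize(post) if t not in STOPWORDS)
    return counts.most_common(n)


# Hypothetical sample posts, invented for illustration only
# (they are not drawn from the Yak archive).
sample_posts = [
    "The library is packed during finals",
    "Finals week and the library is my home",
    "Snow on the green again",
]

for term, count in top_terms(sample_posts, n=2):
    print(term, count)
```

Even a count this simple makes visible the interpretive choices hidden in pre-processing: which words are treated as noise, and what counts as a word at all.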
In 2009, as an early experiment in distant reading, I developed a set of tools to assist my students in the close reading of texts by locating interesting groups of synonyms. A small script then used these words to produce an interactive visualization (built with GraphViz and an image map) that allowed students to explore these groups by clicking on individual words. In 2010, I used high-performance computing resources at Indiana University to map the relations between the largest synonym groups by applying multi-dimensional scaling to a collection of 2,000 digitized texts. I presented some results from this work on a panel at the inaugural 2010 C19: Nineteenth-Century Americanists Conference at Penn State University.
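The visualization step can be sketched in a few lines. This is not the original 2009 script; it is a minimal example, assuming the synonym groups are already available as a dictionary, of how a script might write GraphViz DOT source in which each word carries a URL attribute. GraphViz can turn that attribute into a clickable region when asked for its client-side image map output (the cmapx format). The function name, the sample groups, and the "#word" link targets are hypothetical.

```python
def synonym_groups_to_dot(groups):
    """Return GraphViz DOT source linking each group label to its words.

    `groups` maps a group label to a list of words (an assumed data shape).
    """
    lines = ["graph synonyms {"]
    for label, words in groups.items():
        for word in words:
            # The URL attribute marks the node as a clickable region
            # when GraphViz emits an image map.
            lines.append(f'  "{word}" [URL="#{word}"];')
            lines.append(f'  "{label}" -- "{word}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical synonym groups, invented for illustration.
example_groups = {"fear": ["dread", "terror"], "joy": ["delight"]}

dot_source = synonym_groups_to_dot(example_groups)
print(dot_source)
```

The resulting text can be saved to a file and passed to the GraphViz `dot` program to render the image and its accompanying map.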
For over a decade, I have researched methods and workflows for the management of scientific pipelines. The ability to schedule and optimize complex, multistage operations while enabling reproducible science has motivated my longstanding interest in this work. I started working on these problems out of concern with the difficulties of using and running data mining techniques, first in our archival project at the National fMRI Data Center (2001-2005) and then with our local resource, the Dartmouth Brain Imaging Center. This led to collaborative projects with the University of Chicago and Monash University, as well as several journal articles and a book chapter. With the rapid development of new methodologies for the digital humanities that use multistage processing pipelines for machine learning and text mining, I have begun applying my earlier work on pipelines to this new field.
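To make the idea of a multistage pipeline concrete, here is a minimal sketch, not drawn from any of the systems mentioned above, of scheduling stages so that each runs only after the stages it depends on. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The stage names, the `run_pipeline` helper, and the sample text are assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def run_pipeline(stages, dependencies):
    """Run each stage once, after all of the stages it depends on.

    `stages` maps a stage name to a callable; `dependencies` maps a stage
    name to the list of stage names whose results it takes as arguments.
    """
    # static_order() yields every stage with its prerequisites first.
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        inputs = [results[dep] for dep in dependencies.get(name, ())]
        results[name] = stages[name](*inputs)
    return order, results


# A hypothetical three-stage text pipeline: load, tokenize, count.
stages = {
    "load": lambda: "It was the best of times",
    "tokenize": lambda text: text.lower().split(),
    "count": lambda tokens: len(tokens),
}
dependencies = {"tokenize": ["load"], "count": ["tokenize"]}

order, results = run_pipeline(stages, dependencies)
print(order)
print(results["count"])
```

Recording the stage order and every intermediate result, as this sketch does, is one small way a pipeline can support reproducibility: each step of the analysis remains open to inspection.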
My graduate student (MALS) Amy Hunt and I have recently started a project that uses a collection of cultural studies approaches, what many are now calling “critical code studies,” to examine the relation between two very different forms of writing that appear within the “body” of the same text: code and its documentation. Amy is working on completing her thesis and has recently given a presentation on her work thus far.