Scholars' Lab

Works in Progress

Transcription Is Complicated

Wed, 21/02/2018 - 21:15

In a recent PMLA issue on digital methods, Johanna Drucker concludes her article “Why Distant Reading Isn’t” by claiming that distant reading’s

literalness makes it the closest form of reading imaginable. What distant reading lacks is distance. That distance is critical; it is the space between the literal text and the virtual text, between the inscriptional, notational surface and the rhetorical, cognitive effect that produces a text. (633)

In other words, when an algorithm “reads” a corpus by scouring it for patterns of one kind or another, it doesn’t transform the text the way that a human reader does. It can get so “close” because it reads without the powerful and dynamic cognitive filters through which human readers conjure, out of the written word, literary worlds. For Drucker, closing the gap between “reader” and text in this way is one of the things that makes distant reading “the closest form of reading imaginable.”

But, crucially, human decisions shape how a program closes that gap in the first place. As Drucker argues elsewhere in the article, “modeling and parameterization”—decisions made by scholars and programmers as to what a program will look for and, therefore, be able to find—not only “shape the terms by which a text is analyzed to produce quantitative data,” but are also “rendered almost invisible by the forms in which results are expressed” (632). These before-the-fact decisions, then, are what allow an algorithm to read from such a close range—ignoring the “rhetorical, cognitive effect that produces a text,” they engage with “the inscriptional, notational surface” according to a set of pre-established instructions to produce results of one form or another. In this sense, some might argue that the “distance” distant reading “lacks” is the gap in which literature happens: the unpredictable, unwieldy interpretive space in which a reader transforms text on a page or screen into a living work of art.


As I assemble my corpus of poetry from the Black Arts Movement, I’ve grown more interested in this gap between “inscriptional, notational surface” and “rhetorical, cognitive effect.” In the past three weeks I have transcribed approximately twenty books of poetry. This is, in many ways, the kind of “reading” that we expect a machine to be good at: tedious and time-consuming, sure, but also mechanical, even mindless—something lacking that human “distance” Drucker describes above.

When it comes to transcription, however, the devil is in the details. And anyone familiar with using OCR software to transcribe text from images knows that machines still struggle to get all the details right. After pages are scanned into images and processed with a program like ABBYY FineReader, the resulting text files are often garbled with mistakes—errors that require a human reviewer to identify, compare with the original, and correct by hand. Though FineReader is an extremely useful piece of software, no program can be all things to all people, and I found this especially true for experimental texts like the poetry in my corpus that employ unusual indentation, spacing, punctuation, capitalization, and non-traditional spellings.

But I already knew that ABBYY FineReader would have trouble transcribing text from images from my corpus. That’s one of the reasons I decided to transcribe them by hand in the first place. What I didn’t anticipate was how much trouble I—a presumably well-trained human reader—would have transcribing text from physical documents into a text editor. This was the case even when my documents were fully intact and the text completely legible.

Over the course of the past few weeks, I found that this hair’s-breadth, closest-form-of-reading-imaginable reading—the kind that seems to go no further than inscriptional surface—is also a complex task requiring creativity, imagination, and resourcefulness. Moreover, rather than being a mindless or merely mechanical task, the transcription of these texts frequently presented thorny decisions that demanded my judgement as a reader, scholar, and programmer. Arriving at these decisions often required knowledge not only of digital methods, but also of bibliographical methods, questions of poetic form, and more practical project management skills.


Take, for example, lines from “a/coltrane/poem,” the final poem from Sonia Sanchez’s 1970 collection We A BaddDDD People (and a poem that got me listening to Coltrane’s music while transcribing):

         (soft       rise up blk/people.  rise up blk/people
         chant)   RISE.  &  BE.  what u can.
                         MUST BE.BE.BE.BE.BE.BE.BE.BE-E-E-E-E-
                                        yeh. john coltrane.
         my favorite things is u.

Like many of the poems from We A BaddDDD People, “a/coltrane/poem” makes dramatic use of indentation, punctuation, the spaces between words, and the spaces between lines. Even transcribing these lines to be published here on the Scholars’ Lab WordPress site, however, raises a number of technical and practical issues. For example, there is no easy way to produce this kind of whitespace in HTML. When web browsers parse the whitespace in poetry—indentation, tabs, etc.—they more or less get rid of it. While investigating the poetry of Mina Loy, Andrew Pilsch argues in his chapter in Reading Modernism with Machines that “the nature of HTML resists—even prevents—the easy introduction of … typographic experimentation” (245)—something he discusses earlier on his blog. Like Pilsch, I ended up having to make use of the “&nbsp;” non-breaking space entity—something Pilsch discusses in more depth—to shoehorn spaces into the poem so it would appear correctly, I hope, in web browsers. This means that, in HTML, the above section of poetry looks like this:
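
A minimal sketch of that kind of substitution, assuming every space is swapped for a non-breaking space entity and every line ends in a `<br>` tag. The function name `poem_to_html` is hypothetical, and this is not necessarily how the markup for this post was produced; it only shows what one line of the passage turns into:

```python
import html


def poem_to_html(lines):
    """Return HTML that keeps each line's spacing intact.

    Assumption: every space becomes a non-breaking space entity and every
    line ends with <br>; both the name and the approach are illustrative.
    """
    rendered = []
    for line in lines:
        # Escape &, < and > first so that "&" in the poem becomes "&amp;".
        escaped = html.escape(line)
        # Browsers collapse runs of ordinary spaces, so swap each one out.
        rendered.append(escaped.replace(" ", "&nbsp;") + "<br>")
    return "\n".join(rendered)


print(poem_to_html(["chant)   RISE.  &  BE.  what u can."]))
# chant)&nbsp;&nbsp;&nbsp;RISE.&nbsp;&nbsp;&amp;&nbsp;&nbsp;BE.&nbsp;&nbsp;what&nbsp;u&nbsp;can.<br>
```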

In other words, a complete mess. But before trying to print parts of this poem in HTML through WordPress, at an even more basic level I had to get it into a text editor, a process which also raised a number of questions requiring practical decisions. As I type out the above lines into Atom, I have to ask: how many spaces should separate the words that seem to be a stage direction on the left— “(soft / … / chant)”—from the words on the right?

In an ideal world, I would have access to all materials used by Dudley Randall’s Broadside Press to publish this 1970 edition, as well as publication materials from all subsequent editions. Comparing these various documents, I would be able to get a better sense of the typographical materials and units of measurement used to represent Sanchez’s poem on paper. This would provide me with a more holistic sense of how to represent Sanchez’s poem in my text editor. However, given constraints on my time and resources as a Ph.D. student, as well as the size of my corpus, deciding how deep I want to dig in the archive to answer such questions requires serious consideration. Moreover, as far as I can tell, while there were printings of this edition of We A BaddDDD People as late as 1973, there were no other new editions of the work—so the edition I have is the only one I have to work with.

So when faced with the question—how many spaces should separate these words in a text file?—I looked at how far a space gets me in relation to other characters, gauged this against the kinds of spaces in poems elsewhere in the book, and made an educated guess: three after “(soft”, and one after “chant)”. The same goes for the space between “&  BE.”, which is slightly larger than the gaps separating most other words. I’m not sure exactly how much larger this gap is, so I make another educated guess, giving it two spaces instead of one.

In a multiple-page poem defined by such visual experimentation, however, trying to measure and align every word, space, and line break so that the text in my text editor resembles the text on the page—even roughly—is a real challenge. In some cases, given the functionalities of the editor I’m working with, this challenge becomes an impasse. Even in the example above: the space separating the line “yeh. john coltrane.” from the preceding line—“BE-E-E-E-E-E-”—matches the size of other line breaks within stanzas in this volume. But the space separating this line from its succeeding line—“my favorite things is u.”—is both larger than line breaks within stanzas and smaller than breaks indicating new stanzas. While transcribing, I normally represent adjacent lines in a poem with adjacent lines in my text editor; I represent stanza breaks with an empty line. How do I represent in my text editor a line break that is effectively 1.5 times the size of a normal line break? Without reworking my entire spacing system across all of my poems, I can’t—so I decided to transcribe them as adjacent lines despite the clearly visible difference on the page.
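
The convention described here, adjacent lines for adjacent lines and one empty line for a stanza break, can be read back out of a text file with a few lines of Python. This is a rough sketch rather than the project's actual code: `split_stanzas` is a hypothetical name, the sample text is invented, and it assumes that whitespace-only lines count as empty.

```python
def split_stanzas(text):
    """Split a transcription into stanzas, each a list of lines.

    Assumption: one or more blank (whitespace-only) lines mark a stanza
    break; leading spaces inside a line are kept as transcribed.
    """
    stanzas, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.rstrip())
        elif current:
            # A blank line closes the stanza collected so far.
            stanzas.append(current)
            current = []
    if current:
        stanzas.append(current)
    return stanzas


sample = "first line\n    second line\n\nthird line\n"
print(split_stanzas(sample))
# [['first line', '    second line'], ['third line']]
```

A break that is 1.5 times the normal size has no representation in a scheme like this: it has to be recorded either as no blank line or as one.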

Textual Scholarship

The nature of these challenges would come as no surprise to scholars—like Drucker—interested in textual study, bibliographical study, and scholarly editing. Having had the great fortune of taking a seminar here at UVA on textual criticism and scholarly editing with David Vander Meulen, a course at the Rare Book School on the book in the American industrial era with Michael Winship, as well as many thoughtful conversations with friend, colleague, and textual scholar James Ascher, I’ve had the opportunity to adopt many of these methodological lenses as my own. These lenses help us to ask questions like: what, exactly, is a literary work? Is Sanchez’s We A BaddDDD People the words printed in ink on the pages of the physical book I’m holding? If there are discrepancies between this book and later editions, how do we reconcile them? And, more relevant to my current project, how does the digital copy of this work in my text editor differ from the bound copy held at UVA’s library from which I’m making my transcription?

In considering these questions, I find helpful the vocabulary used by textual scholar G.T. Tanselle that distinguishes between document, text, and work. To offer a greatly reduced shorthand for Tanselle’s nuanced thinking on these distinctions: there are texts of works and there are texts of documents. Texts of documents refer to the words, markings, or inscriptions on a physical object that is completely unique though it may seem to be identical to other artifacts. Texts of works, on the other hand, are slightly more complicated—they consider the words as instructions for performing that intangible thing that is a verbal literary work in the minds of readers.

Though they may seem abstract, conceptual distinctions such as these have emerged from some of the most concrete, hands-on, rubber-meets-the-road scholarship in literary thought—for example, the kind of thinking that goes into examining multiple versions of a work (like King Lear) so as to produce a single scholarly edition. A distinction like Tanselle’s between texts of documents and texts of works offers a guiding light for scholars down in the often bewildering weeds of a given archive. As Tanselle argues in “Textual Criticism and Deconstruction,”

The distinction between the texts of documents (handwritten or printed, private or published) and the texts of works is basic to textual criticism. The effort to “reconstruct” or “establish” the texts of works presupposes that the texts we find in documents cannot automatically be equated with the texts of the works that those documents claim to be conveying. (1)

In other words, scholars must exercise a great deal of judgement as they try to reconcile meaningful—and sometimes extremely significant—discrepancies between versions of a given physical text as found in physical documents in their efforts to determine the text of the work itself. The role that “intentions” play in all this—as in the words that were meant to be put down—and how best to account for the mediating forces and actors at work in the publication of a book, is a point of debate in textual scholarship, often dependent on the kinds of research questions one hopes to investigate (for more reading here, see D. F. McKenzie’s Bibliography and the Sociology of Texts, Jerome McGann’s The Textual Condition, and Tanselle’s “Textual Criticism and Literary Sociology”). And as many scholars have argued, these conceptual distinctions central to textual criticism and thought extend to digital artifacts as well—see, for example, Matthew Kirschenbaum’s “.txtual condition.” Scholarship such as this helps me to think through how a hand-typed .txt file of We A BaddDDD People relates to a physical codex made of paper and ink.

Stanza Breaks

Again, part of the purpose of this post is to expand on just how complicated transcription can be when it comes to performing text analysis on a literary corpus. Moreover, I’m hoping to think through how these practices are bound up with traditional bibliographical lines of inquiry. In short, I’m hoping here to offer further examples of how reading a literary text at extremely close range—Drucker’s “inscriptional, notational surface”—involves all kinds of human thought and judgement. Even if this thought and judgement are hidden in things we might take for granted—like the distinction between thinking of the book I’m holding as being Sonia Sanchez’s We A BaddDDD People, as opposed to a unique physical document inscribed with a text that intends to convey We A BaddDDD People.

So I want to offer a couple more examples of typographical concerns that came up during my transcription process. Unlike extra spaces between words in a line, these issues also more directly impact the kinds of results my analysis aims to produce, as they impact what “counts” as a line or stanza in my model.

The first has to do with stanza breaks. In my day-to-day reading practice, identifying a stanza break usually feels straightforward: lines grouped together in a poem, probably separated by white space. Digging a little deeper, The Princeton Encyclopedia of Poetry & Poetics begins its entry by defining a stanza as “a unit of poetic lines organized by a specific principle or set of principles” (1358). Likewise, The Oxford Dictionary of Literary Terms defines a stanza first and foremost as

A group of verse lines forming a section of a poem and sharing the same structure as all or some of the other sections of the same poem, in terms of the lengths of its lines, its metre, and usually its rhyme scheme. In printed poems, stanzas are separated by spaces.

While this definition doesn’t help us much with something like Sanchez’s “a/coltrane/poem”—a poem that more or less flies in the face of traditional stanzaic form—it does seem like it would help us if we wanted to make a “stanza” a parameter in our analytical models, or even in figuring out how best to separate lines and stanzas in our text files.

But even in more traditionally stanzaic poems—of which there are many in my corpus—deciding what “counts” as a stanza can get messy. Something as simple as page breaks, for instance, can wreak havoc in making such decisions. This is particularly the case when only one edition of a work exists, and one doesn’t have access to original manuscripts.

Consider, for example, a poem titled “Malcolm Spoke/  who listened?” from Haki R. Madhubuti’s 1969 collection Don’t Cry, Scream, published with Broadside Press. The poem is stanzaic, and distinguishes stanzas with what seem to me like normal breaks. These groupings, however, have no regular rhyme scheme, no regular use of capitalization, no regular number of lines, no tight thematic or narrative structure (e.g. a point of view that alternates from stanza to stanza), and no regular pattern in punctuation (i.e. some stanzas conclude with no punctuation while some conclude with a period). And, crucially, the poem extends partway onto a second page. These are the two groups of lines on either side of the page break:

animals come in all colors.
dark meat will roast as fast as whi-te meat
especially in
the unitedstatesofamerica’s
self-cleaning ovens.

For a few reasons, I decided to transcribe these two sections as a single stanza. First, at a more visual, design level, the poem has no other stanzas as short as two lines. The book as a whole, in fact, has very few two-line stanzas, and while there are a few single unattached lines, they usually come right at the end of a poem. In comparison with the rest of the poem and the other poems in the collection, then, it seemed more likely to be a larger stanza than not.

More convincingly, however, my feeling that these two chunks are one unit comes from the poem itself—the group of lines above seems, to me, to develop a coherent line of poetic thought. The first two lines introduce the metaphor of meat of “all colors” roasting, and the following line (after the page break) intensifies this imagery by locating this metaphor in the United States and its “new /self-cleaning ovens.” The lines after the page break make most grammatical and metaphorical sense when taken as part and parcel of the lines prior to the page break.

This is not to say that other poems in this volume don’t break up grammatical expressions across stanzas—they definitely do. Other poems in this volume also develop specific metaphors or images over the course of several stanzas. But with this poem in particular, stanzas seem to be doing something else. Each has a kind of conceptual focus—they stand alongside one another as evenly-weighted, coherent units of expression. For example, the stanza preceding the one quoted above is as follows:

the double-breasted hipster
has been replaced with a
dashiki wearing rip-off
who went to city college
majoring in physical education.

This stanza develops, from line to line, a description of—and stance towards—this “dashiki wearing rip-off” who replaces the “double-breasted hipster.” Each line builds on the last, slowly unfolding different aspects of how one figure “has been replaced” with another: the speaker discloses a skeptical attitude towards these figures, identified by what they wear, where they went to school, and what they studied. Like the stanza with the page break, this group of lines seems to me to develop a coherent line of thought that doesn’t spill over into subsequent stanzas.

Understanding these stanzas in light of the poem as a whole, then, aligns with this reading: the rhythm of the poem as it moves from stanza to stanza seems to emerge from a feeling of moving from one idea to the next—and, for me as a reader, breaking this group of lines at the page break into two different stanzas feels like it disrupts that rhythm.

It could certainly be argued that the group of lines with the page break was meant to be two stanzas specifically so as to disrupt the rhythm of this stanzaic form—that such a disruption is vital to the poem’s meaning. But, as is the case with scholarly editing, I had to make a judgement call to proceed with my project. So I considered everything I knew, tried to find out more if possible, and made the best decision I could given what I had in front of me.


One last example. Lines of poetry can get very long. Sometimes, lines get too long for the physical documents on which they’re inscribed. During an enlightening conversation with Jahan Ramazani on this and many other issues addressed in this post, he gave me the example of editing The Norton Anthology of Modern and Contemporary Poetry and having to print and number the extremely long lines of Allen Ginsberg’s “Howl.” Central to this decision-making process was considering standard practice on what the Chicago Manual of Style calls “Long lines and runovers in poetry.”

The CMS defines runovers as “the remainder of lines too long to appear as a single line,” which are “usually indented one em from the line above.” In other words, when lines get too long—as in Ginsberg’s poetry, or Walt Whitman’s—a hanging indent about one em wide (roughly the width of an em dash) tells the reader that the line was too long for the book. The entry concludes, however, by indicating that it might not always be so clear when an indentation is a runover and when it’s a new line:

Runover lines, although indented, should be distinct from new lines deliberately indented by the poet … Generally, a unique and uniform indent for runovers will be enough to accomplish this.

As we’ve seen already just in this post, much of the poetry in my corpus rebels against traditional poetic form, including standard indentation and spacing practices. Determining whether or not a group of words is one or two lines, however, is extremely important for my project. The “line” is the basic unit I’ve been asking sentiment analysis tools in TextBlob and NLTK to evaluate for sentiment. In short: what counts as a line really matters, and ambiguities surrounding runovers could very well add up to have a significant impact on the results of my analyses.
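
To make the stakes concrete, here is a small sketch of how a single decision about runovers changes the units handed to a line-level tool. It is an illustration under stated assumptions rather than the project's pipeline: `merge_runovers` and its `runover_indent` parameter are hypothetical, the sample lines are invented, and it assumes runovers are marked by one uniform indent.

```python
def merge_runovers(lines, runover_indent=4):
    """Join lines indented by exactly `runover_indent` spaces to the line above.

    Assumption: runovers share one uniform indent, as the Chicago Manual
    passage suggests; any other indentation is treated as deliberate.
    """
    merged = []
    for line in lines:
        indent = len(line) - len(line.lstrip(" "))
        if merged and indent == runover_indent:
            # Treat the line as the tail of the previous one.
            merged[-1] = merged[-1] + " " + line.strip()
        else:
            merged.append(line)
    return merged


page = ["a long line that ran out of", "    room", "a short line"]
print(len(page), len(merge_runovers(page)))
# 3 2
```

Read one way, the sample has three lines to score; read the other way, it has two. Across a whole corpus the two counts can drift further apart.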

An excellent example of this appears a few pages earlier in Madhubuti’s Don’t Cry, Scream, in a poem titled “Gwendolyn Brooks.” The poem is available online through the Poetry Foundation, and it appears in my physical copy as it does on this website, indentations and all. Halfway through the poem there is a distinct sequence, over a dozen lines long, that lists a series of portmanteaus describing different kinds of “black”—from “360degreesblack” to “blackisbeautifulblack” and “i justdiscoveredblack.” Over the course of this sequence, there are three indented lines, each one word long, that interrupt the otherwise steady stream of images.

At first blush, these lines struck me as runovers. The list-like nature of the lines seemed to lend itself to running a little long—as we see with a poet like Whitman, once a list starts, it can just keep going and going. Moreover, no thematic or poetic reason jumped out at me as to why someone might indent these words as opposed to any others. Of course, there is the possibility that such indentations were completely on purpose, and are part of a project to disrupt and transform any resonance with someone like Whitman and the canon he represents. Sitting in front of my computer, a little bleary-eyed from all the transcribing, I honestly wasn’t sure.

So I began looking for other appearances of the poem. The version published by the Poetry Foundation complicated my initial thought that these one-word indented lines were runovers. Jahan Ramazani also suggested that, given the importance of anthologies to the Black Arts Movement, even if a book has no later editions, individual poems therein might appear somewhere in a collection.

Such a realization, however, presents another fork in the road of my research. As a researcher committed to being as thoughtful and thorough as possible as I work with the poetry from a revolutionary art movement, I am delighted to know that I still might be able to pursue questions that I thought would remain unanswered (i.e., “is this a runover line or two separate lines?”). As a researcher with limited resources, however, I have to decide whether or not pursuing these questions will be the best use of my time and energy in this particular project. There are a lot of anthologies containing poetry from the Black Arts Movement out there, so I have to weigh the time it would take to locate and look through them all for instances of those poems from my ~20 book corpus that may have runover lines, against the potential impact it would have on the results of the analyses I hope to perform. As it currently stands, I’ve made a note of this particular ambiguity and plan to reassess what I should do with it and others like it after assembling the rest of the corpus.

Final Thoughts

As this post has hopefully shown, transcribing texts from book to screen can get very tricky. More than a simple act of mechanical reproduction, it can stump us with questions about literary works that seem to have no discernible answers. From one moment to the next, it can demand a working knowledge of bibliographical methods; digital methods; aesthetic form; and how to manage a project’s resources. And—as Drucker above argues regarding text analysis more generally—navigating these questions requires rigorous human judgement every step of the way. Even in situations where the practicalities of project management and the realities of our textual archive make this judgement feel all-too-fallible.

There are other important aspects of this human judgement that I haven’t had time to think through in this post as much as I would have liked. One is the set of questions, explored by Andrew Pilsch and mentioned above, about the challenging ways in which web browsers are designed to parse the whitespace in poetry in HTML. Another is how the default parameters of the basic tokenizing packages in NLTK throw away whitespace—a sign that the programmers behind these text analysis technologies expect their standard use to focus on text, not the spaces between text.
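
As a rough illustration of that last point, using only Python's standard library rather than NLTK itself: `str.split()` stands in here for a tokenizer that discards whitespace, and the regular expression shows one way to keep runs of spaces as tokens in their own right. Both function names are hypothetical.

```python
import re


def tokens_without_space(line):
    """Split on whitespace and discard it, as str.split() does."""
    return line.split()


def tokens_with_space(line):
    """Keep runs of whitespace as tokens alongside the words."""
    return re.findall(r"\s+|\S+", line)


line = "RISE.  &  BE."
print(tokens_without_space(line))
# ['RISE.', '&', 'BE.']
print(tokens_with_space(line))
# ['RISE.', '  ', '&', '  ', 'BE.']
```

In the first list the doubled spaces around the ampersand are gone. In the second they survive as data that later analysis could count or measure.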

Very long story short: transcription is complicated! And I hope this post has done something to foreground some of those invisible, behind-the-scenes decisions that—like modeling and parameterization—give shape to the results a text analysis project produces.

All About the Archive: Guest Teaching at Washington and Lee

Wed, 07/02/2018 - 17:51

In this post Lauren Reynolds, a former PhD student in Spanish and Makerspace Technologist, describes her work with Professor Andrea LePage’s course at Washington & Lee. This work is supported by an ASC grant expanding collaboration between Washington & Lee and the Scholars’ Lab and supplemented by W&L’s Mellon-funded grant to support digital humanities in the classroom. Read more about the collaboration. Cross-posted to the WLUDH blog.

I was invited to guest lecture for Professor Andrea LePage’s undergraduate course, Contemporary Latinx and Chicanx Art. After discussing possible topics for the workshop, Professor LePage and I decided on the topic of “Archive as Protest.” This topic overlapped with my research on cultural memory in US Latinx texts and presented me with the opportunity to learn more about digital archives. As I developed the plan for the workshop, I organized the lesson around questions surrounding digital archives, preserving cultural memory, and cataloguing a variety of experiences.

These are very broad questions, so I outlined two goals for the class: First, I wanted the students to begin to think about information storage in the broadest sense. Then we would narrow down the idea of seemingly endless information to a conversation about cataloguing and metadata. Second, I aimed for our discussion of cultural creation and preservation to help the students understand one way in which preserving information through archives can have a positive social impact.

After introductions, we began the lecture with a brief discussion of Jorge Luis Borges’ short story La biblioteca de Babel. This text gave me the opportunity to sneak a bit of Latin American literature into the course and provided an entry point for talking about information storage. So, we began with questions about Borges’ conception of an infinite library: Why do you think some people say that Borges “discovered” the internet decades before it was invented? What similarities do you see between the infinite library and the internet? What are some differences? How is a library organized? Is the internet organized? What possibilities/challenges does a universe of information pose?

Next, we zoomed in to a more focused discussion of archives, their purposes, and how the internet has changed the preservation and accessibility of information. We talked about documenting history from many perspectives and, in small groups, the students reflected on the following quote from Daniel Mutibwa:

“The overarching argument is that local, alternative, bottom-up approaches to telling (hi)stories and re-enacting the past not only effectively take on a socio-political dimension directed at challenging dominant, hegemonic, institutional narratives and versions of the past, but – in doing so – they also offer new and refreshingly different ways of understanding, representing, remembering, and rediscovering the past meaningfully in ways that local communities and regions can relate with.”[1]

The students began to connect this quote to their own interests as we discussed the possibilities of digital archives. We specifically looked at the Hurricane Katrina collection to talk about the pros and cons of bottom-up archives.

We noted how such archives allow for individual stories to be shared and how they can become part of a community’s healing process after a tragedy.

This digital archive also prompted interest in logistical questions, such as how stories are collected, saved, and mapped in the creation of an online archive. Specifically, the students were asked to think about:

  1. Development: How to choose what to include, authenticity
  2. Retrieval and Collection
  3. Reaching the Community: Supporting Research, Learning, and Teaching
  4. Reference Information and Providing Access 

Our last activity presented the opportunity to learn about different types of metadata and their role in cataloguing. We discussed social media presences as types of personal, living archives and how hashtags such as #TBT, #breakfast, and #gooddog can be seen as a means of organizing Instagram posts. In pairs, the students were then given three photos of different US Latinx artworks and asked to assign categories to each photo. They thought about specificity and accessibility: how to make the photos both accessible in broad searches and easily found for specific inquiries. Each pair shared their selected words with another group. After comparing their different hashtags and debating which labels were the most useful, each group came up with a definitive set of categories. We then talked about the different “data sets” created in class, noting the benefits and possible drawbacks of each one.

The class concluded with small group discussions of overarching questions:

  1. Difficulties posed by the fact that technology is always changing
  2. How to establish trust between archive curators and communities
  3. Library neutrality, the library’s role in community engagement, and the line between memorial and protest
  4. Advantages and disadvantages of allowing anonymous submissions
  5. Oral Histories: Who determines what questions are asked? How are these interviews and all texts edited, and by whom? Can “alternative” truths be abused to represent dangerous falsehoods?
  6. How do we preserve horrific histories? Do we reproduce offensive terms?

With the time remaining, the students talked about whichever question interested them most in their work and, more broadly, in their lives.





[1] Mutibwa, Daniel H. “Memory, Storytelling and the Digital Archive: Revitalizing Community and Regional Identities in the Virtual Age.” International Journal of Media & Cultural Politics, vol. 12, no. 1, 2016, pp. 7-26.

Spring 2018 UVa Library GIS Workshop Series

Fri, 12/01/2018 - 14:52

All sessions are one hour and assume participants have no previous experience using GIS.  Sessions will be hands-on with step-by-step tutorials and expert assistance.  All sessions will be held on Tuesdays from 10AM to 11AM in the Alderman Electronic Classroom, ALD 421 (adjacent to the Scholars’ Lab) and are free and open to the UVa and larger Charlottesville community.  No registration, just show up!

February 6th
Making Your First Map with ArcGIS
Here’s your chance to get started with geographic information systems software in a friendly, jargon-free environment.  This workshop introduces the skills you need to make your own maps.  Along the way you’ll get a taste of Earth’s most popular GIS software (ArcGIS) and a gentle introduction to cartography. You’ll leave with your own cartographic masterpieces and tips for learning more in your pursuit of mappiness at UVa. 

February 13th
ArcGIS Online: Introduction
With ArcGIS Online, you can use and create maps and scenes, access ready-to-use maps, layers and analytics, publish data as web layers, collaborate and share, access maps from any device, make maps with your Microsoft Excel data, and customize the ArcGIS Online website.

February 20th
ArcGIS Online: Spatial Analysis
ArcGIS Online now has spatial analysis tools that can be easier to use than similar desktop GIS tools.  Come learn how to use the simple yet powerful analysis tools available through ArcGIS Online.

February 27th
ArcGIS Online: Story Maps
Story Maps are templates that allow authors to give context to their ArcGIS Online maps.  Whether telling a story, giving a tour or comparing historic maps, Esri Story Maps are easy-to-use applications that create polished presentations.

March 13th
ArcGIS Online: Data Collection
Whether you are crowd sourcing spatial data or performing survey work, having applications that automatically record location and upload data directly to a mapping application is incredibly useful. 

March 20th
What’s New with ArcGIS Pro
The handwriting is on the wall.  ArcGIS Pro will be replacing ArcMap as the desktop GIS in the near future.  Come learn about the changes and quirks of ArcGIS Pro from an ArcMap user perspective.

March 27th
Introduction to QGIS
ArcGIS isn’t the only game in town.  The best and most popular open source GIS application is QGIS.  It runs on most platforms and does some things better than ArcGIS.  Come learn more about another tool in the GIS toolbox.

Fellowship Calls and Grad Student Professional Development

Thu, 14/12/2017 - 14:42

I want to share several developments from the grad programs side of the Lab this semester. It’s been a busy fall, and I’m pleased with all the work the team has put into our programs!

For one, the CFPs for two of our fellowship programs are now live. The Praxis Program, which will welcome its eighth cohort next year, will have a deadline of February 15th for applications from PhD students at UVA. This flagship program is in many ways the core of our graduate community, and we’re very excited that it continues to thrive. I am also very pleased to announce that the Digital Humanities Prototyping fellowships, piloted this past year with a cohort of four students, will continue next year with its own application deadline of February 15th. Open to PhD *and* MA students at UVA, these fellowships are meant to shore up our support of students in the intermediate years of their graduate work, to provide collaborative projects a space in our fellowship portfolio, and to give young scholars a chance to craft a spark that might catch further down the line with applications for further funding here or elsewhere. Please tell your students and colleagues! I always strongly encourage students to get in touch with me if they are planning to apply – that way they will be on our radar for other opportunities down the line regardless of how this particular application shakes out. Along with our newly restructured DH Fellows program, these three fellowship programs provide support and experience for more stages of the graduate student timeline than was previously possible.

In addition to the fellowship announcements, I also wanted to draw attention to a revamping of what was formerly known as the “graduate fellowships” page. Our programs have grown a lot since this page was last revised, and the new “graduate fellowships and opportunities” page now better represents the wealth of offerings in the Scholars’ Lab. This new, catch-all page offers a space where students can see all of our opportunities beyond our annual fellowship programs. We regularly employ graduate students as Makerspace Technologists to assist in 3D printing and experimental computing in our makerspace (and we just released a call with multiple openings for spring 2018!). Cultural Heritage Informatics Interns each semester work with Will and Arin to 3D scan, process, and print artifacts all while getting course credit. Chris and Drew regularly work with student GIS Technicians who assist in the uploading of GIS datasets and creating applications on our GIS portal, all while getting valuable experience in spatial humanities. And, finally, a Mellon-funded collaboration with Washington and Lee University allows us to send students to their campus to give workshops on digital humanities to undergraduate courses. The amount of experience required for all these opportunities is quite variable, so be sure to read closely – in many cases we are more than happy to have you learn on the job. We’ve been doing all these things for quite a while, but hopefully now students can find easier access to information about our programs and how to get involved.

Finally, I’m especially pleased to share that we have a new section in this page on professional development for graduate students. The Scholars’ Lab programs give students valuable experiences and training, but we’ve also historically gone further than these official offerings. As UVA students apply to alt-ac and DH careers, we regularly give advice on the whole process, from finding a job to producing materials to interviewing. These offerings have long been ad hoc and by request, but I worried over the last several months that some potential students might get left out of such arrangements. A student might not know, for example, that we’d be willing to mock interview them in the happy event that they’re invited to campus for that digital humanities developer position. Or a student putting together their first job talk for a post-doc in digital humanities might not realize that we’re happy to lend a friendly ear and also share our *own* job talks.

This section is not perfect, and it by no means represents the sum of what any program can do to support graduate students. If you see something missing, drop me a line to let me know. But hopefully the statement of services there will serve as a nice counterpoint to the values that we lay out in our group charter; hopefully the page’s presence will help someone find their way to us who might not otherwise have done so. After all, tacit assumptions about how others perceive our services can lead to people falling through the cracks, feeling like they’re going through a job search alone. Best that we be explicit, and best that we match our values with public statements of what we will do to back them up.

So in short – we’re here for you. If you’re part of the UVA community and looking for help with your DH or alt-ac job search, swing on by and let me know how we can help!

Call for Spring 2018 Makerspace Technologist Applications

Wed, 13/12/2017 - 22:29

Are you a UVA graduate student or upper-level undergraduate in the humanities? Come join our team as a Makerspace technologist!

Our Makerspace is designed to foster experimentation with 3D printing, modeling, and digitization, physical computing (e.g. Arduino, wearables), virtual reality, and more. For humanists, it is a good way to learn more about experimental and digital humanities by exploring new uses for digital technologies in fields that do not traditionally integrate them. No prior experience with electronics or 3D printing is needed. Successful candidates will be trained on these tools and will in turn pass on their training to disciplinarily diverse students, faculty, and staff interested in using them for fun, teaching, and research. We also strongly encourage technologists to work on their own personal projects and to develop expertise based on their own scholarly interests.

An important aspect of Maker culture is apprenticeship and supporting makers in their pursuit of professional experience. We are looking for motivated individuals who are capable of working independently and value the opportunity to engage with and support a growing community. Benefits of the job may include: access to expertise and mentoring in your field of interest, opportunities for collaboration and publication, use of equipment and tools, and ability to shape Scholars’ Lab workshops and programming.

Candidates should be able to work up to 10 hours per week. Applications should consist of a cover letter discussing their interest in working in the Scholars’ Lab, any experience or interest in participating in a maker space, and any previous experience with public service or assisting others in using technology. Please send inquiries and applications to

Multiple openings are available for Spring semester, and review of applicants is ongoing until filled.

First Steps with NLP and a Collection of Amiri Baraka’s Poetry

Thu, 30/11/2017 - 20:27

Amiri Baraka’s Black Magic, 1969

In this post I’ll discuss my initial foray into natural language processing (NLP)—cleaning up a corpus and prepping it for some basic text analysis techniques. I want to begin, however, with a note on the small textual corpus that I’m using in these preliminary explorations—Black Magic, a 1969 collection of three books of poetry by Amiri Baraka.

In a prefatory note to the collection, Baraka offers an “Explanation of the Work” that touches on the three books of poetry contained within. “Sabotage,” he writes of the first book, “meant I had come to see the superstructure of filth Americans call their way of life, and wanted to see it fall. To sabotage it,” in a word. The second book, he argues, takes this intensity even further: “But Target Study is trying to really study, like bomber crews do the soon to be destroyed cities. Less passive now, less uselessly ‘literary.’” If these comments are any indication, the poetry of Black Magic has a certain level of emotional and political intensity. These poems articulate rage—they thunder, fulminate, and protest, venting a vindicated anger at racial injustice in America. Others simmer with a more restrained heat, but still tend to employ an often unsettling rhetorical violence. Consider, for example, the conclusion of a poem from Sabotage titled “A POEM SOME PEOPLE WILL HAVE TO UNDERSTAND”:

We have awaited the coming of a natural
phenomenon. Mystics and romantics, knowledgeable
of the land.

But none has come.
but none has come.

Will the machinegunners please step forward?

Though startling, this final image punctuates a familiar narrative: the mounting of frustration, the boiling over of feeling while waiting and waiting for justice. The speaker’s closing remark seems to respond to the question asked in Langston Hughes’s poem “Harlem”—“What happens to a dream deferred?”—but raises the ante of the inquiry, and shifts from Hughes’s suggestive but still open-ended conclusion (“Or does it explode?”) to an unsettlingly direct request (“Will the machinegunners please step forward?”). The poem also, however, seems aware of its high dramatic tone: it conveys the gravity of this deferred deliverance with somewhat formal rhetoric like “We have awaited” and “But none has come”, but highlights—and perhaps undercuts—its own theatricality by embedding a stage direction in the poem, “(repeat)”. We’ve waited for long enough, the poem seems to argue, but stages this claim in such a way that the final line’s delivery hangs suspended somewhere between deadpan and dead serious.

In short: a heightened revolutionary rhetoric permeates the poems in this collection. Many have noted, however, that a troubling violence permeates them as well. For example, one scholar describes “Black Art”—one of the most graphic but also best-known poems from this collection—as “a difficult poem in its race and gender violence, in its violence against peoples.” In the 1991 The LeRoi Jones/Amiri Baraka Reader, editor William J. Harris describes Black Magic as a collection in which Baraka “traces his painful exit from the white world and his entry into blackness,” an “exorcism of white consciousness and values [that] included a ten-year period of professed hatred of whites, and most especially jews [sic].” Baraka looks back at this period in his 1984 autobiography at a remove from the red-hot intensity of the poems themselves: “I guess, during this period, I got the reputation for being a snarling, white-hating madman. There was some truth to it, because I was struggling to be born, to break out from the shell I could instinctively sense surrounded my own dash for freedom.” From this perspective, this is the violence of escape, of “struggling to be born” from within a constricting “shell”—a version, perhaps, of the violence of the deferred dream that explodes at the end of Langston Hughes’s poem “Harlem.”

Initial Steps with NLP

As a scholar interested in articulations of anger, resentment, and frustration with injustice—particularly injustice of a systemic and institutional nature—as well as digital methodologies, I thought these texts in particular might be worth looking at more closely with NLP techniques.

As a graduate student working in a period that is almost entirely still in copyright, however, Black Magic also interested me because it is a small corpus of works—three books of poetry—to which I currently have access through UVA. Though conceptually unglamorous, basic questions of access have played an enormous role in determining the initial paths in my scholarly decision-making process.

In this sense, though assembling workable data is always a challenge, scholars interested in literary texts prior to the early 20th century have more options for readily accessible textual corpora. For 20th- and 21st-century scholars interested in textual analysis, however, questions of copyright have made finding openly available textual data from which a corpus could be built an extremely difficult task: while able to share results of analyses through transformative, non-consumptive use, scholars of these periods cannot share the corpora from which these insights are drawn. This presents additional challenges in terms of reproducibility as well as in the already long, labor-intensive task of assembling, cleaning, and prepping a corpus prior to any actual application of NLP techniques. If texts aren’t already available as text files through a university or institution, they either have to be typed out by hand or scanned page by page, run through optical character recognition (OCR) software that transforms the page image into text, and then cleaned and corrected by hand. In short: no preexisting corpora means no experiments, prototypes, or conceptual ventures without surmounting certain barriers to entry that often prove time- or cost-prohibitive.

In the case of this project, even though UVA has access to the 1969 edition of Black Magic: Collected Poetry 1961-1967, the text isn’t ready for NLP out-of-the-box. The page contained a lot of text beyond that of the literary work in question: page numbers, line numbers, bibliographical information, headers and footers, all kinds of weird punctuation, and so on. For example, the title of the first poem in Sabotage, “Three Modes of History and Culture,” appeared in this electronic edition as follows:

Baraka, Imamu Amiri, 1934- : Three Modes of History and Culture [from Black Magic: Collected Poetry 1961-1967 (1969) , The Bobbs-Merrill Company ]

To perform sentiment analysis on Sabotage, then, I first needed to get the raw text. By “raw text” I mean a big bag of all of Sabotage’s words. My goal initially was to get this bag of words with no line numbers, no punctuation, no capitalized first letters (otherwise Python would treat “The” and “the” as two different words), and no stray spaces.

As someone doing this work for the first time, I felt like I could handle writing a program that would remove capital letters, get the txt file into the correct file-type, maybe even get rid of the line numbers. But what about all this clutter surrounding the title of each poem? I considered how I might remove this with a program, but even something as small as irregular line breaks means the words would be chopped up in slightly different ways each time. Given the size of the corpus, I decided it would be wiser to remove the clutter by hand than to write a one-time program that automated it.

With a huge assist from Brandon Walsh, cleaning up the rest of the text with the Natural Language Toolkit (NLTK) was relatively straightforward. We wrote a small Python script that removed line numbers, then proceeded to write a script that would prep the clutter-free text files for text analysis, first by reading the text file as a list of lines (1), then by tokenizing that list of lines into a list of lists, where each sub-list is a list of the words that make up a line (2).
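As a rough illustration of the line-number step (not the script we actually wrote), here is a minimal sketch. It assumes, hypothetically, that each line number is a run of digits sitting alone at the end of a line; the name `strip_line_numbers` and the sample lines are made up for the example.

```python
import re

# Hypothetical assumption: a line number is a run of digits at the very
# end of a line, separated from the verse by whitespace.
TRAILING_NUMBER = re.compile(r"\s+\d+\s*$")


def strip_line_numbers(lines):
    """Return the lines with any trailing line number removed."""
    return [TRAILING_NUMBER.sub("", line.rstrip("\n")) for line in lines]


# Made-up sample lines, not taken from the electronic edition.
sample = [
    "first line of an invented poem\n",
    "fifth line of an invented poem     5\n",
]
cleaned = strip_line_numbers(sample)
```

A pattern this simple would also strip a verse line that genuinely ends in a number (a year, say), which is one reason a small corpus still deserves a read-through by hand afterwards.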

While this may seem kind of complicated, certain kinds of text analysis need the lines to be tokenized in this way—much of the work then involves getting the text to be the right kind of data type (list of words, list of lists, etc.) for a given kind of analysis. Because I’m interested in sentiment analysis, I also needed to make every word lowercase (3), remove punctuation (4), and remove spaces (5).

Having written out all these functions, we then made a new function that called on each of them one after the other, running through the pipeline of activities necessary for NLP (our notes-to-self included):
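A minimal sketch of what a pipeline like this might look like follows. It is not our original script: the function names are hypothetical, plain `str.split()` stands in for an NLTK tokenizer, and step (5) is interpreted here as dropping the empty strings and empty lines that the earlier steps leave behind.

```python
import string


def read_lines(path):
    """(1) Read a text file as a list of lines."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


def tokenize_lines(lines):
    """(2) Turn a list of lines into a list of lists of words."""
    # str.split() is a stand-in for a real tokenizer such as NLTK's.
    return [line.split() for line in lines]


def lowercase(tokenized):
    """(3) Lowercase every word."""
    return [[word.lower() for word in line] for line in tokenized]


def remove_punctuation(tokenized):
    """(4) Strip ASCII punctuation characters from every word."""
    table = str.maketrans("", "", string.punctuation)
    return [[word.translate(table) for word in line] for line in tokenized]


def remove_empty(tokenized):
    """(5) Drop empty strings and empty lines left by the earlier steps."""
    cleaned = [[word for word in line if word.strip()] for line in tokenized]
    return [line for line in cleaned if line]


def prepare(lines):
    """Run steps (2) through (5), one after the other."""
    tokenized = tokenize_lines(lines)
    tokenized = lowercase(tokenized)
    tokenized = remove_punctuation(tokenized)
    return remove_empty(tokenized)


result = prepare([
    "But none has come.",
    "",
    "Will the machinegunners please step forward?",
])
```

Each function hands its output to the next, so only the most recent stage of the text is ever available. Note that `string.punctuation` covers ASCII characters only, so curly quotes and dashes would need separate handling.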

Though it gets the job done, this code is clunky. It represents, in short, the first steps in my learning how NLP works. And while not the most elegant in terms of form or function, writing steps out in this way was conceptually clear to me as someone trying them for the first time. I also want to add that throughout much of this Brandon and I were practicing something called pair programming, with Brandon at the keyboard (or “driving”) and me observing, asking questions, and discussing different ways of doing things. In addition to being an exciting scholarly investigation, this project is also a learning experience for me, and our code-decision-making process often reflects that.

But more on the intricacies of collaboration later. To recap, at this point I had a series of functions that, in a linear, step-by-step fashion, took my original text file and began to play with it in Python’s working memory: it took Amiri Baraka’s poetry as one data type (a giant string of words) and turned it into another (a tokenized list of lists), with some changes along the way (like lowercasing and getting rid of punctuation).

What made this so clunky, however, stemmed in large part from how I had organized my tasks: I gave Python basically only one thing to think about and work with at a time. It would take my corpus, W, and turn it into X, which it would then turn into Y, and then Z, and so on. But if I wanted Python to remember X while it was working on Z, I had to write code to turn Z back into X—in short, a data-type nightmare. Which sounds pretty abstract, but presented all kinds of practical problems.

For example, after having gotten all the way to Z—my lowercased, punctuation-free list of lists—I wanted to try a basic form of text analysis I had seen in an early chapter of the NLTK book (called stylistics) in which I compared the use of different modal verbs in the three books of Baraka’s poetry. The only way I knew how to do this was to run a frequency distribution on a giant list of words—which means I had to un-tokenize my nicely tokenized texts, basically jumping from Z back to W. So I wrote some clunky code that let me do so:
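A sketch of that kind of un-tokenizing step, under stated assumptions: `collections.Counter` stands in for NLTK's `FreqDist`, the list of modals is only illustrative, and the function names and toy text are hypothetical rather than drawn from my script or from Baraka's poems.

```python
from collections import Counter

# Illustrative list of modal verbs; swap in whichever ones are of interest.
MODALS = ["can", "could", "may", "might", "must", "will"]


def flatten(tokenized):
    """Collapse a list of lists of words back into one flat list of words."""
    return [word for line in tokenized for word in line]


def modal_counts(tokenized, modals=MODALS):
    """Count how often each modal verb appears in a tokenized text."""
    freq = Counter(flatten(tokenized))  # stand-in for nltk.FreqDist
    return {modal: freq[modal] for modal in modals}


# Invented toy text, already lowercased and punctuation-free.
toy_text = [
    ["we", "will", "wait"],
    ["we", "must", "not", "wait"],
    ["will", "they", "come"],
]
counts = modal_counts(toy_text)
```

Running the same function over each of the three books would give one dictionary per book, which can then be compared side by side.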

Grappling with this problem, Brandon re-introduced me to something I had learned about before but never had to use—object-oriented programming. Rather than performing a linear series of functions on my text file, reorganizing my code along OOP lines let me treat this text file as an object with many attributes, any of which I could access at any time. If I wanted my file (or object) as a giant list of words to perform a frequency distribution, I needed only to call upon that particular aspect (or attribute) of my object. If I then wanted Python to think of it as a tokenized list of lists I could just call on that particular attribute rather than having to send it through a series of transformations. It’s as if my ability to manipulate a file gained a third dimension—instead of being stuck going from X to Y to Z and then back to X, I had access to all three stages of my file simultaneously. In essence, what was once a one-way data-type conveyor belt now became a fully-staffed NLP laboratory. In another pair programming session, we started to shift my more linear code to an object-oriented approach. What we came up with definitely needs refactoring (in my TODO list) and can certainly be improved (i.e., not overwriting a variable multiple times), but again, in the spirit of showing my learning process, I wanted to share a visual of this early version that marked my beginning to grapple with OOP for the first time:
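To make the idea concrete, here is a minimal sketch of the object-oriented approach. It is not the version Brandon and I wrote: the class name `Text` and its attribute names are hypothetical, and `str.split()` and `collections.Counter` again stand in for NLTK's tokenizer and `FreqDist`.

```python
import string
from collections import Counter


class Text:
    """A text that keeps every stage of processing available at once."""

    def __init__(self, raw):
        table = str.maketrans("", "", string.punctuation)
        self.raw = raw                      # the original string
        self.lines = raw.splitlines()       # a list of lines
        self.tokenized = [                  # a list of lists of clean words
            [word.translate(table).lower() for word in line.split()]
            for line in self.lines
        ]
        self.words = [                      # one flat list of words
            word for line in self.tokenized for word in line if word
        ]

    def frequencies(self):
        """Return word counts computed from the flat list of words."""
        return Counter(self.words)


poem = Text("But none has come.\nbut none has come.")
```

Because `poem.raw`, `poem.lines`, `poem.tokenized`, and `poem.words` all live on the same object, there is no need to convert one stage back into another: each kind of analysis simply reaches for the attribute it needs.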

Finally “getting” object-oriented programming conceptually was truly a programming awakening for me, even if my initial attempts need some improvement—it hadn’t really made sense as an approach until I was faced with the problems it helps address.

So we have the poems in all their fiery intensity, as well as the beginnings of actually using sentiment analysis as another way of thinking through them. As it currently stands, Brandon and I have started using TextBlob to perform some basic tasks—more on that soon. If you have any questions or want to follow along, my GitHub project repository can be found here.

“All of the Questions:” A Recap of the 2017 Bucknell University Digital Scholarship Pre-Conference

Thu, 30/11/2017 - 15:31

In early October I was sent to represent the Scholars’ Lab at the Bucknell University Digital Scholarship Conference and the pre-conference meeting. This conference brings together an interdisciplinary group of students, teachers, scholars, librarians, and instructional technologists for a weekend of conversation about many aspects of digital scholarship including pedagogy, community outreach/social justice, and institutional best practices. This year’s conference was called “Looking Forward, Looking Back: The Evolution of Digital Scholarship” and featured keynotes by Stephen Cartwright, Kalev H. Leetaru, and UVA’s own A.D. Carson.

Pre-conference plan:
How do we engage students in digital scholarship and support instructors as they incorporate DH or DS practices in their traditional classes? The BUDSC pre-conference was initially convened around these concerns and charged with the task of developing a “DS Cookbook” featuring ideas, best practices, and resources for instructors looking to include digital projects within their courses. We were initially asked to reflect on questions about our own experiences: What would have been helpful to know the first time we attempted to use digital scholarship in the classroom? How can we engage students in digital scholarship with limited budget, resources, or support?


• Lee Skallerup Bessette, University of Mary Washington
• Joshua Finnell, Colgate University
• Sarah Hartman-Caverly, Delaware County Community College
• Aaron Mauro, Penn State, Erie
• Megan Mitchell, Oberlin College
• Courtney Paddick, Bucknell University
• Carrie Pirmann, Bucknell University
• David Pettegrew, Messiah College
• Kelli Shermeyer, University of Virginia
• Emily Sherwood, Bucknell University

What actually happened…
After a fortifying breakfast of coffee and donuts, our pre-conference group proceeded to make a list of all of the questions and concerns we were stewing over in our work as scholars, teachers, librarians, and instructional technology specialists. This white board was the result:

Some of these issues had to do with the intended purpose of the pre-conference – creating a guide for those interested in engaging students with digital scholarship (early concerns included: how do we scaffold or assess digital projects? What does it mean when administrators want students to have “digital literacy” or “digital fluency?”) But it became immediately apparent that the interests of this group had a much wider scope.

Our morning session consisted of sorting all of the issues raised on this initial whiteboard into categories that we could work with more easily, as well as discussing and sharing resources that we all had at hand. In our afternoon session, we broke up into small groups to work on articulating major questions, a list of best practices, and a set of helpful resources for approaching these topics in a variety of contexts.

The results of our work were presented at the pre-conference recap session of BUDSC (which we re-titled “All of the Questions”) and will be published online soon, but for now, here are some highlights:

Communicating with Stakeholders: This group provided strategies for talking with administrators and other stakeholders about the value of collaborative digital scholarship, how to find funding for cross-disciplinary work, and how to communicate about DH work as part of promotion and tenure. They suggested that A Short Guide to the Digital_Humanities can be used as a helpful introduction to digital scholarship for administrators and faculty who are unsure of what they may be getting themselves into. MLA also has some guidelines for evaluating digital scholarship for P&T purposes.

Data Security & Privacy: This group explored a whole set of questions that I, frankly, had never thought about in any great depth. They asked us to consider, “What exactly is data, anyway? What do we consider to be data in the context of digital scholarship? As we delve more into the world of digital scholarship, it’s become evident that so much of what we do is based on some form of data – be that numerical data, textual data, geospatial data, audiovisual data, etc. With that in mind, how do you ensure ethical, responsible creation and maintenance/preservation of datasets?” The Digital Curation Centre can supply researchers with expert help on this topic. This group also suggested Purdue’s Digital Retention Policy as a model document for schools or departments wishing to develop their own protocols regarding data.

Digital Pedagogy: Our group assembled a slew of resources for teachers wanting to engage with digital projects in their classrooms. We asked: “What are we assessing when we ask our students to complete digital assignments and how do their outcomes interface with the goals of traditional scholarship? How do we encourage them to value the process over the product?” The resource list includes many sample assignments and assessment ideas, as well as a collection of what we called “easy wins” – plug and play tools to work with in the classroom, including:
• TimelineJS – make a simple, multimedia timeline
• Voyant – beginning large text analysis
• Prism – annotate your text
• Twine – create interactive fiction
• Juxta Commons – compare texts
• IMJ – large image visualization

IP/OA/Fair Use: This group explored how to approach fair use and copyright as our students use, remix, and edit online content for their own projects. We can begin by assessing our own/our institution’s tolerance for risk. Very important take-away point: No one is carting you off to jail for remixing something – the worst that will happen is a take-down notice. There’s also an increasing amount of legal precedent for going a little cowboy with fair use, as demonstrated by this video which not even Disney was able to successfully remove: A Fair(y) Use Tale

Sustainability: The questions of project management, project charters, sunsetting, hosting, institutional repositories, and archiving looked like a separate category for us at first, but discussions of these issues were interwoven throughout the other four categories, rather naturally. Shout outs here went to Reclaim Hosting and Miriam Posner’s blog post on Project Charters.



N.B. quotations are from the co-authored pre-conference documents.

My Experience Leading a Workshop on Text Analysis at Washington and Lee University

Thu, 30/11/2017 - 15:15

[Sarah went to Washington and Lee University to give a workshop in Prof. Mackenzie Brooks’s DH 102: Data in the Humanities course through a Mellon-funded collaboration with WLUDH. More information about this initiative can be found here, and this piece is crossposted to the WLUDH blog.]

As a graduate student participating in the University of Virginia and Washington & Lee University digital humanities collaboration, I led a guest workshop on text analysis in Mackenzie Brooks’ course DH 102: Data in the Humanities during fall 2017.  This workshop was an exploration of approaches to text analysis in the digital humanities, which concurrently introduced students to basic programming concepts.  For humanities students and scholars, the question of how to begin to conduct text analysis can be tricky because platforms do exist that allow one to perform basic text analyses without any programming knowledge.  However, the ability to write one’s own scripts for text analysis purposes allows for the fine-tuning and tailoring of one’s work in highly individualized ways that go beyond the capabilities of popular tools like Voyant. Additionally, the existence of a multitude of Python libraries allows for numerous approaches for understanding the subtleties of a given text or a corpus of them.  As the possibilities and directions for text analysis that Python enables are countless, the goal of this workshop was to introduce students to basic programming concepts in Python through the completion of simple text analysis tasks.

At the start of the workshop, we discussed how humanities scholars have used text analysis techniques to create some groundbreaking research, such as Matthew Jockers’ research into the language of bestselling novels, as well as the different ways that text analysis can be approached, briefly looking at the online text analysis tool, Voyant.

For this workshop students downloaded Python 3 and used the simple text editor that is automatically installed with it, IDLE.  This way we didn’t have to spend time downloading multiple programs.  While IDLE is rather barebones, its functionality as a text editor is fine for learning the basics of Python, especially if one doesn’t want to install other software.  From here, by using a script provided to the students, we explored the concepts of variables, lists, functions, loops, and conditional statements, and their syntax in Python.  Using these concepts, we were able to track the frequency of chosen words throughout different sections of a story read by the script.
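For readers who want a feel for that kind of exercise, here is a minimal sketch in plain Python. It is not the script used in the workshop: the story sections, the chosen word, and the function name `count_word` are all invented for illustration.

```python
# Variables and a list: three invented "sections" of a story.
sections = [
    "The storm came early. The storm was loud.",
    "By noon the storm had passed.",
    "The evening was quiet.",
]
chosen_word = "storm"


def count_word(text, target):
    """Count how many times target appears in text, ignoring case."""
    count = 0
    for word in text.split():                   # loop over each word
        cleaned = word.lower().strip(".,;:!?")  # trim edge punctuation
        if cleaned == target:                   # conditional statement
            count += 1
    return count


# Track the chosen word section by section.
counts = []
for section in sections:
    counts.append(count_word(section, chosen_word))
```

Here `counts` holds one number per section, so a rise or fall in the word's frequency across the story is easy to see.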

The workshop then delved into a discussion of libraries and how work can be enhanced and made to better suit one’s needs by using specific Python libraries.  As the focus of the workshop was on text analysis, the Python library that we looked at was NLTK (Natural Language Toolkit), which has a vast variety of functions that aid in natural language processing work, such as word_tokenize() and sent_tokenize(), which break up a text into individual parts, as words or sentences, respectively.  The NLTK function FreqDist() simplifies the task of getting a count of all the individual words in a text, which we had done with Python alone in the prior script before working with NLTK.  The inclusion of NLTK in the workshop was meant to briefly show students how important and useful libraries can be when working with Python.

While only so much can be covered over the course of a single workshop, the premise of the workshop was to show students that you can do some very interesting things with text analysis with basic Python knowledge, and to dive into Python programming headfirst while learning about general concepts fundamental to programming.  As digital humanities methods for humanities research are becoming more and more common, Python’s capability for natural language processing is a useful tool for humanists.  In an introductory class, the goal of my workshop was to spark students’ interest and curiosity and provide a stepping stone for learning more.  At the end of the workshop, we discussed further resources for students to turn to in learning more about Python and text analysis.

Learning to Augment Reality

Tue, 21/11/2017 - 19:35

The Praxis team is in the midst of defining its project, and for the past few weeks, we’ve been playing around with augmented reality (AR), specifically by using Vuforia and Unity. Learning about AR has been fascinating and, admittedly, a bit frustrating. I won’t go through the process of getting Vuforia and Unity to work with one another (here’s a great intro video if you’re interested!), but I will briefly discuss some of the challenges and implications of trying to augment reality.

First, the target image. The target image is the image that you augment, such that when you point your phone/camera at said image, the 3D figure that you have virtually “added” to the image appears on your screen. But the target image can be tricky: Vuforia scans it for certain key features, by means of which the program can identify when your phone/camera is pointed at the target image. I’ve taken some screenshots of a few of the items that I augmented, which Vuforia ranks in terms of “augmentability.”

Images 1, 2, & 3: The Scholars’ Lab sign received an augmentable rating of one star, meaning its identifiable features are minimal. The cover of Vi Khi Nao’s book, Fish in Exile, has four stars, and the “cowboy” lunchbox residing in the Scholars’ Lab received an augmentable rating of five stars. The yellow crosses indicate the identifying features and patterns that Vuforia recognizes.

Not only does the target image need to have enough unique features to be easily identifiable, but the image should be properly edited so that nothing appears in the background. When the image is uploaded with a background, Vuforia will assume that the background is part of the target image, and it will identify features of the background as part of the patterns it is to look for. This will make it difficult if not impossible for your camera/device to recognize the image unless it appears with the exact same background.

Image 4: Cover of Fish in Exile against a mesh chair. The yellow crosses have primarily identified features of the chair – rather than the cover of the book – as unique features, and the “augmentability” of the image has declined to two stars.

Another problem that we ran into has to do with subject matter. We’re currently experimenting with items on or around UVA’s grounds. So we’ve been taking photos of items from the Small Special Collections, buildings, memorials, and even lunchboxes sitting around in office spaces. But this becomes problematic when the photos we take are affected by the environment. For instance, I tried taking a photo of the segment of the Berlin Wall that stands on UVA’s grounds, and here’s how it turned out:

Image 5: A photo of the Berlin Wall at UVA.

Encased in glass, the Berlin Wall is nearly effaced by the reflection of Small Library opposite it. So even if I use a “clean” shot of the Berlin Wall taken from the Internet as my target image, my augmentation will not be identifiable or reproducible when someone points their camera/phone at the actual Wall on grounds.

So needless to say, our work with AR is still very much in progress. But as we continue developing our AR ventures, considerations of target image complexity and environmental factors will, it seems, help shape the scope of our project.

And on this parting note, I’d like to include a couple of fun pictures of the fruits of our augmentation experiments thus far. Enjoy!

Images 6-9: Augmentations of Fish in Exile and the Cowboy lunchbox.

Measured Unrest in the Poetry of the Black Arts Movement

Wed, 15/11/2017 - 20:07

As one of the graduate fellows at the Scholars’ Lab this year, I am working on a year-long digital project (that’s also a chapter of my dissertation) in collaboration with the folks at the SLab. To sum it up in a sentence, the project hopes to offer a proof-of-concept for performing sentiment analysis on some of the most politically and affectively charged poetry of the 20th century, that of the Black Arts Movement of the 1960s and 70s. Today I wanted to post a brief overview and introduction to what I’m working on.

For some context, my research investigates theories of affect as they relate to race, class, and gender in American literature. I focus in particular upon the provocation and articulation of emotions like frustration, anger, and discontentment within recent US literary history as they relate to systemic injustice. An agitprop play that ends with shouts for workers to unite in class revolution; a poetic broadside that vents frustrations against white supremacy in America; a novel that indulges in a revenge fantasy against America’s colonial history. Unlike plays, poems, or novels that seem to obscure, submerge, or confound their own political dimensions, these works wear their hearts on their sleeves: they are frustrated, pissed off with how things are, and unafraid to speak truth to power in a direct, seemingly “un-literary” way.

At a certain level, then, this is a question of how, where, and to what ends aesthetics and politics meet in a work of literature. To offer a tidy narrative of this prickly history, this sensibility that mobilizes aesthetic objects to address political injustice has posed all kinds of unexpected, even contradictory problems for literary study. On the one hand, the cool detachment of aesthetic mediation keeps experimental works like John Dos Passos’s Communist-leaning U.S.A. trilogy from being seen as mere propaganda, but runs the risk of appearing elitist or self-indulgent. On the other hand, the red-hot political outrage of a protest poem by Amiri Baraka or Sonia Sanchez grounds itself in the present, but may be attacked for subordinating aesthetic sophistication to political agendas. “Anger is loaded with information and energy,” says Audre Lorde in a 1981 speech on its political uses—but the nature of this affective information, sparked by a given political present, becomes highly vexed when articulated by different groups through aesthetic objects.

Building on recent scholarship (like the work of Lauren Berlant and Sianne Ngai) suggesting that feeling gives structure to cultural formations, I argue that a history of unrest in America reveals a pattern of artistic response, a sensibility, precipitated by specific historical moments but translated into aesthetic practice through a stable constellation of affective structures. To this end, I examine continuities between politically-engaged aesthetic projects from three periods of discontent in American history: radical journals like Partisan Review in the 1930s; the revolutionary poetry of the Black Arts Movement in the 60s; and contemporary revenge-driven novels drawing from the Red Power movement.

My digital project as a graduate fellow is the second of those three chapters. In it I hope to ask two questions in particular: first, how are the feelings associated with injustice in the 1960s and 1970s coded in terms of race and gender? The Black Arts Movement first took shape at the height of the Black Power Movement with the foundation of the Revolutionary Theatre by Amiri Baraka in 1965. As Larry Neal—one of its principal theorists—says in a 1969 manifesto, the “Black Arts movement seeks to link, in a highly conscious manner, art and politics” toward “the liberation of Black people.” Moreover, the movement’s “black esthetic” is famous for its affective dimensions, often exploring the limits and political uses of anger, frustration, and poetic rage. But while BAM writers sought to link art and politics through explicitly racial terms, many—though by no means all—were marked by a failure to attend to the intersections of gender with racial injustice.

This leads to my second question: what can natural language processing techniques like sentiment analysis show us about the relations between different dimensions of poetry—like affect and gender—given that poetry, unlike movie reviews or customer feedback, is highly figurative and notoriously difficult to quantify in terms of sentiment or opinion? How can we combine the powerful scale of sentiment analysis with the granularity of close reading to explore the intersections of feeling, gender, race, and injustice in the radical poetry of this period? Moreover, by employing an interpretive method that is in part suspect from a revolutionary perspective—a distanced, potentially de-contextualized computational analysis—I wonder: what limits might these methods have in reading texts that are themselves shaped by the experience of an intense surveillance culture fearful of radical thought?

The already vibrant conversations on sentiment analysis and NLP more generally have been illuminating in forming my questions. The discussion between Matthew Jockers and Annie Swafford on the Syuzhet package and “archetypal plot shapes” has helped me not only to explore the current possibilities and limitations of sentiment analysis as applied to literary corpora, but also to think through the kinds of results we expect from digital projects and how we verify those results as an academic community. With regard to poetry and NLP more specifically, Lisa Rhody’s topic modeling of highly figurative ekphrastic poetry is a great model for how unexpected failures in textual analysis can also be productive, prompting us towards new questions as well as new understandings of familiar methods like close reading.

So far I have been collaborating with folks at the Scholars’ Lab on three things: working through the NLTK handbook, building and prepping my corpus, and beginning to implement some NLP techniques with TextBlob on what I have so far. Another post on those first forays into NLP and sentiment analysis is coming soon! In the meantime, if you have any questions about the project, know of texts or tools I should check out, or just find it interesting and want to talk about it, send me an email! I’ll be posting about my progress over the course of the coming months and aiming to keep my process as open as possible to new ideas, feedback, and inspiration from unexpected places.

3D Printed Enclosures with OpenSCAD

Tue, 14/11/2017 - 17:35

This is a tutorial on how to use OpenSCAD to design a 3D object via code instead of using a WYSIWYG editor like Tinkercad or Fusion 360. We are currently creating a customized media player to allow people to interact with MP3 artifacts. We’ve been working in Python to prepare the audio and wanted to generate the enclosure programmatically as well, ideally using open source software. OpenSCAD is a great open source solution for CAD and 3D printing projects.


In OpenSCAD, you can quickly build duplicates of small parts into more complex designs using “modules”. By assigning variables to parameters, you can vary the size and location of these objects easily. Modules also help break a larger job into more manageable parts and keep the code nice and clean. The four modules below construct the main body of the enclosure, arrange the holes in the enclosure for our electronic components, add a texture to the enclosure, and assemble all the pieces together. After calling those four modules, all that is left to do is split the enclosure in two and render the halves as separate STL files for printing.


Main Enclosure Body

/* This module constructs the main body of the enclosure.
   First, we name the module: */
module enclosure() {
    /* Next, we call the difference function. This specifies that we
       will be subtracting the second object we call from the first.
       We will use this to make our cube hollow. */
    difference() {
        /* The first object will be our main cube. To give the cube
           rounded edges, we call minkowski, which will trace the shape
           we specify around the edges. We will use a sphere, so that
           the hard edges of the cube will take on the shape of the
           sphere. */
        minkowski() {
            /* Lastly, I am calling difference again here because I
               wanted to add a small indentation to the bottom of the
               cube so that it would be more comfortable to hold.
               Again, difference subtracts the second object from the
               first, so here, we see a cube; and then an offset
               (translated), smaller cube(); */
            difference() {
                cube([60,40,15], center=true);
                translate([-15,-10,-8])
                    cube([30,20,1.5]);
            };
            /* Having constructed the main box, we can now specify the
               size of the sphere that we will use to round the
               edges. */
            sphere(2);
        };
        /* Having specified our main enclosure body with rounded edges
           and an indentation on the bottom, we finally hollow it
           out. */
        cube([61.5,41.5,16], center=true);
    }
}
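The dimensions in this module are easier to sanity-check with a little arithmetic. A Minkowski sum with a sphere of radius 2 grows the cube by that radius on every side, so the outer shell is 4 units larger than the 60 by 40 by 15 cube along each axis, and the wall thickness is half the difference between that outer shell and the hollow. The Python sketch below is our own illustration rather than part of the OpenSCAD design. Its function names are hypothetical, and it ignores the small indentation on the bottom face.

```python
# Values copied from the enclosure() module above.
base_cube = [60, 40, 15]        # cube passed to minkowski()
sphere_radius = 2               # sphere(2) used to round the edges
hollow_cube = [61.5, 41.5, 16]  # cube subtracted to hollow out the body


def minkowski_outer_size(cube, radius):
    """Outer size of a cube after a Minkowski sum with a sphere."""
    return [side + 2 * radius for side in cube]


def wall_thickness(outer, hollow):
    """Wall thickness per axis, assuming both shapes are centered on the origin."""
    return [(o - h) / 2 for o, h in zip(outer, hollow)]


outer = minkowski_outer_size(base_cube, sphere_radius)
walls = wall_thickness(outer, hollow_cube)

print("outer size (x, y, z):", outer)
print("wall thickness (x, y, z):", walls)
```

Under these assumptions the shell comes out at 64 by 44 by 19, with walls 1.25 thick on the sides and 1.5 thick on the top and bottom, which is worth comparing against your printer's minimum wall thickness before printing.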



Making Holes for Electronics Components

The second module creates all of the holes that we will place in the enclosure for our electronics components.

module enclosureHoles() {
    /* This section of the code constructs all of the independent
       holes and joins them into a uniform object. */
    union() {
        // Screen
        translate([-13.75,-11,5.5])
            cube([27.5, 19.375, 5]);
        // LED Backlight
        translate([-14.6875,10,5.5])
            cube([29.375, 8.75, 5]);
        // Volume Pot
        translate([0,-15.75,5.5])
            rotate([0,0,0])
                cylinder(r=1.25, h=5);
        // Pushbutton #1
        translate([21.5,0,5.5])
            rotate([0,0,0])
                cylinder(r=4.75, h=5);
        // Pushbutton #2
        translate([23.5,-12,5.5])
            rotate([0,0,0])
                cylinder(r=4.75, h=5);
        // Pushbutton #3
        translate([-21.5,0,5.5])
            rotate([0,0,0])
                cylinder(r=4.75, h=5);
        // Pushbutton #4
        translate([-23.5,-12,5.5])
            rotate([0,0,0])
                cylinder(r=4.75, h=5);
    }
}
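One way to double-check hole placement before a long render is to confirm that each hole's footprint falls inside the hollow interior and that its height spans the top wall. The Python sketch below is our own illustration under stated assumptions: the hole positions are copied from enclosureHoles(), the interior (61.5 by 41.5 by 16, centered) and the outer top at z = 9.5 are derived from the enclosure() dimensions, and every name in it is hypothetical.

```python
# Half-extents of the hollow cube([61.5, 41.5, 16], center=true).
INNER_HALF_X = 61.5 / 2
INNER_HALF_Y = 41.5 / 2
# The top wall runs from the hollow's ceiling to the rounded cube's top face.
WALL_BOTTOM_Z = 16 / 2
WALL_TOP_Z = 15 / 2 + 2
# Every hole starts at z = 5.5 and is 5 units tall.
HOLE_Z_MIN = 5.5
HOLE_Z_MAX = 5.5 + 5


def box(name, x, y, width, depth):
    """Footprint (name, x_min, x_max, y_min, y_max) of a translated cube."""
    return (name, x, x + width, y, y + depth)


def circle(name, cx, cy, radius):
    """Bounding-box footprint of a translated cylinder."""
    return (name, cx - radius, cx + radius, cy - radius, cy + radius)


holes = [
    box("Screen", -13.75, -11, 27.5, 19.375),
    box("LED Backlight", -14.6875, 10, 29.375, 8.75),
    circle("Volume Pot", 0, -15.75, 1.25),
    circle("Pushbutton #1", 21.5, 0, 4.75),
    circle("Pushbutton #2", 23.5, -12, 4.75),
    circle("Pushbutton #3", -21.5, 0, 4.75),
    circle("Pushbutton #4", -23.5, -12, 4.75),
]


def fits_inside(hole):
    """True if the hole's footprint lies within the hollow interior."""
    _, x_min, x_max, y_min, y_max = hole
    return (-INNER_HALF_X <= x_min and x_max <= INNER_HALF_X
            and -INNER_HALF_Y <= y_min and y_max <= INNER_HALF_Y)


misplaced = [hole[0] for hole in holes if not fits_inside(hole)]
pierces_top = HOLE_Z_MIN <= WALL_BOTTOM_Z and HOLE_Z_MAX >= WALL_TOP_Z

print("holes outside the interior:", misplaced)
print("holes pass through the top wall:", pierces_top)
```

If you move a component, updating its entry in the list and rerunning the check is much faster than waiting for OpenSCAD to render the whole enclosure.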



Adding a surface texture

The next module creates a texture on the surface of our enclosure from an image file. We wanted to use an image of JPEG artifacts for our project, but you could use anything you’d like, or skip this step entirely. Be sure to keep your PNG file very simple here; otherwise you will run into problems when trying to render. When our PNG file was 31 KB, it took many hours to render and resulted in a huge STL file that was impossible to print. We needed to get our PNG down to 6 KB to make it render in a reasonable amount of time. This resulted in a 5 MB STL file: still kind of big, but reasonable. Below, we call the translate() function so that the texture sits right on the surface of our enclosure.

module texture() {
    translate([0,0,9])
        scale([.41,.36,.006])
            surface(file="/Users/YourUsername/Path/To/Your/File/fileName.png", center=true);
}



Bringing it all together

The final module assembles the previous three modules together.

module concat() {
    /* Difference subtracts the second object from the first */
    difference() {
        /* Our first object is the Union of two objects. Here, union
           attaches the texture to the enclosure. */
        union() {
            texture();
            enclosure();
        };
        /* the semicolon signals that that is a complete object. Now
           the second object is the one we made from the various
           holes. */
        enclosureHoles();
    }
}



Rendering and Printing

Now all we have to do is render using concat() and save as an STL!

/* To render the entire design, run: */
concat();

/* To actually print, we’ll need to render it in two separate halves
   which we will attach later. So, comment out the above concat()
   command and instead run the below code to render the top only */
difference() {
    concat();
    translate([0,0,-8.5])
        cube([65,44,2], center=true);
}

/* then, comment the above out and run the following code to render
   the bottom only */
difference() {
    concat();
    translate([0,0,2])
        cube([65,44,16], center=true);
}


That’s all there is to it! With the two halves rendered, all you have to do is save them as STL files and then use your favorite 3D printing prep software to print.

If you’d like to learn more about OpenSCAD, here is a link to a great cheat sheet.