Jordan Cox, Emmett Eldred, Alexandra George, Sarah Hodgson, Rebecca Smith
In this, our first post about Six Degrees of Francis Bacon (SDFB) from a student’s perspective, we briefly outline the mechanics of our course: who was involved, where and when we met, the major resources at our disposal, and the project-specific roles we took on. This post is meant to be mostly descriptive. Subsequent posts will draw out some lessons from the framework we describe here.
The most important tools we used in our course were a purpose-built web application, Google Docs, Google Forms, a conference room with a whiteboard, and the online Oxford Dictionary of National Biography.
Much of our work involved an online database and web application that displays, as a network graph, roughly 6,000 people who lived in Great Britain between 1550 and 1700. The web application was built by the project’s principal investigators (PIs) in collaboration with Steve Melnikoff (Knalij) and Carnegie Mellon University Information Systems undergraduates Katarina Shaw, Adetunji Olojede, Amiti Uttarwar, Miko Bautista, and Leonard Sokol, along with their advisor, Raja Sooriamurthi. Earlier work with colleagues in Statistics had inferred roughly 19 million relationships by (a) extracting the names of persons from the 62 million words of the DNB (58,000 entries) and (b) developing models that predicted the co-occurrence of any two names. The web application allows students to see and interact with the fruits of that earlier text-mining and inference stage. Each person is represented by a colorful dot, or node, and relationships are represented by lines connecting the nodes. Inferences made at the scale of 19 million can only be so accurate, however. The computer makes potential relationships between people visible, but only human experts can judge how likely any particular relationship is. Thus, the project PIs needed human researchers, and that was where the five of us came in. Assessing and validating inferred relationships was a main priority of our work.
Though our Research Training Course met for only one hour each week, most of the learning and work took place outside the classroom. Our starting point was to use the web application to validate relationships between pairs of people linked in the database.
The first step was finding who or what interested us (e.g., a group of people, a specific person, or a time period). We explored relationships via the web application and called regularly on the Oxford Dictionary of National Biography (ODNB) for context and support. Eventually, each of us would identify a person we wanted to learn more about (say, John Donne), read his or her biographical entry in the DNB, and then do the same for another person (perhaps Lady Anne Clifford) who the application suggested was connected to Donne. From there we had several more steps.
The first of these was to decide whether Donne and Anne Clifford had a relationship. If they did, and the DNB gave significant evidence to confirm it, our first task in the web interface was to describe the relationship using a drop-down menu of relationship types. The classifications included possibilities such as “knew of one another,” “close friends,” “coworkers,” and so on.
After classifying the relationship, our next step was to rate our confidence that it existed. The choices ranged from Certain, which corresponds to a 95% confidence estimate, through Possible (50%), to Highly unlikely, which corresponds to only a 5% confidence estimate that any relationship existed. Underneath the drop-down box for our confidence level was an open text box, where we added background information and context to clarify the relationship and provide evidence for its existence. Normally these entries were only a few sentences long, but where ample information existed, some ran to a paragraph. To finish our evaluation of the relationship, we cited our sources and identified ourselves as contributors.
Some cases were more difficult than others. If some sort of relationship seemed likely between two people, but the DNB did not give enough information to classify its type, we turned to additional sources. We consulted the sources used by the DNB authors, searched our library’s catalog for biographies, and looked through online databases and journals for more information about how the two people might have been connected. In such cases, one has to be committed to going on a journey with the two people. It is hard to gauge at the outset which relationships will be easy to establish and which will prove more challenging, but some relationships pose real problems, both in tracking down information and in deciding what type of relationship, if any, existed. In these cases, and even in those documented solely from the DNB, some degree of uncertainty remains.
Hardest of all was researching relationships that, in reality, did not exist. It is much easier to prove that something exists than to disprove it, a fact we all discovered rather abruptly: we had all been avoiding these cases until Professor Warren prodded us about them. If the application was giving us a spurious relationship, we had to dig deep enough to be confident that the two people had never known each other or that the supposed relationship was otherwise mistaken.
A typical kind of problem concerned famous writers who recorded their contemporaries, such as Francis Meres or the diarist Samuel Pepys, and the people about whom they wrote. Often these proved to be one-sided relationships, in which the writer knew of the other person but that person probably did not know the writer. These cases were especially difficult because we sometimes had to explore many resources to confirm that the two people did not know one another. In many cases, the two names simply never appeared alongside each other. Sources rarely state explicitly that one person had no relationship with another, so it was often up to us to draw a conclusion. Our assignment was to complete ten of these relationships per week.
Although we spent most of our time working outside the classroom, we also held weekly meetings devoted to discussing our individual findings. Every Monday at 4:30 pm, we met in Professor Warren’s office to talk through issues, concerns, exciting finds, and ideas that we wanted to explore collectively.
These discussions gave us a sense of where we stood as a team within the project. Early on, we decided to create specific project-related roles for ourselves. Our weekly meetings normally began with role-specific updates from each member. Emmett, who took the role of Chief Research Specialist of Early Modern British History, filled us in on queries he had fielded from group members during the week. He described the kinds of questions others were asking him about early modern British history and drew out generalizations of interest to the group. Jordan, our Quality Control leader, monitored an “issues log,” a Google Form where we entered any problems we encountered while researching and using the interface.
Many of the issues that came up were problems of disambiguation (when one node seemed to refer to two or more people of the same name). Other “quality control” issues included duplication, when the same person appeared twice under slightly different names (for example Elizabeth I. and Elizabeth I); relationship vocabulary (how to describe, say, the relationship between a patron and the writer s/he supports); confidence (coordinating our evaluations of relationship likelihood); and typos and grammar (which we wanted to flag for a later time when our contributions might be edited).
One of us, Rebecca, took charge of Technology and Pedagogy, coordinating with a development team from Information Systems and keeping both them and us updated on ongoing work to improve researchers’ experience with the web application. Sarah, the Editor-in-Chief, coordinated the composition of the present reflections. Alexandra, dubbed our Chief Documentarian, administered our “Thought Forum,” a Google Doc created for us to document thoughts, feelings, and experiences in real time. Because our class time was limited to a single hour per week, the Thought Forum became a way for us to communicate without all being in the same place at the same time.
These meetings were not solely for updates from each member and reminders of what we should be doing, however; they were also a time for all of us to learn. Our roles only started taking form after a couple of weeks, once we had figured out exactly what each role entailed. Each role was shaped both by the interests of the person who held it and by what the group needed from that person. We also used our meeting time to develop strategies for navigating the vast pool of knowledge that is the ODNB. It was here that we really began to reach our goal of deeper learning.
About the Authors
Jordan Cox is a student at Carnegie Mellon University interested in the interactions between technology and writing.
Emmett Eldred (@emmetteldred on Twitter) is a sophomore at Carnegie Mellon University, studying Creative Writing, Professional Writing, and Ethics, History, and Public Policy.
Alexandra George is a student in Carnegie Mellon University’s class of 2017, working toward her B.A. in Professional Writing.
Sarah Hodgson is a student at Carnegie Mellon University.
Rebecca Smith (@rmsmithcmu on Twitter) reports that her Research Training Course with Six Degrees of Francis Bacon at Carnegie Mellon University helped her identify a passion for working with teams in both the humanities and technology. She looks forward to pursuing this interest with a minor in Human-Computer Interaction, and hopes to find similar interdisciplinary work after graduation.