What’s In A Name? The Many Nodes of King James VI and I

Jessica M. Otis (http://orcid.org/0000-0001-5519-8331)

In honor of the Scottish Independence Referendum coming up this Thursday, I thought it would be an appropriate time to look at the monarch who started England and Scotland down the path to union: King James VI of Scotland, who later also became King James I of England.

One of James’ main goals upon taking the throne of England was the political unification of England and Scotland.  He assumed the style of King of Great Britain and introduced the Union Jack.  But political unification would remain a distant dream in his lifetime and only be achieved during the reign of his great-granddaughter Queen Anne, in 1707.

Intriguingly, the disjunction between England and Scotland manifested in an early version of the SDFB network.  Due to conventions in the historiography, James is a man of many names – King James, James Stewart, James Stuart, James VI, James I, and James VI and I.  When the SDFB team ran Named Entity Recognition programs on the Oxford Dictionary of National Biography, this multiplicity of names led to a striking error: James was assigned to four different nodes.  This, in turn, gave us distinct visualizations of the disjunction between the English and Scottish courts in the historical record.
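One way to picture the later disambiguation is an alias table that maps each name variant to a single canonical person. The sketch below is hypothetical: the IDs and the `merge_nodes` helper are illustrative, not SDFB's actual pipeline. It also assumes every "King James" means James VI and I, which is not always true, since the NER output attaches that name to James VII and II as well. "James Stewart" is left out on purpose, because that name also covers several of his relatives.

```python
# Hypothetical alias table: each surface form of the king's name maps to one
# canonical ID. The IDs are illustrative, not SDFB identifiers.
ALIASES = {
    "King James": "james_vi_and_i",  # assumption: ignores James VII and II
    "James VI": "james_vi_and_i",
    "James I": "james_vi_and_i",
    "James VI and I": "james_vi_and_i",
    # "James Stewart" / "James Stuart" are deliberately absent: those names are
    # shared with relatives and need context-aware disambiguation instead.
}


def merge_nodes(edges):
    """Collapse aliased names in a list of (name, name) edges onto canonical IDs."""
    merged = set()
    for a, b in edges:
        a2, b2 = ALIASES.get(a, a), ALIASES.get(b, b)
        if a2 != b2:  # drop self-loops created when two aliases meet
            merged.add(tuple(sorted((a2, b2))))
    return merged


# Toy example: three edges among the split James nodes collapse to one edge.
toy_edges = [("James VI", "Person X"), ("James I", "Person X"), ("King James", "James VI")]
print(merge_nodes(toy_edges))  # {('Person X', 'james_vi_and_i')}
```

Merging by lookup table is only safe for unambiguous variants; anything shared between people needs the kind of human review the later version of the network received.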

Here is a network centered on the node of James Stewart, a name that encompasses both James the king and a number of his relatives, who were disambiguated in a later version of the network.  The visualization also includes all the people within one degree of separation – that is, the nodes directly connected to the James Stewart node by a single edge.  These edges indicate a high probability that the names of James Stewart and these other people co-occur in biographical entries.
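A "one degree of separation" view like this is usually called an ego network. The following is a minimal sketch, assuming the network is stored as a list of (name, name, probability) triples; `ego_network` is a hypothetical helper, and whether SDFB's viewer also draws edges among the neighbors is an assumption here.

```python
from collections import defaultdict


def ego_network(edges, center):
    """Return the nodes and edges within one degree of separation of `center`.

    `edges` is an iterable of (name_a, name_b, probability) triples.
    """
    edges = list(edges)  # allow generators; we iterate twice
    neighbors = defaultdict(set)
    for a, b, _prob in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = {center} | neighbors[center]
    # Keep every edge whose endpoints are both in the neighborhood, so links
    # among the neighbors themselves are drawn too (an assumed display choice).
    sub_edges = [(a, b, p) for a, b, p in edges if a in nodes and b in nodes]
    return nodes, sub_edges
```

For example, `ego_network(edges, "James Stewart")` would return James Stewart, everyone sharing an edge with him, and the edges among that group, while dropping people who are only reachable in two steps.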

You can already see the other three nodes for James: James VI, James I, and King James.  Aside from a few yellow-coded nodes that belong to members of the English court, most of James Stewart’s connections are to orange nodes.  The density of these same-colored nodes is not a coincidence; it is a product of a clustering algorithm that groups nodes together based on their shared connections and assigns a different color to each major cluster.  James Stewart is therefore primarily connected to this dense group of orange – Scottish – nodes, with a few outlying connections.

When we shift our focus to the James VI node, we see that it maintains connections to two of the three other James nodes – historians, apparently, are unlikely to refer to James as both James I and James VI in a single biography.  James VI maintains connections to many, though not all, of his original orange nodes, while picking up a large number of additional nodes in orange, yellow, and purple.

There are obviously further errors persisting in this dataset – James was dead long before the birth of his grandson, Charles II, not to mention William Pitt or Samuel Johnson.  However, it is interesting to note that James VI, unlike James Stewart, appears in the same biographies as important members of the English court in both the sixteenth and the early seventeenth centuries (yellow and purple, respectively). 

An even more striking shift appears when we look to the James I node.  Here, James I has lost the vast majority of his connections to the orange-coded nodes and instead picked up a host of associations with very English purple nodes.  Indeed, his node itself has become purple, reflecting his accession to the English throne in 1603 and thus his “rebirth” – according to NER, at least – as James I.

This dramatic shift has led the SDFB team to speculate about the disjunction between the English and Scottish courts.  Did James’ transformation from James VI of Scotland to James I of England unify the two, or did they remain two relatively distinct social networks that were only joined by a few people who made the journey between Edinburgh and London?  How much of this striking visualization reflects the realities of the early seventeenth century and how much the conventions of historians discussing the early modern period?  We don’t yet know, but it is one of the many questions we’re hoping the SDFB network will eventually be able to answer.

The last image I want to show you is the King James node, stripped of any Roman numerals.  Again there are a few obvious errors, as the NER cannot distinguish between James VI/I and James VII/II; however, the majority of references to King James appear to refer to the first (orange) James, not his (pink) grandson.

In this King James, unlike his Roman-numeraled alter egos, we see the node colors shifting towards an equilibrium.  He maintains connections with a large number of the orange nodes that were associated with James Stewart and James VI, while also keeping in touch with the purple nodes of James I.  There is one very prominent node missing, however: King James and Elizabeth I no longer have any connection.

What Thursday’s referendum will do to the connections between Edinburgh and London, however, we’ll have to wait to see.

Six Degrees of Francis Bacon and Undergraduate Research Part I

Christopher Warren (http://orcid.org/0000-0002-9881-682X)

How might a web application like Six Degrees of Francis Bacon be used for undergraduate teaching?  What might undergraduate students discover by thinking and learning with SDFB?

Last spring, I worked in an experimental learning setting with five thoughtful, adventurous undergraduates in Carnegie Mellon University’s Dietrich College of Humanities and Social Sciences.  We came together in a Research Training Course, part of a broad Dietrich College initiative intended to bring exceptional first- and second-year humanities undergraduates closer to the cutting edge of academic research.  The students responded to a course description circulated by the Dean’s Office that emphasized an innovative learning environment focused on reconstructing the social network of Great Britain in the years 1550-1700.  None of the students had previously studied early modern Britain in a university setting.  They received nine course credits for their participation.

Throughout the semester, the students worked with an interactive web application capable of producing network visualizations for roughly 6,000 individuals who lived in early modern Britain. A screenshot is included above.  (Additional screenshots from the alpha web app will appear in coming posts; development is ongoing.)  While the students were invited to explore the network according to their particular interests, a foundational task was to research relationships depicted in the visualizations and to evaluate their validity.  It mattered decisively to the experience that not everything presented to them in the network visualizations was empirically true.

In addition to researching the history and culture of early modern Britain, the students reflected on their collaborative digital learning process in some remarkably fruitful ways.  Over the next month or so, we’ll be dedicating some space on the blog to their written reflections—reflections that we’ll be posting in installments, with the ultimate goal of possibly turning the series into a publishable essay.    

The students’ work together was highly collaborative.  As such, the writing you’ll be seeing in this series is highly collaborative too.  It is officially the work of students Jordan Cox, Emmett Eldred, Alexandra George, Sarah Hogdson, Rebecca Smith, and occasionally myself, as the course instructor.  But as befits the students’ shared interest in networks, few of us can identify where exactly one student’s contributions end and another’s begin.  Each of the ideas, phrases, examples, and facts in this series (even the overarching organizational structure) was introduced by one of the group somewhere along the line, but everyone refined and ratified each other’s labors throughout.  Six Degrees of Francis Bacon is networks all the way down.

Some of the upcoming topics include: the roles and processes students developed to cope with the unique challenges presented by our idiosyncratic course; the importance of error and imagination in historical thought; the ways social networks challenge persistent notions of solitary genius; and the heightened importance of historical investigation when students perceive that historical truth really is at stake.     

We look forward to sharing the students’ reflections in coming posts.

What Is A Data Curation Fellow?

Jessica M. Otis (http://orcid.org/0000-0001-5519-8331)

As those of us in the US gear back up for the new school year, I thought now might be a good time to write a blog post introducing myself and – more importantly – my position to the SDFB community.  So hello, everyone!  My name is Jessica Otis and my research focuses on the history of popular mathematics in early modern England.

I’m also the new CLIR/DLF Early Modern Data Curation Postdoctoral Fellow for SDFB.

The operative part of my title is “Data Curation” and that’s what I want to focus on in this blog post.  One of my main responsibilities will be to provide data curation services for the SDFB project, including data management planning, metadata generation, and crowdsourcing oversight.  For many people, “data” is a strange word to hear in a humanities context.  So what does “data curation” mean?  And why are these three elements of data curation important to SDFB and the humanities more generally?

Data Management Planning

Data management planning is something that scientists and social scientists still have to deal with more than humanists, although that’s changing.  For most humanists, the “data” that form the foundation for our research are physical, not digital, objects.  We work with textual artifacts, such as court records or novels, as well as more material artifacts ranging from bricks to textiles to skeletons.  But no matter how physical our original sources, we are all increasingly functioning in a digital world.  Thus humanists must also develop methods for electronically “managing” our data – determining how to collect it, process it, index it, store it, and preserve it for future scholarship.

Many of us are already collecting, processing, and indexing our data using digital methods.  During my own dissertation research, for example, I used a digital camera to photograph historical documents, created Word files of notes and transcriptions associated with the digital images, and used Excel spreadsheets to keep track of everything.  Many historians I know have similar processes.  Storage and preservation can then range from keeping hard drive backups to placing the data in an online repository.

No matter what storage method you choose, the best practice is to choose a secure, off-site location for a backup of your data, in case a natural disaster takes out your computer, your house, or your entire city!  Off-site storage of your data can be as simple as keeping a spare hard drive at a relative’s or in a safety deposit box.  However, there are also a number of options for online repositories.  These include commercial services such as Dropbox and university-supported services such as UCSD’s Chronopolis or UVA’s Libra.

We are also beginning to see the creation of large numbers of discipline- or subject-specific repositories, such as the UK’s History Data Service.  For a centralized list of many such repositories, see Databib.  You’ll notice humanities repositories are drastically outnumbered on Databib, but more repositories should begin to appear as the demand for such services increases.

For bigger projects, curating data obviously requires a bit more work than for individual researchers.  This is particularly evident when we look at the long-term storage and preservation of large amounts of data that aren’t the preserve of a single individual, such as the relationship probabilities and typology data being compiled by the SDFB project.  What is going to happen to our data five, ten, or even fifty years down the road?

One of the ways we plan to address this question is to make our data as “open access” as possible – that is, anyone will be able to come to the SDFB website and download our data for their own use.  If there are copies of our data on hard drives all over the world, the chances of a catastrophic data loss will be dramatically reduced.  Or, as they say in the preservation world, LOCKSS: lots of copies keeps stuff safe.

Open access is not a one-size-fits-all solution for preserving data, but in the case of SDFB, I believe this is something we owe to the crowd-sourcing community we hope to build.  Our data will belong to them as much as to the project’s creators.  By making our data open to the community, we hope other scholars will download it and find new, exciting ways to analyze it in the years to come.

Metadata Generation

Metadata is “data about your data,” which can be embedded directly into many types of files.  One of my favorite examples of this is the geographical tags that are attached to most digital photos.  It’s not part of the actual photo that people see, but someone who’s interested in creating a world map of cat photos (seriously: http://iknowwhereyourcatlives.com ) can look at the photo metadata and find out where the photo was taken.  More useful to me as a professor, when faced with students *swearing* they had the paper done on time, is the timestamp that programs like Word often put in a document’s metadata to indicate the last time the file was modified.
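To make the idea concrete, here is a minimal sketch of reading that embedded timestamp from a Word file. It assumes a modern .docx (not the older .doc format), which is a zip package whose core properties conventionally live in `docProps/core.xml` with a `dcterms:modified` element; `docx_last_modified` is a hypothetical helper name. The value is only as trustworthy as the software that last wrote it.

```python
import zipfile
import xml.etree.ElementTree as ET

# Dublin Core terms namespace, used for the "modified" core property in .docx files.
DCTERMS = "http://purl.org/dc/terms/"


def docx_last_modified(path):
    """Return the dcterms:modified timestamp string from a .docx file, or None."""
    with zipfile.ZipFile(path) as zf:
        try:
            xml_bytes = zf.read("docProps/core.xml")  # assumed conventional location
        except KeyError:  # the package has no core-properties part at that path
            return None
    root = ET.fromstring(xml_bytes)
    modified = root.find(f"{{{DCTERMS}}}modified")
    return modified.text if modified is not None else None
```

Calling `docx_last_modified("paper.docx")` on a hypothetical file would return an ISO-style timestamp string such as the one Word records, without opening the document in Word.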

Metadata is useful for research, as well.  Ever go back to research notes from a few years ago and wonder what in the world you were trying to say in a cryptic few lines?  Frustratingly, you end up trying to read the mind of your younger self.  This is a problem caused by a lack of metadata and it’s magnified exponentially when you’re trying to read the mind of another scholar.  You can download the SDFB data all day long, but it won’t be useful if you can’t make heads or tails of what’s actually in the files you downloaded.  Part of my job, then, is making sure SDFB has metadata in place to help other people understand the data behind our network visualizations.

Crowdsourcing Oversight

One of the most exciting moments of this project will be when the new website goes live and the SDFB community is able to begin expanding on our original network research.  As powerful as computers are, they’re still no match for the human brain when it comes to imprecise, inconsistent, and incomplete data – all of which are unfortunately common in the early modern period.  Nor is there an algorithmic way to easily classify early modern relationships without the knowledge base that humanist scholars acquire as part of their training and research.

Crowd-sourcing requires oversight to make certain the data – information about people and their relationships – that users add to our network are valid.  We are also working with our programmers to create new features for the website that will enable users to engage in scholarly debate about the data, analogous to the comment section on a blog post or to a Wikipedia talk page.  These shared spaces will require oversight, as well, and strategies will need to be developed to indicate when there is no community consensus regarding certain data in our network.

This is a different type of data curation, but is nonetheless vital to the long-term success of the SDFB project.  And I feel safe in speaking for the whole SDFB team when I say that we are hoping for this to be an active, expanding community of knowledge for years to come.  So thanks for welcoming me to the community and I look forward to building great things with you.

On Categories of Relations in Networks: or, Most Abstract Blog Post Title Ever?

Dan Shore (https://orcid.org/0000-0001-7073-3208)

Any project that sets out to map a social network - a network of relations between persons - will need to decide how to represent and categorize the types of relationships between people - the various ways that people are associated with one another.  The big decision will be a matter of choosing between a controlled vocabulary (we pick a limited number of relationship types in advance) and an uncontrolled vocabulary (users can add new relationship types without restrictions).  Some of our initial thinking about this significant decision can be found in this podcast (especially around the 22-minute mark), but this post is intended to survey the advantages and disadvantages of these choices more fully.

The advantage of a controlled vocabulary of types is that all relationships can be sorted, searched, and ordered by finite categories chosen in advance.  There’s no danger of users adding redundant or specious types or of unseen overlapping hierarchies.  Presumably two nodes can be connected by more than one relationship type (one person could be an uncle and a trade master and a guardian of another), so that one way to achieve specificity is by layering multiple relationship types.  But the downsides of controlled vocabularies are even more glaring.  They are inherently tendentious.  One can always ask why some relationship (teacher and student? Father and “natural” (i.e. born out of wedlock) son?) is omitted while others are included.  By the same token they are inherently normative, giving recognition to some relationships but not others.  They universalize, imposing relationship types across different periods and communities.  Controlled vocabularies will normalize and universalize regardless of how carefully one assembles them, subjecting them both to historical critique (are their terms as appropriate in 1500 as in 1700?) and localist critique (do the same types apply in rural communities as in urban ones? in the north as in the south of England?).  Historians and literature scholars notice and care about these problems and will (should?) bridle at the constraints they place on their ability to characterize connections between people.  Worse still, controlled vocabularies standardize as matters of fact precisely the things that historians and literature scholars treat as central matters of concern and debate.

You can see the problem of controlled vocabularies most vividly in any popular social networking site.  I may wish to be connected to someone on Facebook but think it imprecise or even absurd to identify her or him as my “friend.”  This ends up changing the very definition of the term “friend,” making it into the general type of social relations qua relations, rather than one particular type of relation among others.  This problem of imposing relationship categories is exacerbated when we’re aiming to reconstruct the network of a period distant in time and culture from our own.

So why not just use an uncontrolled vocabulary of relationship types?  Why not let users characterize relationships with unconstrained subtlety, detail, and specificity?  The disadvantages of an uncontrolled vocabulary of types are roughly the negation of its advantages.  Uncontrolled types can proliferate endlessly, making them nearly useless for searching, sorting, filtering, or ordering.  If, as Aristotle observes, there is no science of particulars, only of categories, then dispensing with categories also dispenses with the science, leaving only the proliferation of disparate, particular relations.  Since an uncontrolled vocabulary is not shared between members of a community (you have your preferred types, I have mine), this means that the community lacks a shared set of categories for querying or analyzing the network - or at least, overlap in categories will be the product of local and fleeting agreement.  Without a controlled vocabulary of relationship types, it wouldn’t be feasible to filter the network to display only persons related through “Family” or through “Profession,” since those general categories would be thrown into the mix indiscriminately with more specific categories like “Step-Son” or “co-Member of Parliament.”  Put simply, an uncontrolled vocabulary of relations would negate many of the practical benefits for which we’ve decided to reconstruct the social network in the first place.

That said, I believe that we (the Six Degrees team) have already decided, of necessity, to use an uncontrolled vocabulary for nodes, which amounts to letting users tag persons with basically an unlimited range of group membership descriptors.  There’s no way around this because there’s no principled way to decide, in advance of historical inquiry, what kinds of groups an early modern person could have participated in.  The groups in which persons take part change over time, they overlap, and they are debatable (what was the status of the group “Ranters”?) both in their own time and in historical retrospect.  The only option for nodes is to have contributors deploy the fullest range of group types, including both general (“Puritan”) and specific (“Arminian”) tags, and without any attempt to impose hierarchical relations.  Any attempt to enumerate and categorize all of the radical sects of the late 1640s and early 1650s into a taxonomic scheme would, I think, be to repeat the futile project of Edwards’s _Gangraena_.  It would be beset with the problem of overlapping hierarchies.  For example, Milton could (arguably) be tagged as a Puritan (or “left Protestant,” anti-episcopal, etc.) and as an Arminian, but Arminianism is a subcategory of Anglican as well, even though Puritan and Anglican are, for most scholars, exclusive categories; a classical categorization scheme wouldn’t work.  So no controlled vocabulary and no hierarchy for node types.

How different are relationship types from nodes?  One intuition is that while groups (e.g. Ranters or members of the “Hartlib Circle”) are highly contingent historical categories, some relationship types have validity across periods and cultures.  All periods (so the intuition goes) have notions of what it means to be related by family, even if the kinds of relations that are counted as family relationships vary dramatically between and even within periods and cultures.  In the early modern period, as in earlier and later periods, the notion of a “natural” or “bastard” or “illegitimate” child occupies a liminal role in the family, inside in some respects or with respect to some family members, but outside in others.  Yet this kind of liminal case, even as it troubles the coherence of the category “family,” at the same time demonstrates its indispensability.  A more current example: our concept of what counts as a family has changed dramatically (and for the better, I scarcely need to say) as a result of the gay rights movement.  People now speak publicly and proudly of same sex spouses, same sex partners, “gaybys” and other relations as family relations in a way that wasn’t the case decades, much less centuries ago.  But this change in the content of the category “family” demonstrates, rather than undermining, the perdurability and generality of the category itself.  (It’s unclear whether anti-normativity and anti-marriage gay theorists would think it possible or desirable to dispense with the normative category of family relations tout court; this is a question worth asking.)

Digital humanities projects (as opposed to DH scholarship) force us to stop poking at our basic categories somewhere and make a decision.  This halt to fundamental questioning is the thing about DH that makes humanists like me uncomfortable.  Humanities disciplines have taught us to think of ourselves as poking, deconstructing, troubling, and questioning categories indefinitely.  But the discomfort with any halt to questioning is not peculiar to DH.  The decisions required for DH just make it harder to forget that it is only possible to trouble or deconstruct any particular category or set of categories by leaving in place a whole set of background categories and assumptions.  This is as true of radical critique as it is of any digital database.  Total skepticism about categories just isn’t possible - or desirable, since it would mean the cessation of thought, not thought’s highest pitch.  We can trouble anything, but we can’t trouble everything at once.  An advantage of a DH project like Six Degrees of Francis Bacon is that it lets us clarify our background categories in a systematic and visible way, essentially disclosing new objects for critique.  Its relative positivism (it is concerned to record, store, and make systematically available facts about how people were related) need not be opposed to critiques of categories.  Rather, the project can serve as a basis for further critique.  In the terms of Bruno Latour, we can’t dispense with “matters of fact” if we hope to pursue “matters of concern” (in this case rethinking relationship types).

In that practical spirit, let me propose one possible way forward on the question of relationship types.  Instead of choosing between a practically useful but theoretically indefensible controlled vocabulary, on the one hand, and a theoretically defensible but practically disastrous uncontrolled vocabulary on the other, we should mix the two.  High level, relatively general and perdurable categories of relations – like family relations, work relations, or pedagogical relations – would be controlled.  These would, for example, allow users to search and sort and filter all Royalist nodes connected by family relations.  But the lower level types, which would be sub-specifications of the higher levels, would be uncontrolled.  That is, we would leave scholars/users free to elaborate the types of familial relationships without constraint, even if this makes it harder to filter and sort coherently at the lower level.  We would have a split-level hierarchy, one that (as with node types) would put no constraints on overlapping hierarchies (Master and Apprentice could be classed as both a pedagogical and a professional relation).  The network would support debates about the family category by enabling debates over the specific kinds of relations that fall under the general category of family relations.  This proposal offers a practical and technical compromise (and by compromise I mean not wholly satisfactory in any respect) to a fundamentally theoretical - i.e. conceptual and ontological - question: what kinds of relations between people are there? 
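The split-level hierarchy proposed here can be sketched as a small data model. This is only an illustration under assumptions: the controlled list (family, work, pedagogical) is taken from the examples in the paragraph above, "professional" is folded into "work", and `Relationship` and `filter_network` are hypothetical names, not SDFB's schema. The upper level is validated against a fixed set, the lower level is free text, and a relationship may sit under several upper-level categories at once, so overlapping hierarchies are allowed. Group tags on people stay uncontrolled, as the post proposes for nodes.

```python
from dataclasses import dataclass

# Controlled top level (assumed list, taken from the three examples in the text).
CONTROLLED = frozenset({"family", "work", "pedagogical"})


@dataclass
class Relationship:
    person_a: str
    person_b: str
    subtype: str           # uncontrolled: any free-text description
    categories: frozenset  # controlled: one or more entries from CONTROLLED

    def __post_init__(self):
        self.categories = frozenset(self.categories)
        if not self.categories or not self.categories <= CONTROLLED:
            raise ValueError(f"categories must be a non-empty subset of {sorted(CONTROLLED)}")


def filter_network(rels, category, groups=None, group_tag=None):
    """Relationships in one controlled category, optionally limited to pairs
    where both people carry a given (uncontrolled) group tag."""
    groups = groups or {}
    result = []
    for r in rels:
        if category not in r.categories:
            continue
        if group_tag is not None and not (
            group_tag in groups.get(r.person_a, set())
            and group_tag in groups.get(r.person_b, set())
        ):
            continue
        result.append(r)
    return result
```

A master-and-apprentice relationship tagged `{"pedagogical", "work"}` then appears under both filters, while a query such as `filter_network(rels, "family", groups, "Royalist")` corresponds to the "all Royalist nodes connected by family relations" search described above. Filtering on the free-text `subtype` remains possible but, as the proposal concedes, less coherent.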

"We live in an Elizabethan world of our own reductive devising, populated by the Queen and Ben Jonson and the Dark Lady and the Bard and a theatre full of groundlings. But the real Elizabethan world had a lot more people in it than that."

– Adam Gopnik, “The Poet’s Hand”

PODCAST: Christopher Warren on Six Degrees of Francis Bacon

SDFB co-PI Christopher Warren recently presented in Oxford University’s Cultures of Knowledge seminar series “Negotiating Networks.”

The podcast of his presentation, “Bacon and Edges: Reassembling the Early Modern Social Network,” can be found here.

BLOG POST: Daniel Shore on “Extensions of the Book”

SDFB co-PI Daniel Shore has written a guest blog post at the Folger Shakespeare Library’s blog, “The Collation.”  

His post, “Extensions of the Book,” can be found here.

PODCAST: Ruth Ahnert and Sebastian Ahnert on “Tudor Letter Networks: The Case for Quantitative Network Analysis”

SDFB team members Ruth and Sebastian Ahnert recently spoke in Oxford University’s Cultures of Knowledge seminar series “Negotiating Networks.”

The podcast of their presentation, “Tudor Letter Networks: The Case for Quantitative Network Analysis,” can be found here.  

Job Opportunity with Six Degrees of Francis Bacon: Early Modern Data Curation Fellow

Carnegie Mellon University’s Department of English and University Libraries jointly seek an Early Modern Data Curation Fellow to lead data curation activities for the Six Degrees of Francis Bacon (SDFB) project, a digital reconstruction of the early modern social network that scholars and students can collaboratively expand, revise, curate, and critique. The fellow will leverage expertise in early modern studies along with technical aptitude in order to contribute meaningfully to a rich data lifecycle, including collecting, processing, text mining, analyzing, and archiving data related to the early modern social network.

Click on the links above for further details. 

Request for Input

We’re very busy working behind the scenes on an interactive user interface for Six Degrees of Francis Bacon (SDFB).  Since we want SDFB to be as useful to the scholarly community as possible, we thought we’d open up the process a little.  




Have a look at a couple of recent screenshots, centered on Thomas Hobbes and Ben Jonson, respectively.  Say you could learn more about the line of connection between two people - in technical network parlance, the “edge” connecting Hobbes and Waller or Jonson and Chapman.  What kind of information would you want to find there?

Say, moreover, that you could add or edit information about that edge. What kinds of information would you want to add or correct?  Likelihood of relationship? Type of relationship? Strength of relationship? Dates and frequency of contact? Bibliographical sources?  

Let us know with a tweet to 6Bacon or by clicking “Submit” above. 


 

Elizabethan Hacks


We’re delighted to share the reflections of Matthew Harrison, a graduate student at Princeton University. His dissertation, “Tear Him for His Bad Verses: Carping, Cavilling, and the Origins of Criticism,” explores the strange and wonderful vocabulary with which Tudor readers insulted, sniffed at, and objected to the shortcomings of particular poems and the art itself.

Many thanks to the 6 Degrees team for letting me play around with their data.

I requested the network data for three Elizabethan writers: Nicholas Breton (1554/5–c.1626), Barnabe Barnes (1571–1609), and Robert Greene (1558–1592). Though rough contemporaries in the burgeoning London literary scene, each imagines and constructs a career as a print author in somewhat different terms, taking up very different relationships to print and patronage. I wanted to know how much of the different writerly trajectories of these figures would be visible in their social networks.

First, a rough sketch of the three figures:


Greene, famously, is among those now known as “University Wits,” humanistically-trained writers who made careers out of writing for publication and the stage. The success of his pamphlets and prose romances made him, according to the Dictionary of National Biography, “England’s first celebrity author,” and the appeals to patronage of his early work disappear in favor of efforts to capitalize on his reputation.

The third son of the Bishop of Durham, Barnes seems to have lived on his inheritance while trying to present himself as a fashionable literary gentleman. (Thomas Nashe mocks him for showing up at court in “a strange pair of Babylonian britches, with a codpiece as big as a Bolognian sausage.”)

Finally, Breton, the son of a wealthy merchant, was a prolific writer of devotional and secular poetry and prose. His dedications approach a wide swath of potential patrons—city officials, country dignitaries and luminaries at court—with works that seem targeted to their taste and interests.

Image: Greene, in his winding sheet, imagined as writing from beyond the grave. From the title page of Greene in Conceipt, by John Dickinson

For ease of reading, I’ve put the top twenty or so results for each writer at the bottom of the page. These lists, while idiosyncratic, are fairly accurate: above a confidence score of .5, the algorithm tends to be right that two people are connected (though, as we will see, what comprises a connection is an open question). One other encouraging result: the confidence scores for Greene’s relationships are significantly higher than those of the others, while those for Barnes are quite low. With good reason: Greene’s DNB entry is five or six times longer, and a full-text search suggests he’s mentioned about that much more often in other entries. The algorithm is most confident when it has a larger sample with which to test its conclusions. The more often two names appear together, the more likely a relationship can be inferred. (I’ll raise a few problems with that inference in a moment.)
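The intuition that more co-occurrence supports a more confident inference can be illustrated with a plain co-occurrence count. This is a hedged sketch, not SDFB's actual statistical model: `cooccurrence_counts` and `confident_edges` are hypothetical helpers, entries are assumed to be a dict of entry title to full text, and names are matched as plain substrings. A raw count like this also cannot tell why two names appear together, which is exactly the problem the Sidney and Machyn cases below raise.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(entries, names):
    """Count, for each pair of names, how many entries mention both.

    entries: mapping of entry title -> full text (assumed input format).
    names:   names to look for, matched as plain substrings.
    """
    counts = Counter()
    for text in entries.values():
        present = sorted(n for n in names if n in text)
        for pair in combinations(present, 2):  # pairs are in alphabetical order
            counts[pair] += 1
    return counts


def confident_edges(scores, threshold=0.5):
    """Keep pairs whose confidence score lies above the threshold."""
    return {pair: s for pair, s in scores.items() if s > threshold}
```

Longer entries mention more names, so a writer with a long DNB entry, like Greene, accumulates more co-occurrences and, under any count-driven model, tends to get more confident edges.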

With this general background, I want to raise three challenges for thinking about and using Six Degrees data.

Did Robert Greene Know Philip Sidney?


With a confidence estimate of .83, the network suggests Greene knew Philip Sidney. A connection is possible: the two are contemporaries, though they move in very different circles. And, indeed, Greene’s entry mentions Sidney six times. Yet each mention denies literary influence, with claims such as “Greene independently synthesized the same models as Sidney had…” or “But Greene probably knew Sidney’s romance only indirectly…”. The DNB entry author takes pains to distinguish Greene’s mode of prose romance writing from Sidneian influence.

Should this be considered a connection? In terms of social network and literary influence, respectively, Breton and Barnes are each closer to Sidney. Yet Sidney’s prominence and their generic similarities mean that Greene’s work has been read in Sidneian terms since the 1590s. Is that sufficient? It depends on the purpose for which we are using the database.

A related problem: Breton is connected to Henry Machyn, a parish clerk, because Machyn’s Diary is used to provide a colorful anecdote about Breton’s stepfather. Indeed, I suspect the network data for Machyn would tell us far more about which entries draw on this source than about Machyn himself. (We can find the same sort of error in relations to Francis Meres, John Bodenham, Thomas Moffet, Robert Dow, Anthony Wood, and other individuals connected to important historical sources.)

Again, these complications tell us less about the shortcomings of the project than about the complexity of human relationships and their irreducibility to a network graph. Hence the Six Degrees team’s emphasis on introducing multiple data streams and on allowing domain experts to tag and annotate relationships.

John Weld and Isaac Newton: Artifacts and Duplicates

One place where human intervention will greatly improve results comes in recognizing artifacts of the way the algorithm derives names from the DNB data.

Barnabe Barnes, you’ll notice, is connected both to “Thomas Nashe” and “Thomas Nash.” These aren’t two different individuals; it’s the same fellow, with his name spelled two different ways. (If I had included a little more data, you’d see Archbishop Matthew’s first name spelled both “Tobie” and “Toby.”) Making matters worse, in other entries, Barnes’s own name is sometimes spelled “Barnaby.”

On the other hand, the network suggests that Robert Greene knew Isaac Newton. Perplexing, until one realizes that it doesn’t distinguish between Robert Greene (the 16th-century writer) and Robert Greene (the 17th-century natural philosopher).

My favorite example is Breton’s proposed connection to one “John Weld.” There is no such person in the DNB meeting the five-mention threshold for inclusion in this data set, so his appearance seems anomalous, until a full-text search reveals that five different people named “John Weld” are mentioned once or twice apiece. The network amalgamates a 20th-century critic, a 17th-century sheriff, a cleric, and so on.
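All three artifacts come down to how name strings become nodes. The sketch below is a minimal illustration under stated assumptions, not the project’s actual named-entity pipeline: it assumes a hand-curated alias table (the hypothetical `ALIASES` dict) and an optional century tag on each mention, both invented here. Variant spellings merge only when a curator lists them, while identical strings always merge unless some extra evidence, here the assumed century tag, keeps them apart.

```python
from collections import defaultdict
from typing import Iterable, Optional

# Hypothetical, hand-curated alias table: spelling variant -> canonical label.
ALIASES = {
    "Thomas Nash": "Thomas Nashe",
    "Toby Matthew": "Tobie Matthew",
    "Barnaby Barnes": "Barnabe Barnes",
}


def canonical(name: str) -> str:
    """Collapse stray whitespace, then map a known variant to its canonical label."""
    cleaned = " ".join(name.split())
    return ALIASES.get(cleaned, cleaned)


def node_key(name: str, century: Optional[int]) -> str:
    """Build a node label; an (assumed) century tag splits same-named people."""
    label = canonical(name)
    return label if century is None else f"{label} ({century}th c.)"


def merge_mentions(mentions: Iterable[tuple[str, Optional[int], str]]) -> dict[str, set[str]]:
    """Group (name, century, entry_id) mentions under one node per key.

    Identical strings without a century tag are always merged, which is how
    several different men named 'John Weld' end up as a single node.
    """
    nodes: dict[str, set[str]] = defaultdict(set)
    for name, century, entry_id in mentions:
        nodes[node_key(name, century)].add(entry_id)
    return dict(nodes)


# Invented example mentions (entry IDs are placeholders).
mentions = [
    ("Thomas Nashe", None, "entry_1"),
    ("Thomas  Nash", None, "entry_2"),   # variant spelling, stray space
    ("Robert Greene", 16, "entry_3"),    # the writer
    ("Robert Greene", 17, "entry_4"),    # the natural philosopher
    ("John Weld", None, "entry_5"),      # two different men, no dates
    ("John Weld", None, "entry_6"),
]
for key, entries in sorted(merge_mentions(mentions).items()):
    print(key, sorted(entries))
```

The design is deliberately conservative: nothing merges by fuzzy matching, so a human decides that “Nash” and “Nashe” are one man, and the two Robert Greenes stay apart only because the mentions carry extra evidence.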

City, Country, and Court

The DNB favors people who have been deemed notable, while the Six Degrees project’s threshold for inclusion restricts the data set even further. The resulting networks are male-dominated and overly emphasize weak ties to key political and literary figures over the everyday sorts of sociability that might have governed lives in the period. The given networks lack, for example, the city figures to whom Breton dedicated much writing.

Still more drastically, we don’t hear of the many servants, innkeepers, shop-owners, and apprentices that far outnumbered the “notables”: how different will these networks look, I wonder, when they can be supplemented with data from court cases, the State Papers, and college records?

The inferences the algorithm can already make are impressive. Consider an episode from 1598, when Barnabe Barnes tries twice (rather amateurishly) to poison John Browne, first by poisoning a lemonade he gives him and then his glass of wine. Barnes is caught, tried by Sir Edward Coke, and ultimately gets off easy, presumably through the intervention of some aristocratic connection.

The court records survive, so we can piece together quite a bit of social life in Elizabethan London: we even know the names of the two servants who are made to test the poisoned wine and become grievously ill. The Six Degrees project doesn’t have access to this data. Indeed, as I pointed out above, it doesn’t have much to work with regarding Barnes at all: about a dozen names in his entry. What it does know, however, are the associates of Barnes’s father, the Bishop of Durham. We find a number of Richard Barnes’s connections suggested (with low confidence estimates) as possible contacts of Barnabe’s. And as I researched the aftermath of the trial, I was surprised to find that some of the same names come up, as people to whom the younger Barnes had recourse. I wonder whether more historical research might validate more of these connections.

Its guesses, we might say, happen to have been validated by history. This is part chance, of course, but it also reflects a deep parsimony in human relations. If serendipity and error make it difficult for network analysis to completely capture the complexity of human social relations, nonetheless, often enough we know who you might guess we know. Though far from complete, a network like this one is an excellent place to begin.


Boundaries, brevity and bias: Protestant letter networks


A few days ago we posted an update about the project, including the fact that Ruth Ahnert and Sebastian Ahnert had recently joined the Six Degrees of Francis Bacon (SDFB) team. The following post, by Ruth, examines how her recent work with Sebastian on Protestant letter networks, and the methods they have developed through this process, inform the project. 

Sebastian and I were invited to join the SDFB team after Chris Warren and Dan Shore saw me presenting on our research at the MLA conference in January. What piqued their interest, I believe, was the way that our approach contrasted with, yet complemented, their own. While the existing SDFB team have been constructing a global network of 6,000 nodes (and growing) inferred from secondary literature, ours was a verified local network of just 377 nodes, derived from primary literature. The source in question was a collection of 289 letters written by Protestants in the reign of Mary I, which were later collected, and many brought to print, by the famous martyrologist John Foxe. The other difference between our work and theirs was that our emphasis was not so much on network visualization as on topographical network analysis. In our work we have employed mathematical tools and algorithms to measure the relative centrality of each of the individuals in the network, and we have begun to develop ways of predicting the roles of different actors according to their network properties.

Figure 1.

We are now moving on to a new, much larger collection of documents: Letters and Papers, Foreign and Domestic, Henry VIII, and The Calendars of State Papers. Our mining of these sources will create a very large sub-network that will offer two things: 1) additional associations missed by the (necessarily) coarse-grained nature of network inference, and 2) a means of verifying and improving the inference process. It will take quite some time before this verified sub-network is reconstructed, but to suggest the way this might work, we decided to write a couple of blog posts comparing the Protestant letter network to the equivalent portion of the global network. In this post we will provide an overview of what a comparison of these two networks can offer; in a second post we will undertake some more quantitative analysis.

Figure 2.

Figure 1 (created by Sebastian) is a visualization derived from 289 letters found in John Foxe’s papers and publications and sent between Protestants in England during the reign of Mary I. Figure 2, by comparison, is a section of the inferred network derived from the Oxford Dictionary of National Biography (ODNB), which was generated by running the network inference process for those actors in our Protestant network who have their own ODNB entries. Straightaway two key differences between the parameters of these two networks can be observed: the network in figure 1 is bounded by time (communications made 1553–1558) and by its actors’ membership of the Protestant community, whereas network 2 covers the whole lives of those Protestants, and consists of all associations that have been inferred with a confidence that exceeds a given threshold. This is the nature of working with collections of primary documents: they are circumscribed by various collection policies. For this reason, network 1 excludes figures like John Story, who, despite being in written communication with various Protestant leaders in the reign of Mary I, was an opponent of the Protestant community as one of the key enforcers of the Marian religious reaction. Given these two additional layers of circumscription acting on network 1, it should look much smaller. Yet it doesn’t, even when we take into account some of the ‘false positives’ included in network 2, such as the historian and biographer John Strype (1643–1737), who only wrote about the Protestant martyrs, and William I. Why is this?

There are three main reasons: the brevity of ODNB entries, their bias, and the confidence levels used in the network inference process. Considering first the issues of brevity and bias: ODNB entries are concise introductions to the biographies of figures perceived to be significant in the nation’s history. This means that there is only space to mention their ‘important’ relationships and interactions. However, religious leaders did not write only to other high-profile figures in the Protestant movement, and, as a result, a significant number of figures who appear in network 1 do not have their own ODNB entries; some are only mentioned in passing within others’ entries, and others are not mentioned at all. Significantly, many of the figures who fall into this category are women: while women play a minor role in both historical and modern scholarly accounts of the Reformation, they often occupied vital infrastructural roles within this network, funneling letters, money, food and other goods to the London prisons to support their incarcerated co-religionists. These infrastructural figures are highlighted by two key network measures: “betweenness” (which measures how often a node lies on the shortest paths between other pairs of nodes) and “eigenvector centrality” (which is closely related to the algorithm used by Google to assign importance to web pages in the World Wide Web, and to rank its search results by relevance). Network 2 demonstrates how the need for confidence levels when inferring a network exacerbates the structural bias against women in the ODNB: because they are mentioned rarely, they often fall below the radar. However, the combination of primary sources and network analysis has the ability to provide a more gender-balanced view of the early modern social network.
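To make the two measures concrete, here is a small standard-library Python sketch on an invented toy network, not the Ahnerts’ data or code: two groups of prisoners who can reach one another only through a single “Sustainer”. Betweenness is computed with Brandes’s algorithm for unweighted, undirected graphs, and eigenvector centrality by power iteration. All node labels are hypothetical.

```python
import math
from collections import deque


def build_adj(edges):
    """Undirected adjacency sets from a list of (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def betweenness(adj):
    """Brandes's algorithm: how often each node lies on shortest paths
    between other pairs (unweighted, undirected, unnormalized)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest s-v paths
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # accumulate dependencies, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice


def eigenvector_centrality(adj, iterations=200):
    """Power iteration: a node scores highly if its neighbors score highly.
    Iterating with (A + I) instead of A keeps the same leading eigenvector
    but avoids oscillation on bipartite graphs."""
    x = dict.fromkeys(adj, 1.0)
    for _ in range(iterations):
        new = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in new.values()))
        x = {v: val / norm for v, val in new.items()}
    return x


edges = [
    ("P1", "P2"), ("P2", "P3"), ("P1", "P3"),   # prison group P
    ("Q1", "Q2"), ("Q2", "Q3"), ("Q1", "Q3"),   # prison group Q
    ("Sustainer", "P1"), ("Sustainer", "Q1"),   # the only bridge
]
adj = build_adj(edges)
bc = betweenness(adj)
ev = eigenvector_centrality(adj)
degree = {v: len(nbrs) for v, nbrs in adj.items()}
print(max(bc, key=bc.get), bc["Sustainer"])
```

On this toy graph the Sustainer has degree 2, no more than an ordinary prisoner such as P2, yet the highest betweenness, since all nine cross-group pairs route through her; eigenvector centrality instead ranks P1 and Q1 first. Counting connections alone would hide her role.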

While the relative absence of women from network 2 might have been expected, there are some other, more surprising absences, which highlight the problem with using the ODNB alone for network inference. In network 1 John Careless is shown to be an important hub. This is because he sent a lot of correspondence, and so accordingly he is given significant coverage in John Foxe’s famous ‘Book of Martyrs’. Yet the ODNB does not have an entry on him. Bartholomew (or Bartlet) Green, a prominent martyr in Foxe’s book, is another significant absence from network 2. Although he does have his own ODNB entry, his relationships are given low confidence levels because his name does not appear in anyone else’s entries.

The absence of nodes such as the female sustainers, Careless, and Green has a knock-on effect for our understanding of the topography of this network as a whole. It means that important relationships and channels of news transmission are not represented, which also prevents us from seeing how highly connected certain Protestants were. For example, in network 1 John Bradford appears right at the centre as the most connected node, with links to 67 different people through the numerous letters he wrote. By contrast in network 2 he does not appear to be especially connected, and he is nowhere near the middle, which shows the inability of the inference methods to pick up on his notable evangelism, and the important relationships with carriers like Augustine Bernher and William Punt, and sustainers like Margery Cooke and Joyce Hales, that made this possible.

A comparison of these two networks, then, provides a number of lessons for the SDFB team about the reliability of the source material and our current inference models. Happily, the former issue is being addressed as we turn to a broader range of primary and secondary literature; and as more anomalies are spotted and understood we hope to find ways to improve the way we assign confidence levels. But there are also broader lessons for anyone working on networks. When you construct a network you are always necessarily making decisions about where your boundaries are. This comparison shows clearly that choosing to include more or less information can radically change your findings.

 

Probability, Validation, and Method in Six Degrees of Francis Bacon

Six Degrees of Francis Bacon (SDFB) is at an important juncture as we move on to the second phase of the project, which brings on board two new members: Ruth Ahnert (Queen Mary, University of London) and Sebastian Ahnert (University of Cambridge). Their ongoing collaboration, which explores the application of quantitative network analysis to the study of sixteenth-century letter collections, will bring both new data and skills to our reconstruction of the early modern social network.

Up to this point, the team has been mining a single source, the Oxford Dictionary of National Biography (ODNB), to produce a preliminary list of 6,000 actors and to infer, with confidence estimates, a map of the associations between them. Links are inferred through data-mining, by chunking articles and measuring the co-occurrence of names. Confidence estimates arise through re-sampling: out of 100 runs through the documents, how well does the presence of one name predict the presence of another? Using this data, we have already created one large data visualization mapping all these inferred associations (‘Global Graph’, featured on the blog 14 July 2013), as well as smaller ego networks focused on single figures, such as Margaret Cavendish or James Harrington. However, these inferred networks are coarse-grained, biased to one source, and lacking verification. The next phase requires us to broaden the scope of our sources to extend and verify these inferred relationships, and to explore the ways that we can analyse the network we have reconstructed.
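The chunking and re-sampling idea can be illustrated with a deliberately simplified stand-in. This is not SDFB’s actual statistical model, which the post does not spell out. The sketch assumes names have already been recognized and that each article has been split into chunks, each chunk a set of names. In every bootstrap run it redraws the chunks with replacement and counts a link when two names co-occur at least twice and more often than independent mentions would predict (lift above 1); the confidence estimate is the share of runs in which the link appears. The `min_together` and lift rules are assumptions chosen for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def link_confidence(chunks, runs=100, min_together=2, seed=0):
    """Share of bootstrap runs in which each pair of names counts as linked.

    chunks: list of sets of names, one set per chunk of text.
    A pair counts as linked in a run if it co-occurs in at least
    `min_together` sampled chunks and its lift exceeds 1, i.e. the names
    appear together more often than independent mentions would predict.
    """
    rng = random.Random(seed)
    n = len(chunks)
    hits = Counter()
    for _ in range(runs):
        # Resample the chunks with replacement (one bootstrap run).
        sample = [chunks[rng.randrange(n)] for _ in range(n)]
        single = Counter(name for chunk in sample for name in chunk)
        together = Counter(
            pair for chunk in sample for pair in combinations(sorted(chunk), 2)
        )
        for (a, b), k in together.items():
            lift = k * n / (single[a] * single[b])
            if k >= min_together and lift > 1:
                hits[(a, b)] += 1
    return {pair: count / runs for pair, count in hits.items()}


# Invented toy corpus: A and B always appear together; C and D share one
# chunk but are each mentioned separately many times.
chunks = (
    [frozenset({"A", "B"})] * 10
    + [frozenset({"C", "D"})]
    + [frozenset({"C"})] * 9
    + [frozenset({"D"})] * 9
)
conf = link_confidence(chunks)
print(conf.get(("A", "B"), 0.0), conf.get(("C", "D"), 0.0))
```

Re-sampling of this kind rewards repetition: a pair mentioned together only once clears the `min_together` bar only when a lucky draw duplicates that chunk, so thinly documented pairs end up with low confidence.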

In the coming months we will expand the range of primary and secondary texts from which we derive relationships in order to reconstruct the network more fully and accurately. The HathiTrust Research Center and Google Books have supported our work by providing non-consumptive access to texts in their archives, and so we will be expanding our inferred network using secondary literature from these two vast repositories. On its own, however, this approach only produces a set of probable links with confidence estimates. Inference algorithms produce some false associations and omit other true ones. This is why, at the same time, Ruth and Sebastian will also be constructing a validated sub-network from the Letters and Papers, Foreign and Domestic, Henry VIII, and The Calendars of State Papers foreign and domestic, which British History Online has granted them permission to access via its site. This will not only allow us to confirm and supplement the inferred network; it will also provide a sizeable set of verified associations against which to validate the inferred network, so that we can examine the technical process and tweak the methods used to derive data.
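One simple way to score inferred links against a verified sub-network is precision (what share of the links predicted above a chosen confidence threshold are verified) and recall (what share of the verified links were predicted). The sketch below assumes both networks are undirected edge lists over already-matched node labels; the data are invented, and the project may well use other measures.

```python
def precision_recall(inferred, verified, threshold):
    """Score inferred edges (pair -> confidence) against verified pairs.

    Edges are treated as undirected, so ("A", "B") matches ("B", "A").
    """
    predicted = {frozenset(pair) for pair, c in inferred.items() if c >= threshold}
    truth = {frozenset(pair) for pair in verified}
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Invented example data.
inferred = {("A", "B"): 0.9, ("A", "C"): 0.7, ("B", "D"): 0.4, ("C", "E"): 0.2}
verified = [("B", "A"), ("B", "D"), ("D", "E")]

for threshold in (0.5, 0.3):
    p, r = precision_recall(inferred, verified, threshold)
    print(f"threshold {threshold}: precision {p:.2f}, recall {r:.2f}")
```

Here lowering the threshold from 0.5 to 0.3 recovers the verified B-D link, while the verified D-E link is never inferred at all: exactly the kind of omission a verified sub-network exposes. In this toy case both scores improve; in general, moving the threshold trades one against the other.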

Comparing the two networks will also alert us to systematic biases in secondary literature – some of which have already become apparent from initial case studies, such as the relative absence of women (see ‘An Entry of One’s Own, or Why Are There So Few Women In the Early Modern Social Network?’ 8 March 2013). Another important bias has been pointed out in Shawn Moore’s guest post on Margaret Cavendish’s network. He writes:

But with few exceptions, the majority of nodes are people with whom Margaret had little to no direct contact. Samuel Pepys and Dorothy Osborne, with .65 and .80 estimates respectively, are not known to have interacted with Margaret at all, though their impressions of her and her work still dominate traditional representations of the early modern writer.

This is a problem both with the nature of secondary literature, which places emphasis on how Cavendish was received by early modern writers who may well not have known her personally, and with the automated nature of the data-mining, which picks up these names as people likely to have known Cavendish because of the frequency with which they appear in close proximity to her name. Conversely, however, Moore points out that in addition to these false positives, there are some known relationships that have not been picked up through inference. This is due to the need to set thresholds high enough to filter out statistical noise: names that appear together only once are more likely to give rise to a false positive. But as a result we necessarily miss the fine grain.
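The trade-off can be seen in a few lines. This is a hypothetical sketch, not the project’s filtering code: it counts how many chunks mention each pair of names and keeps only pairs at or above a minimum count. The labels are placeholders: a “Critic” who often writes about the “Subject” without knowing her, and a “Correspondent” who did know her but is mentioned alongside her only once.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(chunks):
    """Count, for each unordered pair of names, the chunks mentioning both."""
    counts = Counter()
    for names in chunks:
        counts.update(combinations(sorted(set(names)), 2))
    return counts


def filter_edges(counts, min_count):
    """Keep only pairs that co-occur in at least `min_count` chunks."""
    return {pair for pair, n in counts.items() if n >= min_count}


# Invented chunks with placeholder names.
chunks = [
    ["Subject", "Critic"],
    ["Critic", "Subject", "Other"],
    ["Subject", "Critic"],
    ["Subject", "Correspondent"],   # a real tie, mentioned only once
]
counts = cooccurrence_counts(chunks)
print(sorted(filter_edges(counts, 1)))
print(sorted(filter_edges(counts, 2)))  # the single-mention tie is gone
```

Raising `min_count` to 2 removes the one-off pairs involving “Other”, but it removes the genuine correspondent too; a count threshold cannot tell the two apart, which is where curators’ additions come in.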

The primary sub-network that Ruth and Sebastian are working on, then, aims to fill in these gaps. But Shawn’s contribution here is also instructive. Our ultimate aim is to have a whole host of experts like Shawn overseeing additions and modifications to the network, which will be editable by invited collaborators through a wiki interface; other scholars will be able to make suggested modifications which will be checked and implemented by these curators. There is still a long way to go before this is a reality, but through collaboration, and a growing team and support network of advisors, these aims are closer to realisation.

Global Graph


Above is a thumbnail from a large network visualization produced for SDFB by the talented folks at KNALIJ. Click here to view the whole image, which is large (~12 MB) but can be zoomed and navigated using your web browser. The image includes only the top nodes and edges in our inferred network. For a rather unwieldy visualization of all 6,000-odd nodes and their edges, without labels, click here.

The proximity of the nodes is determined by their connection strength.  If multiple nodes are all connected with a high degree of confidence, they will cluster together.  So, for example, you can see members of the Elizabethan court clustered in the bottom left hand corner.  The graph takes the shape of a circle because it’s what’s called a force-directed graph, in which links or edges are treated as springs whose stiffness varies based on confidence estimates.  It’s as though the nodes and their connections had been compressed, then left to settle in place in accordance with Hooke’s law for springs and elasticity.  Node size is a function of the number of connections, which is why a figure like Charles II is significantly larger than, say, Samuel Palmer.  The color of the nodes is an indication of community.  Nodes are members of the same community when they share a set number of edges with other members of the community.  When a node is part of multiple communities, its color is determined by the community with which it shares the most edges.         
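A minimal force-directed layout makes the spring analogy concrete. This toy sketch is not KNALIJ’s algorithm, which the post does not describe: every pair of nodes repels with an inverse-square force, and each edge is a Hooke’s-law spring of natural length zero whose stiffness equals its confidence estimate, so stiffer springs settle shorter. A second hypothetical helper applies the color rule described above, giving a node the color of the community with which it shares the most edges. All parameter values are arbitrary assumptions.

```python
import math
import random
from collections import Counter


def spring_layout(nodes, edges, iterations=500, repulsion=0.05,
                  step=0.05, max_move=0.1, seed=1):
    """Toy force-directed layout (illustrative only).

    Every pair of nodes repels with an inverse-square force; every edge is a
    Hooke's-law spring of natural length 0 whose stiffness is its confidence,
    so high-confidence pairs settle closer together.
    """
    rng = random.Random(seed)
    pos = {v: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for v in nodes}
    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in nodes}
        for i, a in enumerate(nodes):           # repulsion between all pairs
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) + 1e-9
                f = repulsion / (d * d)
                force[a][0] += f * dx / d
                force[a][1] += f * dy / d
                force[b][0] -= f * dx / d
                force[b][1] -= f * dy / d
        for (a, b), stiffness in edges.items():  # Hooke's law: F = k * extension
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += stiffness * dx
            force[a][1] += stiffness * dy
            force[b][0] -= stiffness * dx
            force[b][1] -= stiffness * dy
        for v in nodes:
            mx, my = step * force[v][0], step * force[v][1]
            length = math.hypot(mx, my)
            if length > max_move:                # cap moves so near-collisions cannot blow up
                mx, my = mx * max_move / length, my * max_move / length
            pos[v][0] += mx
            pos[v][1] += my
    return {v: tuple(p) for v, p in pos.items()}


def dominant_community(node, neighbors, membership):
    """Color rule: the community that shares the most edges with `node`."""
    tally = Counter(membership[n] for n in neighbors[node] if n in membership)
    return tally.most_common(1)[0][0] if tally else None


pos = spring_layout(["A", "B", "C"], {("A", "B"): 0.9, ("B", "C"): 0.1})
print(math.dist(pos["A"], pos["B"]), math.dist(pos["B"], pos["C"]))
print(dominant_community("X", {"X": ["P", "Q", "R"]},
                         {"P": "blue", "Q": "blue", "R": "pink"}))
```

With a stiff A-B spring and a slack B-C spring, A settles much closer to B than C does, which is the sense in which proximity on the graph tracks connection strength.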

What is remarkable about the image, from our perspective, is how much meaningful information it displays given the relatively sparse dataset on which it is based.  All that we sent to KNALIJ was a matrix of nodes and edges with confidence intervals.  But from this minimal data their clustering and community inference algorithms have inferred a remarkable amount. 

For example, though our data includes no dates or other temporal information, the graph has an obvious, though not entirely consistent, chronological organization. Starting with the Elizabethan court in the lower left-hand corner, the graph proceeds counterclockwise through the reigns of James, Charles I, and Charles II. Nodes at 12 noon are largely post-Restoration and/or 18th-century. Nodes at 10 o’clock are part of James’s Scottish court.

At first we wondered why the center of the graph is basically empty. Then we realized that to occupy the center, a node would need to share edges with communities stretching over 150 years. The empty center is, in effect, a sign of the temporal scope of our network. Presumably a network stretched over a longer time period would have an even more pronounced doughnut hole.

It’s worth at this point acknowledging some of the embarrassing things that this image makes evident about the current state of our inferred network.  We still have some named entity recognition problems.  The “Society of Antiquaries” should not show up in our network.  There’s more work to do on date limitations, since figures appear from both much earlier (King John) and later (Lloyd George) than our proposed date range of 1550-1700.  As we’ve discussed in earlier posts, there are still de-duping problems, especially with regards to monarchs.  Some of these should be simple to iron out: King James and James I should not have separate nodes. 

But in other cases the duplication provides potentially significant information. Even though James VI of Scotland became James I of England, it is fascinating to see different communities and networks surrounding the two names. Nor, to scholars of the period at least, is it self-evident that James VI of Scotland and James I of England should be treated as the same person. As Jenny Wormald asked long ago, James VI and I: Two Kings or One?

King James’s northern and southern subjects shared one attitude: both treated this man, who embarked on his dual role three months short of his thirty-seventh birthday, as their king, dividing him as far as possible into two separate individuals.

At stake in the question “Two Kings or One?” is the category of Britain itself.

In some cases the use of colors to indicate communities shows fascinating breakdowns in social coherence.  A light blue Elizabeth I is surrounded by a sea of relatively unbroken light blue Protestantism.  But the pink Charles I is cut off from his Laudian community and hemmed in by the darker blues of Cromwell, Fairfax, and Henry Vane.  Henrietta Maria appears to have her own small and dispersed community, set apart from the rest of the mid-century milieu, that is more closely connected to the courts of Charles II and James II, and doubtless to the court in exile starting in 1644.

As tempting as it is to turn these images into narrative, it would be unwise to draw any strong conclusions at this point.  We can’t be sure which aspects of the visualization are artifacts of our highly imperfect network data, or of the arbitrary thresholds with which the visualization algorithms organize that data into a coherent image.  Revised data, or different thresholds (particularly thresholds manipulable by users), could and doubtless will yield very different pictures.

That said, we see the filtered SDFB graph as a rather large map of problems. Why does Henrietta Maria have a community distinct from her husband and those most proximate to her? Why does Jacob Tonson sit so far out in the upper right-hand corner? Why are certain nodes so evidently out of place? When and why don’t communities align with proximity, color with clustering? The problems call out for further explanation, interpretation, and speculation, especially by experts with knowledge of the period, of particular figures, or of graph learning and/or visualization.

Networks as Constructs: The Curious Case of Margaret Cavendish, Duchess of Newcastle (1623?-1673)

Guest post by Shawn W. Moore

The Six Degrees of Francis Bacon (SDFB) team is delighted to introduce here the first in an occasional series in which we invite early modernists with expert knowledge about particular groups and figures to reflect on ways SDFB might aid their own research and on how SDFB’s iterative process could be improved. 

Shawn Moore, a PhD student at Texas A&M University who as curator of The Digital Cavendish Project is developing exciting new methods to study early modern networks, examines our early attempts to reassemble Margaret Cavendish’s associations from the 58,000 DNB entries with which we began.   We sought Shawn’s feedback because of his demonstrated knowledge about both Cavendish in particular and digital methods more broadly. 

We told Shawn, “be completely honest and fully critical.  If there are wrong links do say so; if there are important missing connections, that’s just as important; if there’s something useful that we can’t yet do but that you think would be crucial, let’s hear it.  Again, it would be great also to hear how the graph or confidence intervals might be useful to you and your research, but you should by all means go beyond that as warranted…The point is that this is very much a collaborative, iterative process.”

Shawn Moore’s guest post is below. 

Any social network analysis of Margaret Cavendish, Duchess of Newcastle upon Tyne (1623?-1673) must deal with two important characteristics of the famed writer’s reputation: her infamous, self-proclaimed shyness, and the exaggerated yet continually perpetuated reports of her eccentric megalomania. Margaret’s own textual practices complicate this relationship even further.

In fact, it’s a curious case in regard to Cavendish’s perceived network that she at once downplays her own relationships with her early modern contemporaries, yet at the same time embeds her texts with complex sociable relationships and consciously distributes her texts to the intellectual hotspots of her time. These apparent contradictions and her infamous reputation still fascinate Cavendish scholars. It makes social network analysis particularly interesting because it requires thinking about networks in a different way.  In many ways, her networks are non-traditional in that they often exist outside of and beyond Cavendish herself.


Figure 1.

In the visualization produced by SDFB (Figure 1.), we see an evenly spread layout of nodes showing a rather impressive network including famous contemporaries such as Samuel Pepys, Robert Boyle, and Thomas Hobbes. The confidence interval visualization (Figure 2, after the jump.) indicates the reliability of estimated relationships, how likely it is that Cavendish “knew” that person (see Network Inference post on the problems regarding “knew”), based on the DNB data. In this case, with a confidence of .9 we can reliably estimate that Margaret knew William Cavendish, her husband the Duke of Newcastle. But with few exceptions, the majority of nodes are people with whom Margaret had little to no direct contact. Samuel Pepys and Dorothy Osborne, with .65 and .80 estimates respectively, are not known to have interacted with Margaret at all, though their impressions of her and her work still dominate traditional representations of the early modern writer.
