"We live in an Elizabethan world of our own reductive devising, populated by the Queen and Ben Jonson and the Dark Lady and the Bard and a theatre full of groundlings. But the real Elizabethan world had a lot more people in it than that."

- Adam Gopnik, “The Poet’s Hand”

PODCAST: Christopher Warren on Six Degrees of Francis Bacon

SDFB co-PI Christopher Warren recently presented in Oxford University’s Cultures of Knowledge seminar series “Negotiating Networks.”

The podcast of his presentation, “Bacon and Edges: Reassembling the Early Modern Social Network,” can be found here.

BLOG POST: Daniel Shore on “Extensions of the Book”

SDFB co-PI Daniel Shore has written a guest blog post at the Folger Shakespeare Library’s blog, “The Collation.”  

His post, “Extensions of the Book,” can be found here.

PODCAST: Ruth Ahnert and Sebastian Ahnert on “Tudor Letter Networks: The Case for Quantitative Network Analysis”

SDFB team members Ruth and Sebastian Ahnert recently spoke in Oxford University’s Cultures of Knowledge seminar series “Negotiating Networks.”

The podcast of their presentation, “Tudor Letter Networks: The Case for Quantitative Network Analysis,” can be found here.  

Job Opportunity with Six Degrees of Francis Bacon: Early Modern Data Curation Fellow

Carnegie Mellon University’s Department of English and University Libraries jointly seek an Early Modern Data Curation Fellow to lead data curation activities for the Six Degrees of Francis Bacon (SDFB) project, a digital reconstruction of the early modern social network that scholars and students can collaboratively expand, revise, curate, and critique. The fellow will leverage expertise in early modern studies along with technical aptitude in order to contribute meaningfully to a rich data lifecycle, including collecting, processing, textmining, analyzing, and archiving data related to the early modern social network. 

Click on the links above for further details. 

Request for Input

We’re very busy working behind the scenes on an interactive user interface for Six Degrees of Francis Bacon (SDFB).  Since we want SDFB to be as useful to the scholarly community as possible, we thought we’d open up the process a little.  


image


Have a look at a couple of recent screenshots, centered on Thomas Hobbes and Ben Jonson, respectively.  Say you could learn more about the line of connection between two people - in technical network parlance, the “edge” connecting Hobbes and Waller or Jonson and Chapman.  What kind of information would you want to find there?

Say, moreover, that you could add or edit information about that edge. What kinds of information would you want to add or correct?  Likelihood of relationship? Type of relationship? Strength of relationship? Dates and frequency of contact? Bibliographical sources?  

Let us know with a tweet to 6Bacon or by clicking “Submit” above. 

image

 

Elizabethan Hacks


We’re delighted to share the reflections of Matthew Harrison, a graduate student at Princeton University. His dissertation, “Tear Him for His Bad Verses: Carping, Cavilling, and the Origins of Criticism,” explores the strange and wonderful vocabulary with which Tudor readers insulted, sniffed at, and objected to the shortcomings of particular poems and the art itself.

Many thanks to the 6 Degrees team for letting me play around with their data.

I requested the network data for three Elizabethan writers: Nicholas Breton (1554/5–c.1626), Barnabe Barnes (1571–1609), and Robert Greene (1558–1592). Though rough contemporaries in the burgeoning London literary scene, each imagines and constructs a career as a print author in somewhat different terms, taking very different relations to print and patronage. I wanted to know how much of the different writerly trajectories of these figures would be visible in their social networks.

First, a rough sketch of the three figures:


Greene, famously, is among those now known as “University Wits,” humanistically-trained writers who made careers out of writing for publication and the stage. The success of his pamphlets and prose romances made him, according to the Dictionary of National Biography, “England’s first celebrity author,” and the appeals to patronage of his early work disappear in favor of efforts to capitalize on his reputation.

The third son of the Bishop of Durham, Barnes seems to have lived on his inheritance while trying to fashion himself as a fashionable literary gentleman. (Thomas Nashe mocks him for showing up at court in “a strange pair of Babylonian britches, with a codpiece as big as a Bolognian sausage.”)

Finally, Breton, the son of a wealthy merchant, was a prolific writer of devotional and secular poetry and prose. His dedications approach a wide swath of potential patrons—city officials, country dignitaries and luminaries at court—with works that seem targeted to their taste and interests.

image
Image: Greene, in his winding sheet, imagined as writing from beyond the grave. From the title page of Greene in Conceipt, by John Dickinson

For ease of reading, I’ve put the top twenty or so results for each writer at the bottom of the page. These lists, while idiosyncratic, are fairly accurate: above the .5 confidence interval, the algorithm tends to be right that two people are connected (though, as we will see, what constitutes a connection is an open question). One other encouraging result: the confidence intervals for Greene’s relationships are significantly higher than those of the others, while those for Barnes are quite low. With good reason: Greene’s DNB entry is five or six times longer, and a full-text search suggests he’s mentioned about that much more often in other entries. The algorithm is most confident when it has a larger sample with which to test its conclusions. The more often two names appear together, the more likely a relationship can be inferred. (I’ll raise a few problems with that inference in a moment.)

With this general background, I want to raise three challenges for thinking about and using Six Degrees data.

Did Robert Greene Know Philip Sidney?


With a confidence interval of .83, the network suggests Greene knew Philip Sidney. A connection is possible: the two are contemporaries, though they move in very different circles. And, indeed, Greene’s entry mentions Sidney six times. Yet each mention denies literary influence, with claims such as the following: “Greene independently synthesized the same models as Sidney had…” or “But Greene probably knew Sidney’s romance only indirectly…”. The DNB entry author takes pains to distinguish Greene’s mode of prose romance writing from Sidneian influence.

Should this be considered a connection? In terms of social network and literary influence, respectively, Breton and Barnes are each closer to Sidney. Yet Sidney’s prominence and their generic similarities mean that Greene’s work has been read in Sidneian terms since the 1590s. Is that sufficient? It depends on the purpose for which we are using the database.

A related problem: Breton is connected to Henry Machyn, a parish clerk, because Machyn’s Diary is used to provide a colorful anecdote about his step-father. Indeed, I suspect the network data for Machyn would tell us far more about which entries draw on this source than about Machyn himself. (We can find the same sort of error in relations to Francis Meres, John Bodenham, Thomas Moffet, Robert Dow, Anthony Wood, and other individuals connected to important historical sources.)

Again, these complications tell us less about the shortcomings of the project than about the complexity of human relationships and their irreducibility to a network graph. Hence the Six Degrees team’s emphasis on introducing multiple data streams and on allowing domain experts to tag and annotate relationships.

John Weld and Isaac Newton: Artifacts and Duplicates

One place where human intervention will greatly improve results comes in recognizing artifacts of the way the algorithm derives names from the DNB data.

Barnabe Barnes, you’ll notice, is connected both to “Thomas Nashe” and “Thomas Nash.” These aren’t two different individuals; it’s the same fellow, with his name spelled two different ways.  (If I had included a little more data, you’d see Archbishop Matthews’ first name spelled both “Tobie” and “Toby.”) Making matters worse, in other entries, Barnes’s own name is sometimes spelled “Barnaby.”

On the other hand, the network suggests that Robert Greene knew Isaac Newton. Perplexing, until one realizes that it doesn’t distinguish between Robert Greene (the 16th-century writer) and Robert Greene (the 17th-century natural philosopher).

My favorite example is Breton’s proposed connection to one “John Weld.” There is no such person in the DNB meeting the five-mention threshold for inclusion in this data set, so his appearance seems anomalous, until a full-text search reveals that five different people named “John Weld” are mentioned once or twice apiece. The network amalgamates a 20th-century critic, a 17th-century sheriff, a cleric, and so on.
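The two kinds of artifact described above pull in opposite directions: spelling variants need to be merged, while homonyms need to be kept apart. The following is a minimal sketch of both ideas, not the Six Degrees team's actual de-duplication method. The similarity threshold of 0.9, the helper names, and the use of a century as a disambiguating field are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_variants(names, threshold=0.9):
    """Greedily group names whose spelling is close to a group's first member.

    The threshold is an illustrative assumption: too low and distinct
    people are merged, too high and variants stay split.
    """
    groups = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


def person_key(name, century):
    """Hypothetical identity key: a name alone cannot separate homonyms."""
    return (name, century)


names = ["Thomas Nashe", "Thomas Nash", "Barnabe Barnes", "Barnaby Barnes"]
groups = group_variants(names)

# The two Robert Greenes share a name but not a century, so they stay distinct.
greenes = {person_key("Robert Greene", 16), person_key("Robert Greene", 17)}
```

String similarity alone would make the "John Weld" problem worse rather than better, since the five John Welds match perfectly. That case needs extra fields such as dates or occupation, which is where human annotation comes in.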

City, Country, and Court

The DNB favors people who have been deemed notable, while the 6 Degrees project threshold for inclusion restricts the data set even further. The resulting networks are male-dominated and overly emphasize weak ties to key political and literary figures over the everyday sorts of sociability that might have governed lives in the period. The given networks lack, for example, the city figures to whom Breton dedicated much writing.

Still more drastically, we don’t hear of the many servants, innkeepers, shop-owners, and apprentices that far outnumbered the “notables”: how different will these networks look, I wonder, when they can be supplemented with data from court cases, the State Papers, and college records?

The inferences the algorithm can already make are impressive. Thus in 1598, Barnabe Barnes tries twice (rather amateurishly) to poison John Browne, first poisoning a lemonade he gives him and then his glass of wine. Barnes is caught, tried by Sir Edward Coke, and ultimately gets off easy, presumably through intervention of some aristocratic connection.

The court records survive, so we can piece together quite a bit of social life in Elizabethan London: we even know the names of the two servants who are made to test the poisoned wine and become grievously ill. The Six Degrees project doesn’t have access to this data. Indeed, as I pointed out above, it doesn’t have much to work with regarding Barnes at all: about a dozen names in his entry.  What it does know, however, are the associates of Barnes’s father, the Bishop of Durham. We find a number of Richard Barnes’s connections suggested (with low confidence intervals) as possible contacts of Barnabe’s. And as I researched the aftermath of the trial, I was surprised to find that some of the same names come up, as people to whom the younger Barnes had recourse. And I wonder whether more historical research might not validate more of these connections.

Its guesses, we might say, happen to have been validated by history. This is part chance, of course, but it also reflects a deep parsimony in human relations. If serendipity and error make it difficult for network analysis to completely capture the complexity of human social relations, nonetheless, often enough we know who you might guess we know. Though far from complete, a network like this one is an excellent place to begin.

image

Boundaries, brevity and bias: Protestant letter networks

image

A few days ago we posted an update about the project, including the fact that Ruth Ahnert and Sebastian Ahnert had recently joined the Six Degrees of Francis Bacon (SDFB) team. The following post, by Ruth, examines how her recent work with Sebastian on Protestant letter networks, and the methods they have developed through this process, inform the project. 

Sebastian and I were invited to join the SDFB team after Chris Warren and Dan Shore saw me presenting on our research at the MLA conference in January. What piqued their interest, I believe, was the way that our approach contrasted with, yet complemented, their own. While the existing SDFB team have been constructing a global network inferred from secondary literature comprising 6,000 nodes (and growing), ours was a verified local network comprising just 377 nodes, derived from primary literature. The source in question was a collection of 289 letters written in the reign of Mary I by Protestants which were later collected, and many brought to print, by the famous martyrologist John Foxe. The other difference between our work and theirs was that our emphasis was not so much on network visualization as on topographical network analysis. In our work we have employed mathematical tools and algorithms to measure the relative centrality of each of the individuals in the network, and we have begun to develop ways of predicting the roles of different actors according to their network properties.

image

(Figure 1 Large Version)

We are now moving onto a new, much larger collection of documents - Letters and Papers, Foreign and Domestic, Henry VIII, and The Calendars of State Papers. Our mining of these sources will create a very large sub-network that will offer two things: 1) additional associations missed by the (necessarily) coarse-grained nature of network inference, and 2) a means of verifying and improving the inference process. It will take quite some time before this verified sub-network is reconstructed, but to suggest the way this might work, we decided to write a couple of blog posts comparing the Protestant letter network to the equivalent portion of the global network. In this post we will provide an overview of what a comparison of these two networks can offer; in a second post we will undertake some more quantitative analysis.

image

(Figure 2 Large Version)

Figure 1 (created by Sebastian) is a visualization derived from 289 letters found in John Foxe’s papers and publications and sent between Protestants in England during the reign of Mary I. Figure 2, by comparison, is a section of the inferred network derived from the Oxford Dictionary of National Biography (ODNB), which was generated by running the network inference process for those actors in our Protestant network that have their own ODNB entries. Straightaway two key differences between the parameters of these 2 networks can be observed: the network in figure 1 is bounded by time (communications made 1553-1558) and by its actors’ membership within the Protestant community, whereas network 2 covers the whole lives of those Protestants, and consists of all associations that have been inferred with a confidence that exceeds a given threshold. This is the nature of working with collections of primary documents: they are circumscribed by various collection policies. For this reason, network 1 excludes figures like John Story, who, despite being in written communication with various Protestant leaders in the reign of Mary I, was an opponent of the Protestant community as one of the key enforcers of the Marian religious reaction. Based on these two additional layers of circumscription acting on network 1, it should look much smaller. Yet it doesn’t, even when we take into account some of the ‘false positives’ included in network 2, such as the historian and biographer John Strype (1643–1737), who only wrote about the Protestant martyrs, and William I. Why is this?

There are three main reasons: the brevity of ODNB entries, their bias, and the confidence levels used in the network inference process. Considering first the issues of brevity and bias: ODNB entries are concise introductions to the biographies of figures perceived to be significant in the nation’s history. This means that there is only space to mention their ‘important’ relationships and interactions. However, religious leaders did not only write to other high profile figures in the Protestant movement, and, as a result, there are a significant number of figures that appear in network 1 who do not have their own ODNB entries; some are only mentioned in passing within others’ entries, and others still are not mentioned at all. Significantly, many of the figures who fall into this category are women, for while women play a minor role in both historical and modern scholarly accounts of the Reformation, they often occupied vital infrastructural roles within this network, funneling letters, money, food and other goods to the London prisons to support their incarcerated co-religionists. These infrastructural figures are highlighted by two key network measures: “betweenness” (which measures the number of times a shortest path across the network goes through a given node) and “eigenvector centrality” (which is closely related to the algorithm used by Google to assign importance to web pages in the World Wide Web, and to rank its search results by relevance). Network 2 demonstrates how the need for confidence levels when inferring a network exacerbates the structural bias against women in the ODNB: because they are mentioned rarely, they often fall below the radar. However, the combination of primary sources and network analysis has the ability to provide a more gender-balanced view of the early modern social network.
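To make the two measures concrete, here is a small sketch that computes both on a toy graph. It is not the code used for the Protestant letter network. The seven-node graph is invented, with "s" standing in for a sustainer who is the only link between two tightly knit groups. Betweenness is computed with Brandes' algorithm. Eigenvector centrality is approximated by power iteration on the adjacency matrix plus the identity, a common shift that leaves the ranking unchanged and helps the iteration settle.

```python
from collections import deque


def betweenness(graph):
    """Shortest-path betweenness (Brandes) for an unweighted, undirected graph."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        order = []
        preds = {v: [] for v in graph}
        paths = {v: 0 for v in graph}   # number of shortest paths from source
        dist = {v: -1 for v in graph}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was counted from both endpoints.
    return {v: total / 2 for v, total in score.items()}


def eigenvector_centrality(graph, iterations=100):
    """Power iteration on (adjacency + identity), normalised to unit length."""
    x = {v: 1.0 for v in graph}
    for _ in range(iterations):
        nxt = {v: x[v] + sum(x[w] for w in graph[v]) for v in graph}
        norm = sum(val * val for val in nxt.values()) ** 0.5
        x = {v: val / norm for v, val in nxt.items()}
    return x


# Invented toy network: two triangles joined only through "s".
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "s"],
    "s": ["c", "d"],
    "d": ["s", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
bc = betweenness(graph)
ec = eigenvector_centrality(graph)
```

In this toy case "s" has only two links yet the highest betweenness, because every path between the two groups runs through it, while "c" and "d" score higher on eigenvector centrality. That difference is why the two measures are worth reading side by side.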

While the relative absence of women from network 2 might have been expected, there are some other more surprising absences, which highlight the problem with using the ODNB alone for network inference. In network 1 John Careless is shown to be an important hub. This is because he sent a lot of correspondence, and so accordingly he is given significant coverage in John Foxe’s famous ‘Book of Martyrs’. Yet the ODNB does not have an entry on him. Bartholomew (or Bartlet) Green, a prominent martyr in Foxe’s book, is another significant absence from network 2. This is because, although he does have his own ODNB entry, his relationships are given low confidence levels because his name does not appear in anyone else’s entries.

The absence of nodes such as the female sustainers, Careless, and Green has a knock-on effect for our understanding of the topography of this network as a whole. It means that important relationships and channels of news transmission are not represented, which also prevents us from seeing how highly connected certain Protestants were. For example, in network 1 John Bradford appears right at the centre as the most connected node, with links to 67 different people through the numerous letters he wrote. By contrast in network 2 he does not appear to be especially connected, and he is nowhere near the middle, which shows the inability of the inference methods to pick up on his notable evangelism, and the important relationships with carriers like Augustine Bernher and William Punt, and sustainers like Margery Cooke and Joyce Hales, that made this possible.

A comparison of these 2 networks, then, provides a number of lessons for the SDFB team about the reliability of the source material and our current inference models. Happily the former issue is being addressed as we turn to a broader range of primary and secondary literature; and as more anomalies are spotted and understood we hope to find ways to improve the way we assign confidence levels. But there are also broader lessons for anyone working on networks. When you construct a network you are always necessarily making decisions about where your boundaries are. This comparison shows clearly that choosing to include more or less information can radically change your findings.

 

Probability, Validation, and Method in Six Degrees of Francis Bacon

Six Degrees of Francis Bacon (SDFB) is at an important juncture in the project as we move onto the second phase, which brings on board two new members: Ruth Ahnert (Queen Mary, University of London) and Sebastian Ahnert (University of Cambridge). Their ongoing collaboration - which explores the application of quantitative network analysis to the study of sixteenth-century letter collections - will bring both new data and skills to our reconstruction of the early modern social network.

Up to this point, the team has been mining a single source, the Oxford Dictionary of National Biography (ODNB), to produce a preliminary list of 6,000 actors and to infer, with confidence estimates, a map of the associations between them. Links are inferred through data-mining, by chunking articles and measuring the co-occurrence of names. Confidence estimates arise through re-sampling. Out of 100 runs through the documents, how well does the presence of one name predict the presence of another? Using this data, we have already created one large data visualization mapping all these inferred associations (‘Global Graph’, featured on the blog 14 July 2013), as well as smaller ego networks focused around single figures, such as Margaret Cavendish or James Harrington. However, these inferred networks are coarse grained, biased to one source, and lacking verification. The next phase requires us to broaden the scope of our sources to extend and verify these inferred relationships, and to explore the ways that we can analyse the network we have reconstructed.
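The general shape of that procedure can be sketched in a few lines. This is a deliberately simplified stand-in, not the project's actual statistical model. It resamples the chunks with replacement 100 times and reports, for each pair of names, the fraction of runs in which the pair co-occurs at least `min_count` times. The function names, the `min_count` rule, and the toy chunks are all assumptions made for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def cooccurrence_counts(chunks):
    """Count how many chunks mention each unordered pair of names."""
    counts = Counter()
    for names in chunks:
        for pair in combinations(sorted(set(names)), 2):
            counts[pair] += 1
    return counts


def bootstrap_confidence(chunks, runs=100, min_count=2, seed=0):
    """Fraction of resampled runs in which a pair co-occurs often enough.

    Each run draws len(chunks) chunks with replacement, so pairs that
    depend on a single chunk drop out of many runs.
    """
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(runs):
        sample = [rng.choice(chunks) for _ in chunks]
        for pair, n in cooccurrence_counts(sample).items():
            if n >= min_count:
                hits[pair] += 1
    return {pair: hits[pair] / runs for pair in hits}


# Invented toy data: "A" and "B" share eight chunks, "A" and "C" only one.
chunks = [["A", "B"]] * 8 + [["A", "C"]] + [["B"]]
confidence = bootstrap_confidence(chunks)
```

Even in this crude form the sketch shows the behaviour noted elsewhere on the blog: a pair that appears together often keeps its link across nearly every run, while a pair that appears together once gets a low estimate whether or not the two people actually knew each other.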

In the coming months we will expand the range of primary and secondary texts from which we derive relationships in order to reconstruct the network more fully and accurately. HathiTrust Research Center and Google Books have supported our work by providing non-consumptive access to texts in their archives, and so we will be expanding our inferred network using secondary literature from these two vast repositories. On its own, however, this approach only produces a set of probable links with confidence estimates. Inference algorithms produce some false associations and omit other true ones. This is why, at the same time, Ruth and Sebastian will also be constructing a validated sub-network from the Letters and Papers, Foreign and Domestic, Henry VIII, and The Calendars of State Papers foreign and domestic, which they have been granted permission by British History Online to access via their site. This will allow us not only to confirm and supplement the inferred network; it also provides a sizeable set of verified associations against which to validate the inferred network so that we can examine the technical process and tweak the methods applied to derive data.

Comparing the two networks will also alert us to systematic biases in secondary literature – some of which have already become apparent from initial case studies, such as the relative absence of women (see ‘An Entry of One’s Own, or Why Are There So Few Women In the Early Modern Social Network?’ 8 March 2013). Another important bias has been pointed out in Shawn Moore’s guest post on Margaret Cavendish’s network. He writes:

But with few exceptions, the majority of nodes are people with whom Margaret had little to no direct contact. Samuel Pepys and Dorothy Osborne, with .65 and .80 estimates respectively, are not known to have interacted with Margaret at all, though their impressions of her and her work still dominate traditional representations of the early modern writer.

This is a problem both with the nature of secondary literature, which places emphasis on how Cavendish was received by early modern writers who may well not have known her personally, and with the automated nature of the data-mining, which picks up these names as people likely to have known Cavendish because of the frequency with which they appear in close proximity to her name. Conversely, however, Moore points out that in addition to these false positives, there are some known relationships that have not been picked up through inference. This is due to the need to set thresholds high enough to filter out statistical noise: names that appear together only once are more likely to give rise to a false positive. But as a result we necessarily miss the fine grain.
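The trade-off between false positives and missed relationships can be made explicit by scoring the inferred links against a verified set at different thresholds. The sketch below is illustrative only. The three estimates of .9, .80 and .65 are the ones quoted for William Cavendish, Dorothy Osborne and Samuel Pepys, but the fourth entry and the contents of the verified set are invented for the example.

```python
def evaluate(inferred, verified, threshold):
    """Return (precision, recall) of links at or above the threshold."""
    predicted = {pair for pair, conf in inferred.items() if conf >= threshold}
    true_positives = predicted & verified
    precision = len(true_positives) / len(predicted) if predicted else 1.0
    recall = len(true_positives) / len(verified) if verified else 1.0
    return precision, recall


ego = "Margaret Cavendish"
inferred = {
    (ego, "William Cavendish"): 0.90,
    (ego, "Dorothy Osborne"): 0.80,
    (ego, "Samuel Pepys"): 0.65,
    (ego, "hypothetical known associate"): 0.30,  # invented for illustration
}
# Assumed ground truth for the example, standing in for a verified sub-network.
verified = {
    (ego, "William Cavendish"),
    (ego, "hypothetical known associate"),
}

strict = evaluate(inferred, verified, threshold=0.85)
loose = evaluate(inferred, verified, threshold=0.25)
```

With the strict threshold every reported link is correct but half of the verified links are missed. With the loose one nothing is missed but half of what is reported is wrong. A verified sub-network is what makes it possible to measure this at all.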

The primary sub-network that Ruth and Sebastian are working on, then, aims to fill in these gaps. But Shawn’s contribution here is also instructive. Our ultimate aim is to have a whole host of experts like Shawn overseeing additions and modifications to the network, which will be editable by invited collaborators through a wiki interface; other scholars will be able to make suggested modifications which will be checked and implemented by these curators. There is still a long way to go before this is a reality, but through collaboration, and a growing team and support network of advisors, these aims are closer to realisation.

Global Graph

image

Above is a thumbnail from a large network visualization produced for SDFB by the talented folks at KNALIJ.  Click here to view the whole image, which is large (~12mb) but can be zoomed and navigated using your web browser.  The image includes only the top nodes and edges in our inferred network.  For a rather unwieldy visualization of all 6,000 odd nodes and their edges, without labels, click here.

The proximity of the nodes is determined by their connection strength.  If multiple nodes are all connected with a high degree of confidence, they will cluster together.  So, for example, you can see members of the Elizabethan court clustered in the bottom left hand corner.  The graph takes the shape of a circle because it’s what’s called a force-directed graph, in which links or edges are treated as springs whose stiffness varies based on confidence estimates.  It’s as though the nodes and their connections had been compressed, then left to settle in place in accordance with Hooke’s law for springs and elasticity.  Node size is a function of the number of connections, which is why a figure like Charles II is significantly larger than, say, Samuel Palmer.  The color of the nodes is an indication of community.  Nodes are members of the same community when they share a set number of edges with other members of the community.  When a node is part of multiple communities, its color is determined by the community with which it shares the most edges.         
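For readers curious how spring stiffness translates into proximity, here is a minimal force-directed layout sketch. It is not KNALIJ's algorithm, whose details are not given here. Each edge is assumed to be a zero-rest-length spring obeying Hooke's law with stiffness equal to its confidence estimate, every pair of nodes is assumed to repel with an inverse-square force, and the step size, iteration count and toy confidences are arbitrary illustrative choices.

```python
import math


def force_layout(nodes, edges, steps=2000, step_size=0.05, repulsion=1.0):
    """Relax node positions under spring attraction and pairwise repulsion.

    edges maps (u, v) to a confidence estimate used as spring stiffness.
    """
    n = len(nodes)
    # Deterministic start: nodes evenly spaced on a unit circle.
    pos = {v: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
           for i, v in enumerate(nodes)}
    for _ in range(steps):
        force = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair, magnitude repulsion / distance**2.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[v][0] - pos[u][0]
                dy = pos[v][1] - pos[u][1]
                d = max(math.hypot(dx, dy), 1e-3)
                push = repulsion / d ** 2
                force[u][0] -= push * dx / d
                force[u][1] -= push * dy / d
                force[v][0] += push * dx / d
                force[v][1] += push * dy / d
        # Hooke's law along each edge: pull proportional to stiffness * length.
        for (u, v), stiffness in edges.items():
            dx = pos[v][0] - pos[u][0]
            dy = pos[v][1] - pos[u][1]
            force[u][0] += stiffness * dx
            force[u][1] += stiffness * dy
            force[v][0] -= stiffness * dx
            force[v][1] -= stiffness * dy
        for v in nodes:
            pos[v][0] += step_size * force[v][0]
            pos[v][1] += step_size * force[v][1]
    return pos


def distance(pos, u, v):
    return math.hypot(pos[u][0] - pos[v][0], pos[u][1] - pos[v][1])


# Invented toy network: a high-confidence edge and a low-confidence edge.
layout = force_layout(["A", "B", "C"], {("A", "B"): 0.9, ("B", "C"): 0.1})
```

In the settled layout the high-confidence pair sits closer together than the low-confidence pair, which is the same mechanism that pulls densely connected groups into visible clusters.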

What is remarkable about the image, from our perspective, is how much meaningful information it displays given the relatively sparse dataset on which it is based.  All that we sent to KNALIJ was a matrix of nodes and edges with confidence intervals.  But from this minimal data their clustering and community inference algorithms have inferred a remarkable amount. 

For example, though our data includes no dates or other temporal information, the graph has an obvious, though not entirely consistent, chronological organization.  Starting with the Elizabethan court in the lower left hand corner, the graph proceeds counter clockwise through the reigns of James, Charles I, and Charles II.  Nodes at 12 noon are largely post-Restoration and/or 18th-century.  Nodes at 10 o’clock are part of James’s Scottish court.

At first we wondered why the center of the graph is basically empty.  Then we realized that to occupy the center, a node would need to share edges with communities stretching over 150 years.  The empty center is, in effect, a sign of the temporal scope of our network.  Presumably a network stretched over a longer time period would have an even more pronounced doughnut hole.

It’s worth at this point acknowledging some of the embarrassing things that this image makes evident about the current state of our inferred network.  We still have some named-entity recognition problems.  The “Society of Antiquaries” should not show up in our network.  There’s more work to do on date limitations, since figures appear from both much earlier (King John) and later (Lloyd George) than our proposed date range of 1550-1700.  As we’ve discussed in earlier posts, there are still de-duping problems, especially with regard to monarchs.  Some of these should be simple to iron out: King James and James I should not have separate nodes.

But in other cases the duplication provides potentially significant information. Even though James VI of Scotland became James I of England, it is fascinating to see different communities and networks surrounding the two names.  Nor, to scholars of the period at least, is it self-evident that James VI of Scotland and James I of England should be treated as the same person.  As Jenny Wormald asked long ago, James VI and I: Two Kings or One?

King James’s northern and southern subjects shared one attitude: both treated this man, who embarked on his dual role three months short of his thirty-seventh birthday, as their king, dividing him as far as possible into two separate individuals.

At stake in the question “Two Kings or One?” is the category of Britain itself.

In some cases the use of colors to indicate communities shows fascinating breakdowns in social coherence.  A light blue Elizabeth I is surrounded by a sea of relatively unbroken light blue Protestantism.  But the pink Charles I is cut off from his Laudian community and hemmed in by the darker blues of Cromwell, Fairfax, and Henry Vane.  Henrietta Maria appears to have her own small and dispersed community, set apart from the rest of the mid-century milieu, that is more closely connected to the courts of Charles II and James II, and doubtless to the court in exile starting in 1644.

As tempting as it is to turn these images into narrative, it would be unwise to draw any strong conclusions at this point.  We can’t be sure which aspects of the visualization are artifacts of our highly imperfect network data, or of the arbitrary thresholds with which the visualization algorithms organize that data into a coherent image.  Revised data, or different thresholds (particularly thresholds manipulable by users), could and doubtless will yield very different pictures.

That said, we see the filtered SDFB graph as a rather large map of  problems.  Why does Henrietta Maria have a community distinct from her husband and those most proximate to her? Why does Jacob Tonson sit so far out to the upper-right hand corner?  Why are certain nodes so evidently out of place?  When and why don’t communities align with proximity, color with clustering?  The problems call out for further explanation, interpretation, and speculation especially by experts with knowledge of the period, of particular figures, or of graph learning and/or visualization.    

Networks as Constructs: The Curious Case of Margaret Cavendish, Duchess of Newcastle (1623?-1673)

Guest post by Shawn W. Moore

The Six Degrees of Francis Bacon (SDFB) team is delighted to introduce here the first in an occasional series in which we invite early modernists with expert knowledge about particular groups and figures to reflect on ways SDFB might aid their own research and on how SDFB’s iterative process could be improved. 

Shawn Moore, a PhD student at Texas A&M University who as curator of The Digital Cavendish Project is developing exciting new methods to study early modern networks, examines our early attempts to reassemble Margaret Cavendish’s associations from the 58,000 DNB entries with which we began.   We sought Shawn’s feedback because of his demonstrated knowledge about both Cavendish in particular and digital methods more broadly. 

We told Shawn, “be completely honest and fully critical.  If there are wrong links do say so; if there are important missing connections, that’s just as important; if there’s something useful that we can’t yet do but that you think would be crucial, let’s hear it.  Again, it would be great also to hear how the graph or confidence intervals might be useful to you and your research, but you should by all means go beyond that as warranted…The point is that this is very much a collaborative, iterative process.”

Shawn Moore’s guest post is below. 

Any social network analysis of Margaret Cavendish, Duchess of Newcastle upon Tyne (1623?-1673) must deal with two important characteristics of the famed writer’s reputation: her infamous, self-proclaimed shyness, and the exaggerated yet continually perpetuated reports of her eccentric megalomania. Margaret’s own textual practices complicate this relationship even further.

In fact, it’s a curious case in regard to Cavendish’s perceived network that she at once downplays her own relationships with her early modern contemporaries, yet at the same time embeds her texts with complex sociable relationships and consciously distributes her texts to the intellectual hotspots of her time. These apparent contradictions and her infamous reputation still fascinate Cavendish scholars. It makes social network analysis particularly interesting because it requires thinking about networks in a different way.  In many ways, her networks are non-traditional in that they often exist outside of and beyond Cavendish herself.

image

Figure 1.

In the visualization produced by SDFB (Figure 1), we see an evenly spread layout of nodes showing a rather impressive network including famous contemporaries such as Samuel Pepys, Robert Boyle, and Thomas Hobbes. The confidence interval visualization (Figure 2, after the jump) indicates the reliability of estimated relationships, that is, how likely it is that Cavendish “knew” that person (see Network Inference post on the problems regarding “knew”), based on the DNB data. In this case, with a confidence of .9 we can reliably estimate that Margaret knew William Cavendish, her husband the Duke of Newcastle. But with few exceptions, the majority of nodes are people with whom Margaret had little to no direct contact. Samuel Pepys and Dorothy Osborne, with .65 and .80 estimates respectively, are not known to have interacted with Margaret at all, though their impressions of her and her work still dominate traditional representations of the early modern writer.

Coming Soon: Shawn Moore on Margaret Cavendish


The Six Degrees of Francis Bacon (SDFB) team is delighted to introduce in the coming days the first in an occasional series in which we invite early modernists with expert knowledge about particular groups and figures to reflect on ways SDFB might aid their own research and on how SDFB’s iterative process could be improved. 

Shawn Moore, a PhD student at Texas A&M University who as curator of Digital Cavendish is developing exciting new methods to study early modern networks, will examine our early attempts to reassemble Margaret Cavendish’s associations from the 58,000 DNB entries with which we began.   We sought Shawn’s feedback because of his demonstrated knowledge about both Cavendish in particular and digital methods more broadly. 

We told Shawn, “be completely honest and fully critical.  If there are wrong links do say so; if there are important missing connections, that’s just as important; if there’s something useful that we can’t yet do but that you think would be crucial, let’s hear it.  Again, it would be great also to hear how the graph or confidence intervals might be useful to you and your research, but you should by all means go beyond that as warranted…The point is that this is very much a collaborative, iterative process.”

Look for Shawn Moore’s post, “Networks as Constructs: The Curious Case of Margaret Cavendish, Duchess of Newcastle (1623?-1673),” in the next few days.

Gender and Name Recognition

In a recent post we talked about why SDFB currently does a poor job of including women, how we can fix it, and how it might eventually do an even better job of including women than some of our current intellectual tools.  

There is, however, an additional reason why women are excluded that we didn’t mention in the last post: the mismatch between the asymmetric naming conventions surrounding marriage (especially as they appear in the ODNB) and the capabilities of Named Entity Recognition (NER) and de-duplication (“de-duping”) programs.  

The naming conventions will surprise no one.  Women in the 17th century regularly took their husbands’ surnames when they married.  Multiple marriages meant that a woman would have multiple surnames.  For example, one of the founders of the Society of Friends (or Quakers) called herself Margaret Askew, Margaret Fell, and Margaret Fox at different stages of her life.  Identified, as a result, by multiple names in the ODNB, Margaret had approximately three times more difficulty meeting the 5-mentions threshold that would, for practical reasons, become our initial cutoff for inclusion.  
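To make the threshold problem concrete, here is a minimal Python sketch. The mention counts are invented for illustration, and `names_meeting_threshold` is a hypothetical helper rather than anything in the SDFB pipeline. It shows how mentions split across three surnames can each fall short of the cutoff even when their total meets it.

```python
from collections import Counter

THRESHOLD = 5  # the five-mention cutoff described in the post


def names_meeting_threshold(mentions, aliases=None, threshold=THRESHOLD):
    """Count mentions per name and keep the names at or above the threshold.

    `mentions` holds one name string per record in which that name appears.
    `aliases` optionally maps a variant name to the name it should count under.
    """
    aliases = aliases or {}
    counts = Counter(aliases.get(name, name) for name in mentions)
    return {name for name, n in counts.items() if n >= threshold}


# Invented counts, for illustration only: five mentions spread over three names.
mentions = ["Margaret Askew"] * 2 + ["Margaret Fell"] * 2 + ["Margaret Fox"]

# Hypothetical alias table telling the counter these are one person.
fell_aliases = {
    "Margaret Askew": "Margaret Fell",
    "Margaret Fox": "Margaret Fell",
}

without_aliases = names_meeting_threshold(mentions)
with_aliases = names_meeting_threshold(mentions, fell_aliases)
```

Counted separately, no variant reaches five mentions and the person drops out entirely; counted together, she is included.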

Scholars who study societies where women conventionally take their husbands’ names, as well as those who live in such societies, have developed general rules, and indeed intuitions, about how women’s names change as a result of marriage.  These rules and intuitions are not foolproof - when misapplied they can lead to scholarly errors and even social embarrassment - but they do a good job of handling most cases.  Scholars of 17th-century England have no conceptual problem recognizing that the names Margaret Askew, Margaret Fell, and Margaret Fox all refer to the same person.  Scholars have conventional ways (some simple, some more complex) of designating this identity of reference.  Fell’s ODNB entry (authored by Bonnelyn Young Kunze), for example, begins, “Fell [née Askew], Margaret (1614–1702).”  The French word “née” is one such convention of obvious and longstanding use.  But other ways of acknowledging identity are tacit and of more recent vintage.  If one searches the ODNB for “Margaret Fox,” one is silently directed to the entry for Margaret Fell.  Fell is never referred to as “Margaret Fox” in the entry (though she is in one of the sources); rather the identity is encoded only in the site’s redirection to the entry.  

Though the conventional rules and intuitions surrounding name changes are familiar enough to those who use or study them, NER and de-duping programs have to learn them from scratch.  In some respects this is similar to other problems of name duplication. “Charles I,” “King Charles,” and “Charles Stewart” all refer to the same person.  Briefly, and amusingly, we also had a “Charles I. King” among our set.  To ensure that they don’t appear as multiple nodes in the network (“King Charles knew Charles I who also knew Charles I. King!”) we’ve simply had to tell the network estimation algorithm that they are the same person.   
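As a rough illustration of what "telling the algorithm that they are the same person" can look like, the Python sketch below uses a hand-built alias table. The table, the function names, and the sample edges are assumptions made for this example; they are not the SDFB implementation.

```python
# Hypothetical alias table: each variant string points to one canonical label.
ALIASES = {
    "King Charles": "Charles I",
    "Charles Stewart": "Charles I",
    "Charles I. King": "Charles I",
}


def canonical_name(name, aliases=ALIASES):
    """Return the canonical label for a name, or the name itself if unlisted."""
    cleaned = name.strip()
    return aliases.get(cleaned, cleaned)


def dedupe_edges(edges, aliases=ALIASES):
    """Rewrite each edge with canonical names, dropping self-loops and repeats."""
    merged = set()
    for a, b in edges:
        a, b = canonical_name(a, aliases), canonical_name(b, aliases)
        if a != b:  # "King Charles knew Charles I" collapses to nothing
            merged.add(tuple(sorted((a, b))))
    return merged


# Sample edges invented for illustration.
raw_edges = [
    ("King Charles", "Charles I"),
    ("Charles I", "Charles I. King"),
    ("Charles Stewart", "Henrietta Maria"),
    ("Charles I", "Henrietta Maria"),
]
clean_edges = dedupe_edges(raw_edges)
```

With the table applied, the four raw edges reduce to a single relationship between Charles I and Henrietta Maria.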

But changes in women’s names as a result of marriage are different in a few key respects.  There’s little need to develop rules for de-duping figures like kings and queens (male name + roman numeral = “King” name; female name + roman numeral = “Queen” name).  Such examples are few enough that it makes sense to handle them on an individual basis.  But women who marry are obviously a much, much larger class, such that developing general rules for de-duping would be essential to making sure SDFB adequately includes and represents them in the network.  It would be useful to develop de-duping procedures, for example, that recognize that what follows the term “née” is an alternative last name for the same person.  And it’s not simply a matter of de-duping either.  The NER program needs to recognize the different and often more elaborate formats of women’s names in the first place.  It needs, for example, to be able to read a string like “Fell [née Askew], Margaret (1614–1702)” and recognize it as a name at all.
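One possible shape for such a rule is sketched below in Python. It assumes headwords follow the pattern of the single example quoted above (surname, optional bracketed “née” surname, forename, life dates). The regular expression and the function names are hypothetical and have not been checked against the full range of ODNB headword formats.

```python
import re

# Assumed headword layout: Surname [née BirthSurname], Forename (dates)
HEADWORD = re.compile(
    r"""
    ^(?P<surname>[^\[,]+?)\s*            # surname used for the entry
    (?:\[née\s+(?P<birth>[^\]]+)\]\s*)?  # optional bracketed birth surname
    ,\s*(?P<forename>[^(]+?)\s*          # forename(s) after the comma
    \((?P<dates>[^)]*)\)\s*$             # life dates in parentheses
    """,
    re.VERBOSE,
)


def parse_headword(headword):
    """Split a headword into its parts, or return None if it does not match."""
    match = HEADWORD.match(headword.strip())
    return match.groupdict() if match else None


def name_variants(headword):
    """List the 'Forename Surname' strings that should share one node."""
    parts = parse_headword(headword)
    if parts is None:
        return []
    variants = [f"{parts['forename']} {parts['surname']}"]
    if parts["birth"]:
        variants.append(f"{parts['forename']} {parts['birth']}")
    return variants


fell_variants = name_variants("Fell [née Askew], Margaret (1614–1702)")
```

Here `fell_variants` contains “Margaret Fell” and “Margaret Askew.” Note that this sketch cannot recover “Margaret Fox,” since that name is not present in the headword at all; it would have to come from another source, such as the site’s redirects.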

The point, we suppose, is that the inclusion of women in a resource like Six Degrees of Francis Bacon will depend on more than good will, scholarly self-critique, self-awareness, or even careful research.  While these virtues remain important, it will also require good programming, programming that takes into account both the gendered naming conventions of the period and the notations by which we record those conventions.

An Entry of One’s Own, or Why Are There So Few Women In the Early Modern Social Network?

in honor of International Women’s Day


In this post, we will address what has long seemed to us a conspicuous shortcoming in the Six Degrees of Francis Bacon (SDFB) data: the relatively small number of early modern women.  As Helen Smith cheekily put it on Twitter, there’s “more sausage than Bacon” in “Six Degrees of Francis Bacon.”  Clearly, this is something necessitating further work, and it is worth emphasizing that we are currently in very early stages.  

How will women feature more prominently?  Our ultimate goal is to create architecture for scholars to curate, add, validate, and revise relationships.  Groups like the Society for the Study of Early Modern Women are well placed to help fill in what are currently obvious silences in the graph.  And, as we mine further data sources, including scholarship from the last half-century on women writers in the period, resources like the Brown Women Writers’ project will continue to offer rich information about networks of women writers.      

But the reasons why relatively few women appear in our earliest graphs are not self-evident, and those reasons open into intriguing questions about historiography, scale, and the kinds of relationships privileged by the DNB.  Our work with the DNB data certainly shows us much about early modern history and culture but it also yields insights into the way early modern history and culture get refracted through the particular, biased, and fallible lens - lenses? - of the DNB.     


The Oxford Dictionary of National Biography has roughly 58,000 biographical entries.  Once we had performed our Named Entity Recognition, our list of entities was already nine times as big, totaling in the end about 450,000 entities.  We tried to bracket non-persons (cities, organizations, and other such entities captured by our wide net) and because we were interested in particular in the early modern period, we further limited our set to people who lived between 1550 and 1700.  Limiting our set by these years was less straightforward than one might think. Some individuals appear in the body of the DNB who don’t have their own entries.  These individuals rarely appear with life dates.  We therefore had to develop further methods to infer approximate years of life.  

But even after we limited our data initially, our data set still remained too unwieldy for the kinds of validation and analysis we needed to do.  Our quantitatively minded readers may not necessarily be scared off by such numbers, but humanists will surely appreciate the difficulty of trying to validate inferences from a data set with tens of thousands of names and, squaring that number, hundreds of millions of possible relationships.     

So, after dividing DNB entries into roughly 500-word chunks, or records, we introduced a threshold: we would limit our set further by working only with names that appeared five or more times in those 500-word records. Consider this: taking only persons prominent enough for their names to appear in five or more DNB records, there are 6,294 people who were alive between 1550 and 1700 who fit that criterion.  Each of the 6,294 people could therefore have been associated with any of 6,293 others.  This means that, just at the highest levels of prominence, estimating the early modern social network involves inquiring into roughly 39.6 million possible relationships (6,294 × 6,293, allowing for both one-way and two-way relationships).  And remember, we aren’t even counting people whose names appear in 4(!) DNB records.  
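For readers who want to check the arithmetic, a short Python sketch follows. The function names are ours, chosen for illustration. Counting each direction of a possible relationship separately gives the figure used above; counting each pair of people only once gives half that.

```python
def directed_pairs(n):
    """Possible relationships when A-to-B and B-to-A are counted separately."""
    return n * (n - 1)


def undirected_pairs(n):
    """Possible relationships when each pair of people is counted once."""
    return n * (n - 1) // 2


people = 6294  # names appearing in five or more DNB records, per the post
directed = directed_pairs(people)      # 39,608,142
undirected = undirected_pairs(people)  # 19,804,071
```

Either way, the number is far too large to validate by hand, which is the point of the threshold.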

So why are there so few women in the early modern social network?  Early modern women have far fewer of their own DNB entries, and even when one counts their appearances in records derived from others’ entries, as we’ve tried to do, those appearances rarely total five or more.  

As it grows, we hope to use SDFB to rectify such biases.  While the social network inferred from the DNB currently does a poor job of including women and their associations, we believe that SDFB has the potential to enlarge our understanding of women in early modern England.  As impressive and important as the Women Writers’ Project is, for example, it is limited by the fact that it is dedicated only to women who were writers, and specifically those who published texts from 1526 to 1850.  


What about those women who never set pen to paper, but who played crucial roles in creating and convening sub-networks of artists, intellectuals, diplomats, and politicians nonetheless?  Early modern women assembled the society and culture of early modern England no less than men did, and by recording their associations SDFB is uniquely positioned to represent, and even to help us discover, the various ways in which they did so - including those that did not involve writing.     

"More sausage than Bacon," it turns out, is, among other things, an argument for developing more sophisticated approaches to the DNB, for mining sources more sensitive to women’s networks, for rectifying historical biases through more research on women, and for enlisting the expertise of individual humanists with detailed knowledge about early modern social networks. SDFB’s present universe of 6,294 names and their possible relations is a very good start, but it is clear it is just that: a start.

Network Inference, Visualization, and the Generative Difficulties of “Knew”: The Case of James Harrington (1611-1677)

While graphs like the one immediately below focusing on James Harrington (1611-1677) help make early modern social networks visible, they are based upon data like that in the included chart.  

 

image

In this case, we have used line thickness, or edge weight, to indicate how likely it was, according to our analysis of the DNB, that two people knew one another (more on the generative difficulties of “knew” later).  The thicker the line, the higher the “confidence estimate,” which is to say, the greater the chance that two people knew one another, at least as far as the collective enterprise of the DNB is concerned.  Another way to think about the confidence estimate is to see it as the answer to a specific question: “when the SDFB team runs its algorithm over a random selection from the full DNB-derived data set 100 times, how many times are these people connected?”
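The question in quotation marks can be sketched in a few lines of Python. This is only an illustration of the resampling idea under stated assumptions: `infer_edges` is a deliberately crude stand-in (it links two names that share at least two records) and is not the SDFB inference algorithm, and the sampling fraction, seed, and toy records are likewise invented.

```python
import random
from collections import Counter
from itertools import combinations


def infer_edges(records, min_cooccurrences=2):
    """Stand-in inference step: link names that co-occur in enough records."""
    counts = Counter()
    for names in records:
        for pair in combinations(sorted(set(names)), 2):
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= min_cooccurrences}


def confidence_estimates(records, runs=100, fraction=0.5, seed=0):
    """Share of runs (0.0 to 1.0) in which each pair ends up connected."""
    rng = random.Random(seed)
    sample_size = max(1, int(len(records) * fraction))
    hits = Counter()
    for _ in range(runs):
        sample = rng.sample(records, sample_size)  # a random selection of records
        hits.update(infer_edges(sample))
    return {pair: n / runs for pair, n in hits.items()}


# Toy records with placeholder names; each inner list is one 500-word record.
records = [["A", "B"]] * 6 + [["A", "C"], ["B", "D"]]
estimates = confidence_estimates(records)
```

A pair connected in 99 of 100 runs would receive an estimate of 0.99; a pair that never clears the stand-in rule does not appear at all.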

image


In the visualization above, we’ve introduced a threshold at 51% (or 51 times).  Had we used a lower threshold, we would have introduced more names to the visualization (and more visual clutter), but our confidence—or more precisely, the confidence we’ve derived from the DNB—in those connections would have been lower. 
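The trade-off can be seen in a small Python sketch. The scores and names below are placeholders invented for illustration, not SDFB output, and the helper functions are hypothetical.

```python
def filter_by_confidence(estimates, threshold=0.51):
    """Keep only the pairs whose confidence estimate meets the threshold."""
    return {pair: score for pair, score in estimates.items() if score >= threshold}


def names_shown(estimates, threshold=0.51):
    """Names that remain in the visualization at a given threshold."""
    kept = filter_by_confidence(estimates, threshold)
    return {name for pair in kept for name in pair}


# Invented scores for placeholder names.
estimates = {
    ("Person A", "Person B"): 0.99,
    ("Person A", "Person C"): 0.51,
    ("Person A", "Person D"): 0.30,
}

at_51 = names_shown(estimates)        # A, B, and C
at_25 = names_shown(estimates, 0.25)  # A, B, C, and D
```

Lowering the cutoff from 0.51 to 0.25 brings Person D into view, but only on the strength of a connection found in fewer than a third of runs.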

James Harrington was of course the author of Oceana (1656) and several republican pamphlets, many issued around the Restoration when his “Rota” club was most active.  Here, Harrington appears in a network composed largely of “commonwealths-men” and late seventeenth-century controversialists, precisely as one familiar with the standard account of his milieu might expect.

Yet the visualization can also help us move beyond what we already know.  Graphs such as this one are intended to spur new inquiry.  When we think, for example, about the paths of books (given, purchased, lent and not lent), the social lives of manuscripts (shared, copied, annotated, altered, torn), relations of patronage or affection (given, or withheld), we find ourselves in a new world of hypotheses and scholarly conjectures. Intellectual affinities, linguistic patterns, and group boundaries all take on new dimensions.     

Visualization even prompts us to clarify what we mean when we say two people “knew” one another. Were they friends, enemies, lovers?  Political allies of convenience? Of conviction?  Can a reader “know” an author if he hasn’t met her in the flesh?  What if the medium is the letter as opposed to the printed book?  Inference and visualization operate in this way as a kind of “experimental metaphysics,” to use Bruno Latour’s term. Far from binding us to dry quantitative analysis, visualizations and the confidence estimates on which they’re based enable us to move swiftly toward complicated questions of affect, intimacy, and ideology. If we want to say, for example, that exchange of letters constitutes a relationship but reading one another’s books does not, what are the ideological suppositions, modes of address, “radicals of presentation” (Frye), and textual effects underpinning such a claim?

Consider here the usefully challenging case of John Toland, whose relation with Harrington is, according to our model, 99% certain.  James Harrington died when Toland was seven years old.  Common sense suggests that something’s gone wrong.  However, we know from J.G.A. Pocock, Blair Worden, Justin Champion, and others that Toland is the figure most responsible for carrying Harrington’s republican torch into the 18th century through his influential biography of Harrington, published with his similarly significant edition of Oceana (1700).  Toland, we might say, “knew” Harrington as well as anyone in Harrington’s lifetime.  While the relation was not reciprocal (Harrington did not know Toland), there are strong grounds—empirical, theoretical—for including this relationship. Put somewhat differently, we would skew our results considerably if we tried to force Toland out of Harrington’s network by altering the algorithm.  If the DNB tells us they were related, do we need to say they were not?  And this raises questions about the historical significance of the textual trace (life writing, editing), about ideological proximity vs. spatio-temporal proximity, and about the kinds of relationships privileged by the DNB itself.

At the same time, our “confidence estimates” might just as easily be called “doubt estimates,” and these “doubt estimates” can have considerable scholarly value too.  Consider two possible uses.  First, while low scores suggest a slim chance of a relation, they also show scholars (and students) where there’s relatively high scholarly payoff for demonstrating evidence of connections.   This is one of the reasons why we err on the side of inclusion, giving confidence estimates nearly down to nil.  In shorthand, it helps scholars know more about what we don’t know about.  This “metaknowledge” is a key step on the path to new discoveries and arguments.    

Secondly, low scores take us toward a category we’ve come to think of as “white space” in the social graph, specifically, the category of non-relation.  What we’ve already said about the possibilities generated by networks might make “white space” or “non-relation” seem relatively uninteresting.  Presence is much more fun than absence, right?  But here, it’s possible to develop more fine-grained understandings of groups, individuals, and their “publics,” where publics are understood, in Michael Warner’s sense, as self-organized relations among strangers.  Insofar as tropes, images, ideas and so forth develop and operate within networks, we can posit a rough and ready dichotomy of network, on the one side, and public on the other. While undoubtedly reductive, such a dichotomy can be a productive heuristic, generating more concrete thinking about the kinds of languages and practices deemed worthy of “export” and the people and groups who made up audiences for publication, understood in its broadest sense.