About the data

(Professor David Birnbaum assisted heavily with this topic modeling and analysis; he developed the SVG from the data collected by John and Katelyn.)

In the autumn 2012 semester, the Milton team (Katelyn Antolik and John Eckhardt) divided Paradise Lost into 118 segments and used Mallet topic modeling to identify ten topics that characterized those segments. Mallet reads the text and tries to identify what it considers topics by looking for words that tend to occur together in the same segments. The topics it found in this experiment were:

Some topics make more sense than others. For example, topic #0 seems to be related to Hell and topic #5 seems to be related to Heaven. It’s harder to make sense of topic #3, though, which combines sin and sad with peace and grace; perhaps this topic might be understood as dealing with the intersection of life and death, both of which Mallet also considers characteristic words for the topic.

In addition to identifying topics, Mallet also determines the extent (measured in percentages) to which each of the 118 segments of the poem is associated with a given topic. For example, for the first segment it decided that the topics were distributed as follows (ordered from most to least represented in that text):

Mallet seems to think, then, that the first fifty lines of the poem are associated most strongly with topic 0 (21.2%), then with topic 7 (17.4%), etc.
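Per-segment proportions like these can be read from Mallet's output with a few lines of code. What follows is only a minimal sketch, assuming a row layout of document index, document name, then alternating topic numbers and proportions. That layout, the function name parse_doc_topics_line, and the file name in the example are assumptions for illustration, not details taken from the project's files. The only proportions used are the two reported above.

```python
def parse_doc_topics_line(line):
    """Parse one row of a doc-topics file in the assumed paired layout.

    Assumed layout (whitespace-separated, document name without spaces):
    doc_index  doc_name  topic  proportion  topic  proportion ...
    """
    fields = line.split()
    doc_index, doc_name = int(fields[0]), fields[1]
    rest = fields[2:]
    # Pair each topic number with the proportion that follows it.
    pairs = [(int(rest[i]), float(rest[i + 1])) for i in range(0, len(rest) - 1, 2)]
    # Order from most to least represented, in case the row is not already sorted.
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return doc_index, doc_name, pairs


# Hypothetical row: the file name is invented; the two values come from the text.
row = "0 segment001.txt 7 0.174 0 0.212"
index, name, ranked = parse_doc_topics_line(row)
for topic, share in ranked:
    print(f"topic {topic}: {share:.1%}")
```

Sorting inside the function means the result is ordered from most to least represented whether or not the input row already is.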

Using SVG to visualize the data

As a preliminary exploration, we graphed the values for topics 0 and 5, as follows:

[Graph: values for topics 0 and 5 across the 118 segments; the X axis is labeled every 5 segments, from 5 to 115.]

Red is topic 0 (Hell) and blue is topic 5 (Heaven). The values on the X axis are the numbers of the slices of the poem (from 1 to 118), which lets us watch the increases and decreases as we move through the poem from start to finish.
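A graph of this kind can be drawn in SVG as one polyline per topic. The sketch below shows one possible way to generate such markup; it is not the code that produced the graph above. The function names, the spacing and scaling parameters, and the three-segment sample values are all hypothetical.

```python
def polyline_points(values, x_step, chart_height, y_scale):
    """Return the SVG `points` string for one series.

    values[i] is the proportion (0 to 1) for segment i + 1.
    SVG's Y axis grows downward, so each scaled value is subtracted
    from the chart height to make larger values appear higher.
    """
    points = []
    for i, value in enumerate(values):
        x = (i + 1) * x_step
        y = chart_height - value * y_scale
        points.append(f"{x:g},{y:g}")
    return " ".join(points)


def line_chart_svg(series, x_step=10, chart_height=300, y_scale=300):
    """Build a minimal SVG document with one <polyline> per (color, values) pair."""
    longest = max(len(values) for _, values in series)
    width = (longest + 1) * x_step
    lines = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{chart_height}">'
    ]
    for color, values in series:
        pts = polyline_points(values, x_step, chart_height, y_scale)
        lines.append(f'  <polyline points="{pts}" fill="none" stroke="{color}"/>')
    lines.append("</svg>")
    return "\n".join(lines)


# Hypothetical three-segment values, for illustration only.
svg = line_chart_svg([("red", [0.2, 0.1, 0.05]), ("blue", [0.05, 0.1, 0.3])])
print(svg)
```

Axis lines and tick labels are left out to keep the sketch short; they would be added as further line and text elements.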

So what does this tell us?

The graph is reasonably accurate with respect to the plot of the poem. It shows that the topic related to Hell exceeds Heaven in the opening of the poem, which is exactly what a human reader would find: the first two of the ten books are written almost exclusively about Satan and the creation of Pandemonium. We then see that the topic related to Heaven briefly rises to about the same level as Hell roughly between segments 25 and 45. This fits the third book, in which the focus is on God and the formation of his heavenly army and in which God’s first speech appears.

We also see two large spikes, one for Hell directly in the middle of the poem and another, for Heaven, in the latter half. The middle of the plot concerns the fall in the garden of Eden. It is understandable, then, why the topic of Hell (topic 0) seems to become more prevalent at this point in the poem (segments 50–62). Satan has coerced Eve into eating the apple from the tree of knowledge. The prevalence of topic 5 (Heaven), beginning in segment 65 with a peak at segment 80, is representative of God’s punishment of Adam and Eve: their banishment from Eden.
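Readings like these, such as where one topic exceeds the other and where a topic peaks, can also be checked against the numbers rather than judged by eye. The following is a minimal sketch under the assumption that each topic's values are held in a list with one entry per segment. The function names and the sample values are hypothetical and are not the project's data.

```python
def dominant_runs(a, b):
    """Return (start, end) segment ranges, 1-based and inclusive, where a exceeds b."""
    runs = []
    start = None
    for i, (x, y) in enumerate(zip(a, b), start=1):
        if x > y and start is None:
            start = i  # a run begins here
        elif x <= y and start is not None:
            runs.append((start, i - 1))  # the run ended at the previous segment
            start = None
    if start is not None:
        runs.append((start, min(len(a), len(b))))
    return runs


def peak_segment(values, first, last):
    """Return the 1-based segment number with the highest value in [first, last]."""
    window = values[first - 1:last]
    return first + window.index(max(window))


# Hypothetical five-segment values, for illustration only.
hell = [0.30, 0.25, 0.10, 0.05, 0.20]
heaven = [0.05, 0.10, 0.15, 0.30, 0.10]
print(dominant_runs(hell, heaven))
print(peak_segment(heaven, 1, 5))
```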

A few thoughts about Milton and Mallet

Topic modeling is an exploratory tool. It is the human scholar who performs the literary analysis, as has always been the case, but topic modeling helps organize the data in at least two ways that are useful to the scholar: