(Professor David Birnbaum assisted heavily with this topic modeling and analysis, and developed the SVG from the data collected by John and Katelyn.)
In the autumn 2012 semester, the Milton team (Katelyn Antolik and John Eckhardt) divided Paradise Lost into 118 segments and used Mallet topic modeling to identify ten topics that characterized those segments. Mallet reads the text and tries to identify what it considers topics by looking for word collocations. The topics it found in this experiment were:
Some topics may make more sense than others. For example, topic #0 seems to be related to Hell and topic #5 seems to be related to Heaven. It’s harder to make sense of topic #3, though, which combines “sin” and “sad” with “peace” and “grace”; perhaps this topic might be understood as dealing with the intersection of “life” and “death,” both of which Mallet also considers characteristic words for the topic.
In addition to identifying topics, Mallet also determines the extent (measured in percentages) to which each of the 118 segments of the poem is associated with a given topic. For example, for the first segment it decided that the topics were distributed as follows (ordered from most to least represented in that text):
Mallet seems to think, then, that the first fifty lines of the poem are associated most strongly with topic 0 (21.2%), then with topic 7 (17.4%), etc.
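The ordering from most to least represented can be reproduced with a simple sort. The following is a minimal sketch, not the code the team used: the function name `rank_topics` and the dictionary layout are our own assumptions, and only the two percentages quoted above are included, since the remaining eight values are not reproduced here.

```python
def rank_topics(proportions):
    """Return (topic, proportion) pairs sorted from most to least represented."""
    return sorted(proportions.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical input layout: topic number -> share of the segment.
# Only the two values quoted in the text appear; the other eight
# topic proportions for the first segment are not reproduced here.
first_segment = {7: 0.174, 0: 0.212}

for topic, share in rank_topics(first_segment):
    print(f"topic {topic}: {share:.1%}")
```

Run on those two values, the sketch lists topic 0 before topic 7, matching the order described above.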
As a preliminary exploration, we graphed the values for topics 0 and 5, as follows:
Red is topic 0 (Hell) and blue is topic 5 (Heaven). The values on the X axis are the numbers of the slices of the poem (from 1 to 118), which lets us watch the increases and decreases as we move through the poem from start to finish.
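A graph of this kind can be sketched in a few lines of Python that write out SVG. This is an illustration under our own assumptions, not the code behind the actual graph: the function names `to_points` and `two_topic_svg`, the canvas size, and the five sample values are all made up for the example.

```python
def to_points(values, width=590, height=200, max_value=1.0):
    """Map a series of topic proportions to SVG polyline coordinates.

    X spreads the segments evenly from left to right; Y is flipped
    because the SVG origin is the top-left corner.
    """
    step = width / max(len(values) - 1, 1)
    return " ".join(
        f"{i * step:.1f},{height - (v / max_value) * height:.1f}"
        for i, v in enumerate(values)
    )


def two_topic_svg(hell, heaven, width=590, height=200):
    """Build a minimal SVG with a red line (topic 0) and a blue line (topic 5)."""
    # Scale both lines against the same maximum so they stay comparable.
    top = max(max(hell), max(heaven)) or 1.0
    return "\n".join([
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        f'<polyline fill="none" stroke="red" points="{to_points(hell, width, height, top)}"/>',
        f'<polyline fill="none" stroke="blue" points="{to_points(heaven, width, height, top)}"/>',
        "</svg>",
    ])


# Made-up values for five segments, for illustration only;
# the real data has one value per topic for each of the 118 segments.
hell_demo = [0.20, 0.30, 0.10, 0.05, 0.25]
heaven_demo = [0.05, 0.10, 0.30, 0.20, 0.10]
svg = two_topic_svg(hell_demo, heaven_demo)
```

Both lines are scaled against the same maximum so that the heights of the red and blue curves can be compared directly, which is what the reading of the graph depends on.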
The graph is reasonably accurate with respect to the plot of the poem. It shows that the topic related to Hell exceeds Heaven in the opening of the poem, which is exactly what a human reader would find: the first two of the ten books are written almost exclusively about Satan and the creation of Pandemonium. We then see that the topic related to Heaven briefly rises to about the same level as Hell, roughly between segments 25 and 45. In the third book, the focus is on God and the formation of his heavenly army, and God’s first speech appears there as well.
We also see two large spikes: one in the middle of the poem, where Hell increases, and another in the latter half, where Heaven increases. The middle of the plot concerns the fall in the Garden of Eden. It is understandable, then, that the topic of Hell (topic 0) becomes more prevalent at this point in the poem (segments 50–62), where Satan has coerced Eve into eating the apple from the tree of knowledge. The prevalence of topic 5 (Heaven), beginning in segment 65 with a peak at segment 80, is representative of God’s punishment of Adam and Eve: their banishment from Eden.
Topic modeling is an exploratory tool. It is the human scholar who performs the literary analysis, as has always been the case, but topic modeling helps organize the data in at least two ways that are useful to the scholar:
It may be easy to see that “hell” and “fire” and “pain” show up together (in topic #0), but that “stood” and “rest” cooccur with them is harder for a human reader to notice. Whether that cooccurrence is meaningful is an evaluation left to the human, but if the human doesn’t notice the collocation, there is no possibility of considering and evaluating whether it might be significant.