WWI – War Letters

Text Analysis

Within the collection of documents, the majority of the texts are from a series titled War Letters. These letters were written by current students and alums performing military service abroad and domestically. In these letters, servicemen share their feelings of loneliness and homesickness. They express both their frustrations concerning the often grueling conditions and their delight in experiencing small joys, such as favorable weather or a portion of a favorite food.

The War Letters were loaded into Voyant, a web-based text analysis tool, as a single document and analyzed. The purpose of running text analysis on these letters (or on any collection of documents) is to reveal patterns in the text that are not immediately apparent to a reader. In essence, text analysis uses algorithms to supplement what close reading provides.

Running Voyant on the letters produced a few basic statistics. The letters average about 20.5 words per sentence and have a vocabulary density of 19.5%. The most frequent words are: men (47); french (40); little (39); time (35); night (30). These five words, along with other frequently used words, are displayed below in the word cloud. Naturally, one may ask: what is the significance of these statistics?
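To make these figures concrete, here is a minimal sketch of how they could be computed by hand. It assumes that vocabulary density means the number of unique word forms divided by the total number of words, and it uses a tiny stoplist of my own; Voyant's tokenization and stoplist will differ, so the numbers would not match exactly. The sample sentence is invented for illustration and is not taken from the letters.

```python
import re
from collections import Counter

# Tiny illustrative stoplist (an assumption; Voyant uses its own, longer list).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "we", "i", "was", "were", "it", "at"}

def summarize(text, top_n=5):
    """Return average words per sentence, vocabulary density, and top words."""
    # Naive sentence split: break after ., !, or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Lowercased word tokens; apostrophes are kept inside words.
    words = re.findall(r"[a-z']+", text.lower())
    avg_words = len(words) / len(sentences)
    density = len(set(words)) / len(words)  # unique word forms / total words
    # Frequency counts ignore stopwords, as a word cloud typically would.
    counts = Counter(w for w in words if w not in STOPWORDS)
    return avg_words, density, counts.most_common(top_n)

# Invented sample text, not a quotation from the War Letters.
sample = "The men marched at night. The night was cold. We met the French men."
avg_words, density, top = summarize(sample)
```

A low density such as 19.5% suggests that the same words recur often, which is what one would expect from many letters describing similar routines.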

The word cloud on the right displays the most frequently referenced words in the War Letters collection. The five most frequent words appear largest, and the size of every other word reflects its frequency. It is clear that national powers – french, german, and american – are referenced often. Additionally, many of the words are temporal in nature – time, night, day, and hour. This observation is in line with the experiences that the men share in their writings: military life is extremely regimented, so it is natural that temporal words appear frequently. Could someone with, say, a student’s background guess the context of this word cloud without preparation? I believe that it is certainly possible!

Additionally, the nature of the letters as “back and forth” correspondence can be represented geographically. This Voyant visualization is created by identifying geographic locations in the order in which they appear in the text. Each pathway connects one location to the location referenced next. For example, in the April 1918 edition of The Amherst Monthly, Ralph E. Bailey, ’20, who is stationed in Berne, Switzerland, references his plans to send a report to Washington, D.C., detailing the steps that his regiment has taken to assist American prisoners of war.
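The idea behind such pathways can be sketched in a few lines. The example assumes a small hand-made list of place names (a hypothetical gazetteer) and simply pairs each place mention with the next one in the text; how Voyant actually recognizes and links locations is not described here, so this is only an approximation. The sample passage is invented and is not a quotation from the letters.

```python
import re
from collections import Counter

# Hypothetical gazetteer for illustration; a real tool would use a far larger one.
GAZETTEER = ["Berne", "Washington", "Paris", "Amherst"]

def location_pathways(text, places=GAZETTEER):
    """Count transitions between consecutive place mentions, in text order."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, places)) + r")\b")
    mentions = [m.group(1) for m in pattern.finditer(text)]
    # Pair each mention with the next one, skipping repeats of the same place.
    pairs = [(a, b) for a, b in zip(mentions, mentions[1:]) if a != b]
    return mentions, Counter(pairs)

# Invented sample passage, not a quotation from the War Letters.
sample = ("I am writing from Berne. Our report goes to Washington next week. "
          "Berne is quiet, and I often think of Amherst.")
mentions, pathways = location_pathways(sample)
```

Each key in `pathways` is an (origin, destination) pair, and its count could be used to decide how heavily to draw that pathway on a map.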

Topic Modeling

Topic modeling is a subtype of text analysis that uses an algorithm to detect themes within a document. Applying Cornell’s Topic Modeling tool to the War Letters corpus yielded three distinct topics after fifty iterations. (This particular topic modeling tool allows the user to select the number of topics to be detected, up to twenty-five. Sometimes, selecting a larger number of topics results in vaguer, less distinguishable topics. Selecting the minimum number of topics, three, makes it more likely that each topic is distinct and interpretable.)
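For readers curious about what happens during those iterations, the sketch below shows one common approach: latent Dirichlet allocation (LDA) fitted with collapsed Gibbs sampling. That this is the algorithm behind the tool used here is an assumption on my part, and the function name, the parameter values (alpha, beta, seed), and the toy documents are all invented for illustration.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics=3, iterations=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized docs (lists of words)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # topic counts per doc
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # tokens per topic
    assignments = []
    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Weight each topic by its fit to this document and this word.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word

# Invented toy documents, not text from the War Letters.
docs = [
    "glad letter home sunshine glad dinner".split(),
    "shell trench night shell gas".split(),
    "march drill camp march hour".split(),
    "sunshine dinner letter glad home".split(),
    "gas trench shell night trench".split(),
    "drill hour camp march drill".split(),
]
topics = lda_gibbs(docs)
top_words = [[w for w, _ in t.most_common(3)] for t in topics]
```

Each entry of `top_words` lists the most frequent words assigned to one topic. Because the sampler is random, results vary with the seed, which is one reason a small number of topics can be easier to interpret.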

The chart below details each topic. The seemingly nonsensical words in the topics column are the most frequently referenced words of that topic. The sentences that most frequently contain those words, and are thus the best representatives of the topic, are included in the relevant lines column. The lines accompanying each topic make up over a page of text, so only the three lines that best convey the topic’s theme are included in the chart. Ideally, these sentences reveal the overarching theme of the topic. My interpretations of the themes are included in the summary column.
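One simple way to picture how representative lines might be chosen is to count how many of a topic's key words each line contains and keep the highest scorers. The sketch below does exactly that; the scoring rule is my assumption rather than the tool's documented method, and the sample lines and key words are invented.

```python
import re

def representative_lines(lines, topic_words, top_n=3):
    """Rank lines by how many of their tokens are topic key words."""
    keys = set(topic_words)

    def score(line):
        tokens = re.findall(r"[a-z']+", line.lower())
        return sum(1 for t in tokens if t in keys)

    # Highest-scoring lines first; lines with no key words are dropped.
    ranked = sorted(lines, key=score, reverse=True)
    return [line for line in ranked[:top_n] if score(line) > 0]

# Invented sample lines and key words, not taken from the War Letters.
lines = [
    "The shell burst near the trench at night.",
    "We had a fine dinner and a letter from home.",
    "Gas drifted over the trench before dawn.",
]
best = representative_lines(lines, ["shell", "trench", "night", "gas"], top_n=2)
```

Here the first and third lines are selected, since they contain three and two key words respectively, while the second contains none.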

All topics but the second were easily identified. The first topic, which often contains words with an overall positive association, conveys happiness and joviality. The third topic, in stark contrast to the first, often contains words with an overall negative association – violence and acts of war are the subjects of the sentences that compose this topic.