Volumes 1 and 2 of the American State Papers were the primary source documents I used to run our topic modeling experiment. I compared four topic models of different sizes: 15, 20, 50, and 100 topics. For at least a few of the lines in each document I was able to assign effective topic labels, but the smaller the topic model, the easier I found it to identify topics.
The most popular topics came from the 15-topic model, and I labeled them Land Grants, Procedures, Community, Political Landscape, and the exchange and passing down of land. It makes sense that these kinds of topics come up in our research on the American State Papers. For most of the bigger topic models, however, I could not come up with good enough labels because the word pool was too big.
The larger topic models did not develop coherent topics. Although the two volumes gave me a lot of material to work with, the pool of possible topics was too big, so the results seemed too general. The words in each topic also seemed less frequent in the documents, since common words like land or united came up less often in these larger models.
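One rough way to check the impression that common words like land show up in fewer topics as the model grows is to count how many topics list a given word among their top words. The sketch below assumes the topic keys were saved in Mallet's tab-separated format (topic number, weight, then the top words separated by spaces). The function names and the three sample topics are made up for illustration and are not results from our models.

```python
def parse_topic_keys(text):
    """Parse Mallet-style topic keys text into {topic_id: [top words]}."""
    topics = {}
    for line in text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        topic_id, _weight, words = parts[0], parts[1], parts[2]
        topics[int(topic_id)] = words.split()
    return topics


def share_of_topics_with(word, topics):
    """Return the fraction of topics whose top-word list contains `word`."""
    if not topics:
        return 0.0
    hits = sum(1 for words in topics.values() if word in words)
    return hits / len(topics)


# Hypothetical sample, NOT output from our actual models.
sample_keys = (
    "0\t0.33\tland grant acres claim survey\n"
    "1\t0.33\tunited states congress act law\n"
    "2\t0.33\tland heirs deed transfer family\n"
)

sample_topics = parse_topic_keys(sample_keys)
land_share = share_of_topics_with("land", sample_topics)
print(f"'land' appears in {land_share:.0%} of topics")
```

Running the same count on the keys files from the 15, 20, 50, and 100 topic models would show whether the share for words like land or united really drops as the number of topics goes up.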
I find the two topics of Community and Procedures interesting because the feedback on project one said that we lacked a solid hook and could have better reported findings from our literary sources with citations and analysis. This led us to acquire more primary sources to find a story that hooks our audience, as well as solid bookmarks for citations. This will allow us to write a more compelling and attractive narrative. I believe that Mallet further reinforces these changes and legitimizes our prior research, as our labels matched the research and vision we have been uncovering.