
Data Mining: A Computer Science for the Humanities?

By Jeremy Gerhardt

Data mining gives historians the opportunity to look at vast troves of data and build analyses and narratives that previously would have taken decades to produce. Erez Lieberman Aiden and Jean-Baptiste Michel demonstrate this with their work on the Google Ngram Viewer. In their TED Talk, they examined how words changed over time. For instance, they charted exactly when “thrived” overtook “throve” as the popular past tense of “thrive.” These methods are brilliant for analyzing huge amounts of text, but it doesn’t take long to run into problems. For instance, they mentioned that “beft” was once used as a spelling of “best.” This seems valid until you realize that the long s (ſ) in older printed text looked much like an f, so the computer was likely reading “best” printed with a long s. Data mining is an exciting new tool for historians, but we need to understand that it doesn’t create its own historical meanings or interpretations. As it always has been, that task is up to us.
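For anyone curious about what happens under the hood, here is a minimal Python sketch of the basic idea, which is my own simplification and not Google’s actual code: count how often a word appears in each year and divide by the total number of words for that year. The tiny corpus, the function names, and the `fold_long_s` option are all made up for illustration. Folding the long s (ſ) into a modern s only helps when the scan kept the ſ character. If the OCR already read it as an f, “beft” gets counted as its own word, and only a person reading the results would notice.

```python
from collections import Counter

# Hypothetical toy corpus of (year, text) pairs, invented for illustration only.
CORPUS = [
    (1800, "the colony throve and the town throve"),
    (1800, "the be\u017ft harvest thrived"),  # "beſt" printed with a long s
    (1900, "the city thrived and the port thrived"),
    (1900, "the best village throve"),
]


def tokenize(text, fold_long_s=True):
    """Lowercase and split on whitespace, optionally mapping the long s to s."""
    if fold_long_s:
        text = text.replace("\u017f", "s")  # U+017F LATIN SMALL LETTER LONG S
    return text.lower().split()


def relative_frequency(corpus, word, fold_long_s=True):
    """Return {year: occurrences of `word` / all tokens in that year}."""
    totals = Counter()  # total tokens per year
    hits = Counter()    # matches of `word` per year
    for year, text in corpus:
        tokens = tokenize(text, fold_long_s)
        totals[year] += len(tokens)
        hits[year] += tokens.count(word)
    return {year: hits[year] / totals[year] for year in sorted(totals)}


if __name__ == "__main__":
    for w in ("throve", "thrived"):
        print(w, relative_frequency(CORPUS, w))
```

Plotting those yearly shares side by side is, roughly, what produces a “thrived” versus “throve” chart. Turning off `fold_long_s` makes the 1800 “beſt” vanish from the count for “best,” a small version of the problem the long s causes.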

I’m curious how the Google Ngram Viewer would handle Sütterlinschrift.

Photo Credit: https://www.pnp.de/lokales/landkreis-traunstein/Workshop-zur-Suetterlinschrift-im-Eichenhof-3226091.html

Thankfully, working right alongside their future machine overlords, you’ll find a group of diligent, socially aware, and most of all: human historians. Through our work in data mining, we are able to give voices to people silenced throughout history. For instance, Ruby Mendenhall, an associate professor of sociology at the University of Illinois, created a data mining project that mapped out the stories of black women suffragists in America. I remember one of my history professors saying that “sometimes an absence of information tells you more about something than an abundance of information.” That line kept coming back to me as I read about this project.

Data mining gives us a chance to analyze all sorts of lesser-known people in society, including convicts. In some cases, we find that despite their obscurity, they had a lot more in common with us than we might have thought. Professor Robert Shoemaker and Dr Zoe Alker examined convicts’ tattoos and found that, rather than representing an explicitly criminal system of identification, the tattoos are better read as depictions of the social ideals, trends, and attitudes of ordinary people of the time.

This chart shows the changing subjects of convict tattoos over time.

Photo Credit: https://www.digitalpanopticon.org/Convict_tattoos

Visualization by Sharon Howard

Data mining’s applications range from comparing language to mapping changes in ideals or attitudes over a period of time. These sorts of applications are familiar territory for me as a history student. However, more technically ambitious applications can create downright jaw-dropping visualizations. For instance, the Six Degrees of Francis Bacon project builds a vast web of Francis Bacon’s social network in his time. This sort of metadata project takes what once would have required thousands of books and lets us view a person’s entire network on a single screen.

Yet no matter how advanced the technology behind data mining becomes, the discipline would be a total vacuum without a historian’s interpretation. Even when looking at the Six Degrees of Francis Bacon project, I’d need a starting point: what do I want to interpret or research about Francis Bacon? Otherwise, it’s just an intricate web. Though this seems obvious, it’s something people in other professions easily forget when talking about digital tools in history.

Carl Minksy writes about the trend toward letting history be made from algorithms and numbers instead of interpretations. He cites an article warning that machine learning algorithms can overpredict the historical significance of some documents and overlook others that will prove to be important, a problem demonstrated in a project with Microsoft called “Predicting History.” He also cautions that “Poorly-made data analyses can unwittingly lend an air of objectivity to historical arguments that really can’t be supported by these incomplete archives.” I think the biggest lesson we need to understand about data mining, and really about any non-human interface used in history, is that it can’t interpret history for us any more than we could be expected to create the complex visualizations and charts that computers can. Data mining needs a soul, so to speak. With a soul, we can use this technology to give voices to those who previously had none. We can highlight similarities with our ancestors, and (perhaps most importantly to students) we can save hundreds of hours on projects. Without a soul, however, data mining is just a web of numbers.

2 replies on “Data Mining: A Computer Science for the Humanities?”

“He also cautions that ‘Poorly-made data analyses can unwittingly lend an air of objectivity to historical arguments that really can’t be supported by these incomplete archives.’”

I worry about this a lot. Our field has a long history of using its authoritative position to put forth biased (and harmful, and inaccurate) arguments as objective truth. How can we use these powerful resources to correct that course rather than exacerbate the problem? Are historians being properly trained to navigate and interpret big troves of historical data?

“Thankfully, working right alongside their future machine overlords, you’ll find a group of diligent, socially aware, and most of all: human historians. Through our work in data mining, we are able to give voices to people silenced throughout history.” I really liked this introduction in one of your top paragraphs. I feel like it is so true, as well. It seems like historians in our generation are really in an “adapt or succumb” phase with technology. You have such a great analysis and casual take in your post, Jeremy. I really enjoyed it!
