I’d like to propose a session on exploratory data analysis. While it’s also useful to consider “confirmatory” analysis and visualization when making an argument, I’m interested in the messier, earlier stages of research. The first thing we might do when getting a new text, or corpus, or dataset is to graph/visualize it, map it, cluster it, and so on. What then? We might do a close reading of some of the intriguing passages or concepts turned up by the initial phase. But how can we redo our analysis and visualization in light of this close reading? I don’t think any field, in the humanities or the social or natural sciences, teaches this process very well. Could digital humanities perhaps become a leading field in practicing, and teaching, this cycle of playing with and remolding data and models?

As an undergraduate studying classics, I fell in with a rough crowd at the Perseus Project. I learned not only about markup and morphological analysis but also about the practicalities of programming, in a more hands-on way than CS classes taught them. After several years at Perseus, where we worked on generalized tools for humanities digital libraries, I decided that a CS Ph.D. would be the best way forward for the research I wanted to do. I worked on natural language processing, computational linguistics, and machine translation with Jason Eisner and the Center for Language and Speech Processing at Johns Hopkins. After grad school, I was a research assistant professor in CS at UMass Amherst, where I worked on applying NLP to information extraction and retrieval from large OCR’d book collections. This fall, I’m starting as an assistant professor at Northeastern where, with Ryan Cordell, Elizabeth Dillon, and David Lazer, I’ll be helping to found Centers for Digital Humanities and Computational Social Science.

  Steven Lubar

    This would be a fascinating session. I’ve been exploring the new Paper Machines plugin for Zotero – it gives you some wonderful quick visualizations of data based on your collected documents, or data available from JSTOR. But what to do with it once you’ve got the visualization? I think it’s an area where traditional knowledge and digital skills might make for a very useful synergy. Like you say, something that’s not much taught.

  Emily Strong

    This is a great proposal. I’ve encountered this situation in my own research, and the process of transitioning from exploratory analysis to an appropriate theoretical framework and structure for the final analysis has involved a lot of blindly feeling my way towards that end result.

  Emily Kugler

    I’m really interested in this as well as the session Ryan Cordell proposed.

