Tools for DH–Humanities in Action

I am teaching a partial-credit course for the Languages and Cultures and Humanities Residential Colleges this year, called Humanities in Action.  This is a project-based course that meets once a week over supper to develop ideas for Humanities-based projects and to design and build them in a group.

As I have been working with students and faculty on various projects in DH, I have wanted one page where I could put together the sites, tools, and tutorials that are helpful to us. I am compiling these here; this is still a work in progress, but it might also be useful to colleagues out there.  In DH we tend to learn collaboratively, so many of the tutorials are adapted from colleagues who have pioneered this approach to teaching, such as Miriam Posner at UCLA and Alan Liu at UCSB.

There are some basic tools that can help you with your DH projects, whether you know programming or not.  Here are some of the ones my students and I have found most useful.  Tutorials are also linked.

Text analysis

Network visualization of terms in Wikipedia’s RPG game descriptions (by Jiayu Huang for HUMN 270 Fall 2015)

Voyant is the best multi-tool text analysis platform to start with.  The version currently online is the earlier release, which is part of the suite of tools that can be found here.  There is a new version of Voyant that brings these different tools into one interface, so you do not have to switch between them.  If you want to use that, ask me; I have a version on my thumb drive.

Voyant is very good as a concordance and frequency analysis visualization tool.  It can work with large amounts of text in multiple files, and you can easily compare aspects of different texts.  For example: which words come up most frequently in which texts?  Which terms are collocated?  What are the vocabulary densities of the different texts?
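For those who like to see what is going on under the hood, here is a minimal Python sketch of two of these measures: word frequency and vocabulary density. This is not Voyant's own code. It assumes a very simple tokenizer (lowercase letters and apostrophes only) and treats vocabulary density as the ratio of unique words to total words, which is one common definition. The function names and the sample sentence are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(text, n=5):
    """Return the n most frequent words with their counts."""
    return Counter(tokenize(text)).most_common(n)


def vocabulary_density(text):
    """Ratio of unique words to total words (0.0 for an empty text)."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    # Hypothetical sample sentence, only for demonstration.
    sample = "The cat and the hat"
    print(top_words(sample, 1))
    print(vocabulary_density(sample))
```

Running the same functions over several texts and comparing the numbers is a rough, hand-made version of what Voyant shows you side by side.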

Here is a tutorial for Voyant 2.0.

There are also sites and tools for analyzing large amounts of text data from a macro or high-level perspective: for example, Google Ngram Viewer, which visualizes word frequencies in the corpus of Google's digitized books (in multiple languages), and Bookworm, which visualizes trends in repositories of digitized texts.

Topic Modeling

Topic modelling is a method by which your text is chunked into pieces and a computer works out what the most important topics are in the chunks.  The algorithm is not interested in meaning, just in which words tend to occur together.  The best tool for this is MALLET, a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text. MALLET includes sophisticated tools for document classification: efficient routines for converting text to “features”, a wide variety of algorithms (including Naïve Bayes, Maximum Entropy, and Decision Trees), and code for evaluating classifier performance using several commonly used metrics.
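The "chunking" step is easy to try for yourself. Here is a minimal Python sketch that splits a long text into pieces of a fixed number of words and saves each piece as its own plain-text file, so that a topic modelling tool can treat each piece as a separate document. The chunk size of 500 words, the function names, and the file naming are my own assumptions for illustration; this is not part of MALLET.

```python
from pathlib import Path


def chunk_words(text, chunk_size=500):
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def write_chunks(text, out_dir, chunk_size=500):
    """Write each chunk to out_dir as chunk_0001.txt, chunk_0002.txt, ..."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    chunks = chunk_words(text, chunk_size)
    for number, chunk in enumerate(chunks, start=1):
        (out_path / f"chunk_{number:04d}.txt").write_text(chunk, encoding="utf-8")
    return len(chunks)


if __name__ == "__main__":
    # Tiny made-up example: five words in chunks of two.
    print(chunk_words("a b c d e", chunk_size=2))
```

How large to make the chunks is a judgment call that depends on your texts; it is worth trying more than one size and comparing the topics that come out.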

If you are not comfortable working on the command line, there is also an online version that works for smaller amounts of text.  It can be found here.

There is also a nice demo tool at AlchemyAPI that can be used to identify topics, themes, sentiment, and concepts.

Miriam Posner has written a great blog post about how to interpret the results from Topic Modelling outputs.

Mapping

There are lots of online platforms out there for mapping data.  It all depends on how fancy you want to get and whether you want to do more than map points.

CartoDB is definitely fast and flexible.  If you have a CSV with geo-coordinates you can upload it in seconds and have a map.  It also has a geo-coder that can quickly turn your list of places into lat/longs.
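If you have never built a CSV with geo-coordinates, here is a minimal Python sketch of what one looks like. The column names (name, latitude, longitude) are my own assumption, not something a particular platform requires, and the coordinates for the three Pennsylvania towns are approximate values included only as an example.

```python
import csv
import io


def places_to_csv(places):
    """Turn (name, latitude, longitude) tuples into CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "latitude", "longitude"])
    for name, lat, lon in places:
        # Basic sanity check so that swapped or mistyped values are caught early.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"Coordinates out of range for {name}: {lat}, {lon}")
        writer.writerow([name, lat, lon])
    return buffer.getvalue()


if __name__ == "__main__":
    # Approximate coordinates, for illustration only.
    example_places = [
        ("Sunbury", 40.8626, -76.7944),
        ("Lewisburg", 40.9645, -76.8844),
        ("Bethlehem", 40.6259, -75.3705),
    ]
    print(places_to_csv(example_places))
```

Keeping latitude and longitude in separate, clearly named columns makes it much easier for a mapping platform (or a colleague) to work out what your data means.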

Another way to go is through Google Fusion Tables.  Again, this is a super fast and easy way to map data.  You can also produce a nice “card view” of entries that will turn your Excel spreadsheets into a much more reader-friendly format.  There are also multiple other ways to visualize your data, in graphs, networks, and pie charts.

ArcGIS Online is another way to go if you want to produce far more sophisticated mapping visualizations, such as Story Maps and Presentations.  Bucknell has an institutional account.  If you want to use it, let me know.

Palladio is an interesting multi-dimensional tool from Stanford's Humanities + Design lab.  It can produce maps, networks, timelines and graphs of your data.  Here is a tutorial for my HUMN 270 class, written by Miriam Posner.

Building 3D Models

This is a part of DH I have not yet ventured into, but others on campus definitely have!  The easiest entry into modelling is SketchUp.  We also have Rhino loaded on the machines in Coleman 220 and it is regularly used in Joe Meiser’s Digital Sculpture class.

Timelines

There are various platforms out there for constructing digital timelines that also allow for the inclusion of multimedia elements, and one, Timemapper, also allows for a mapping window.  The most frequently used are Timeglider and Timeline.js.

Creating a Digital Exhibition

If you are interested in curating a digital exhibition of artifacts, the best platform to use is Omeka.net. This is a free online version of the more robust and versatile Omeka.org platform, which has to be installed on Bucknell’s servers (which can take a while).  Omeka.net allows you to upload digitized images, documents, maps, etc. to a “collection” that can then be arranged and curated as an online exhibit.  This is particularly useful if you have found a collection of photographs (maybe your own) that you would like to present on a public-facing platform with a narrative logic.

View of Matthis Hehl’s Itinerant Map of Pennsylvania, annotated using Neatline. http://ssv.omeka.bucknell.edu/omeka/neatline/fullscreen/itinerant-preachers-map-of-pennsylvania

Again, if you want to do this I am happy to show you how.  Here is a link to my own (developing) Omeka site at Bucknell on the Stories of the Susquehanna.  The server-based version has a very nice visualization tool called Neatline, which allows you to link the digital artifacts in your collection to a base image (maybe a map or a painting) and then annotate. This is an example I am developing for the 1750s Itinerant Preachers’ Map of Pennsylvania which I have used a great deal in my research and also in my teaching.  There is also a Timeline widget you can activate.  I have discovered a great tutorial on how to use Omeka and Neatline here, put together from a workshop given at the Michelle Smith Collaboratory at the University of Maryland.

Networks

If you want to create a network visualization in more detail and depth, use Gephi. Gephi is an open source, free, interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs.  It runs on Windows, Linux and Mac OS X.  One of the most important issues to consider before investing a lot of time in learning Gephi is whether or not your research question can be answered by using this visualization platform.  Ask yourself these questions, and then if the answer is yes, prepare your data!  Gephi has an excellent set of tutorials on GitHub.  Data can be prepared directly in Gephi or can be imported in CSV format.
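To give a sense of what "preparing your data" can mean, here is a minimal Python sketch that turns a list of observed connections into an edge table in CSV form, with Source, Target, and Weight columns, which is the layout I use when importing edges into Gephi. The people are placeholders ("A", "B", "C"), the function name is made up, and treating every connection as undirected is an assumption you may not want for your own project.

```python
import csv
import io
from collections import Counter


def edges_to_csv(interactions):
    """Count repeated (person, person) pairs and return an edge table as CSV text."""
    # Sort each pair so that ("B", "A") and ("A", "B") count as the same edge.
    weights = Counter(tuple(sorted(pair)) for pair in interactions)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder data: A and B are connected twice, A and C once.
    print(edges_to_csv([("B", "A"), ("A", "B"), ("A", "C")]))
```

The weight column records how often a pair appears together, which Gephi can then use to draw thicker or thinner lines between the nodes.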

Pedagogical Hermeneutics and Teaching DH in a Liberal Arts Context

Diane Jakacki and I gave the following presentation yesterday at DH2015 in Sydney, Australia. We include the slides and the abbreviated form of the talk.  The complete version will be published as an article in the near future.

Thanks to everyone for coming and for your interest!

We take our title from Alan Liu’s challenge to DH educators to develop a distinctive pedagogical hermeneutic of “practice, discovery, and community.” What does this look like?  How do we put this into practice?

This paper focuses on our teaching experience at Bucknell University in the academic year 2014-15 to show how the planning, design, and execution of a new project-based course, Humanities 100, introduced undergraduate students to the world of digital humanities through the use of selected digital tools and methods of analysis. This course, taught within the Comparative Humanities program, was designed specifically for first- and second-year students with no background in digital humanities, in order to encourage the development of digital habits of mind at the earliest phases of their liberal arts curricular experience. Developed to encourage examination and experimentation with a range of digital humanities approaches, the course asks students to work with primary archival materials as core texts to encourage digital modes of inquiry and analysis. The decision to root the course in a multi-faceted analysis of archival materials provided the rare chance for students to also engage in the research process typical for a humanities scholar: namely, the discovery of artifacts, the formulation of research questions, followed by the analysis and synthesis of findings culminating in the publication of initial findings in a digital medium. In the process, we introduced students to the basic structure of how to develop a DH research project.

The Comparative Humanities program is an ideal curricular environment in which to teach such classes, with its explicit learning goals of comparativity (historical period, cultures, genres, modality), to which we added course-specific learning goals that pertain to DH. (Slide with goals) The course therefore provided us with the opportunity not only to expose students to methodologies related to distant and close reading and to network and spatial visualization, but also to require that they learn to think critically about what each of these methods, and the tools that they used within the course, reveals in the texts with which they worked.

To date the course has been taught three times: as twin sections in Fall 2014 in which we both used the same scaffolding method with discrete subject matter and core texts. We participated fully in one another’s sections – this gave us the opportunity to teach our specializations within each other’s classes. Katie Faull taught the course again in Spring 2015, and Diane Jakacki participated. Both of us will teach a section next year.

This approach to teaching is important as we consider how to incorporate DH into the classroom. It required significant commitment on both our parts to the actual execution of the course, as well as recognition that we needed to be transparent to ourselves as well as to our students about how this represented a new model for course design at Bucknell. It is important to note that while other DH-inflected courses are being taught, this is the first Digital Humanities course at Bucknell.

At  Bucknell, the focus in digital humanities scholarship and learning to date has been primarily on spatial thinking, until recently rooted in working with ArcMap-type GIS and thinking about humanities in “place”.  It was important to both of us to emphasize and extend that objective in the development of the course and its learning outcomes, and so we focused on finding materials that would be of interest to students so that they could relate to the historical context more directly.

The first time the course was taught we decided to run it in two sections, anticipating an opportunity to reflect the different perspectives of our expertise with DH methods and tools. Diane’s focus has until now been on text encoding and analysis, while Katie’s has been on mapping and data visualization. We also worked with discrete data sets of archival materials. Katie’s course focused on the Colonial mission diaries of the Moravians from Shamokin, Pennsylvania (today Sunbury), situated nine miles downstream from the university. Written in English, the diary sections selected dealt with interactions between some of the first Europeans in the area and the Native peoples they met and worked among. Katie has spent the past five years working with this subject matter, and is considered an expert in the field of Moravian studies.

Diane’s course considered a subset of the diaries of James Merrill Linn, one of the first graduates of the university and a soldier in the American Civil War.  The choice of the Linn material had to do solely with its accessibility – Linn’s family left his life papers to the Bucknell Archives. Diane’s research is not in 19th century American history, and so she had to be honest that engaging with Linn’s diaries would be a discovery for her, too. In Katie’s iteration of the course this Spring, she selected materials that took the students slightly further afield, but still kept them within the Susquehanna watershed and the Chesapeake Bay using a different set of Moravian archival materials.  (Slide with archival materials)

Both of our choices reflect and extend Bucknell’s interest in digital/spatial thinking in terms of its place in the larger historical and cultural narrative. In all cases, students responded well to the investigation of places familiar to them, with several students having family connections to specific locales mentioned in the archival materials. The pedagogical hermeneutics of Humanities 100 were intentionally designed to encourage student examination, experimentation, and discovery with a range of digital humanities approaches.  To this end, the sequencing of the modules was carefully designed so that the “product” of each module then became the “data” of the next module.

In addition to praxis-oriented assignments, we wanted students to understand the broader context of their work within a DH framework. To that end we assigned theoretical readings and analysis of a range of major DH projects, which students then wove into their online reflections. Extensive use was made of online platforms that emphasize important forms of digital engagement, including collaborative online writing environments. Each module ended with a short assignment and also a reflective public-facing blog post that became a shared form of intellectual engagement.

In order to begin any kind of DH archival project the students had to produce a digital text.  In the first iteration of the course we did not have a transcription desk available, and so students transcribed the assigned pages of the original into a shared Google doc.  This digital text was then color-coded with “proto” tags to ease the way into close reading with TEI tags in Oxygen.  By the time the second semester started we had obtained an institutional subscription to the online platform Juxta Editions, which we were then able to use as the transcription platform and also as the introduction to thinking about tagging. From the transcription came the lightly marked up digital text that was then imported into Oxygen for more complex tagging.  Students then began tagging in earnest and were introduced to the discoveries of close reading involved in marking up a text.  Names, places, and dates were easy (in Juxta Editions they had already been imported).  However, the hermeneutical fun started with working out whether a boat was a place or an object, for example.  Or whether God was a person.  And just what is balsam: an object?  An emotion?

During these classes, the historical remoteness of the texts (in Faull’s class from the first half of the 18th century, focusing on Native Americans in the fall and in the Spring on preaching to the enslaved peoples on the Tobacco Coast) was lessened by the act of tagging and the lively discussions that surrounded it. Once a reliable text had been established we then introduced students to the concept of “distant reading” through the Voyant platform.  At the same time as students were encouraged to “play” we also pointed out the circular motion of discovery and confirmation that is inherent in any research experience. The students had just read these archival texts very carefully in order to transcribe them, so we asked them the usual kinds of questions one asks when approaching any kind of new text.  What is it about?  What are the major themes?  Who are the most important characters?  Then, having read Edward Whitley’s text on distant reading we asked the students to think about what reading a text distantly does to that hermeneutic. (Slide of distant reading prompt and visualizations)

This data, the TEI tags, crucial to the success of the students’ mark up assignment and the production of a final digital document, needed some restructuring as we moved onto the next module.  To manage this, we developed a prosopography for each core text: a database of people, places, and connections that grew organically out of the focus of each specific section, provided the data for entry into Gephi, and was then built out by adding geospatial data for GIS. So for example, one group of students wanted to use Gephi to interrogate the assumption that relationships between the missionaries and the Native Americans in the area around the mission remained constant.  However, by using the TEI persName tags and exporting them into Gephi node and edge tables, the students were able to show how relations between the Native leaders and the Moravian missionaries changed over a five year period of the mission (Include slide of Jerry and Henna’s work). Students also used the sigma.js plug-in so that the network visualizations were interactive.  However successful this team was in their work, it was clear from all iterations of the class that the hermeneutics of social networks was the hardest for the students to analyse and manipulate (which is quite ironic, given how most of them are well plugged in to Twitter, Instagram, etc).
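For readers curious about what moving from persName tags to a network might look like in code, here is a minimal Python sketch. It is not the students' workflow, which I have not reproduced here. It assumes that two people are "connected" whenever their persName tags appear in the same paragraph, which is only one of many possible definitions, and it uses an invented TEI fragment with placeholder names rather than any text from the diaries.

```python
import itertools
import xml.etree.ElementTree as ET
from collections import Counter

# The TEI namespace; "tei" is just a local prefix chosen for this sketch.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented fragment with placeholder names, for illustration only.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p><persName>Person A</persName> visited <persName>Person B</persName>.</p>
    <p><persName>Person B</persName> spoke with <persName>Person C</persName>
       and <persName>Person A</persName>.</p>
  </body></text>
</TEI>"""


def cooccurrence_edges(tei_xml):
    """Count how often each pair of tagged names shares a paragraph."""
    root = ET.fromstring(tei_xml)
    edges = Counter()
    for paragraph in root.iterfind(".//tei:p", TEI_NS):
        names = set()
        for tag in paragraph.iterfind(".//tei:persName", TEI_NS):
            name = "".join(tag.itertext()).strip()
            if name:
                names.add(name)
        # Every pair of distinct names in the paragraph becomes one edge.
        for pair in itertools.combinations(sorted(names), 2):
            edges[pair] += 1
    return edges


if __name__ == "__main__":
    for (source, target), weight in sorted(cooccurrence_edges(SAMPLE_TEI).items()):
        print(source, target, weight)
```

The interesting hermeneutical decisions are hidden in that one assumption: whether sharing a paragraph, a diary entry, or a sentence counts as a relationship is exactly the kind of question the students had to argue about.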

Lastly, students worked in ArcGIS Online to consider the evidence they had discovered within these texts in terms of spatial analysis. The story maps they produced became a new form of critical essay, with thesis, arguments supported by direct evidence, and conclusion all presented within a story map framework. So, for example, one student used Linn’s references to ships running aground during a storm at Hatteras Inlet, found a contemporary document reporting on the damage done to Union ships during this point in the campaign, and overlaid his evidence on a nautical map drawn in 1861 to determine where Linn’s ship had foundered.

Both the composition of the class (in terms of student personalities) and also the nature of the material determined to some extent the kind of final project students chose.  For example, in my section there were some natural groupings of students and there were a variety of final projects (one involving Gephi; two TEI markup; one hybrid ArcMap and TEI; and one story map). In Diane’s class all but two students chose to work independently.  In the second iteration of Katie’s course, students decided that they would produce one final group project all together: a course website that highlighted the best of their DH work. (Slide of Payne Froehlich website)

Assessment slide–self-explanatory

Another challenge to the class design was the high number of L2 students who enrolled in it.  In Katie’s Fall 2014 section there were 2 students of 9 from mainland China; in her spring section that ratio increased to three of five.  In the fall there was one from Australia and one from Vietnam (neither L2s but international students); one student in the spring course was from South Africa – her first language was Afrikaans.  Although the students admitted to being challenged by the readings and also the public facing writing in the blog site, a means for adjusting for student errors and allowing for corrections was developed that would allow the students to post their blog reflections in a way that did not impede their openness to reflection, knowing that they would have an opportunity to correct their English.

However, for all the challenges involved in teaching the class, there were moments of glory. Disengaged students became engaged; solitary learners recognized the essential need to collaborate in order to succeed; participants recognized the transformative nature of the course to their own concepts of the humanities. Students were eager to participate in crowdsourced data collection; they were intrigued to visualize ego-networks as they learned the concepts of network theory; they were excited to see their marked up transcriptions published in an online digital edition. Through these discoveries, they realized that they were creating a community of young DHers and expressed eagerness to take part in more of these learning experiences. Thank you!

Student Final Project for HUMN 100-The Humanities Now! Spring 2015

This spring I taught another iteration of HUMN 100 to a small group of highly motivated and talented students.  As it was last semester (see HUMN 100), this is a project-based class where students take an as yet unpublished manuscript from the Moravian Archives in Bethlehem, PA and develop their DH skills.

This semester we were fortunate enough to work on the Travel Journal of Christian Froehlich and Jasper Payne.  Students started with the transcription of the manuscript and once a text had been established they were then able to analyze it using the lenses of the digital humanities.  The course website can be found here, where the outline, assignments, and blog posts are organized by topic (Close Reading, Distant Reading, Visualization, and Time).

Teaching with Emerging Technology: the Centrality of the Collaborative Mode

Over the last 6 months I have been working with the latest instructional technologies and digital tools in my class, Humanities 100.  This course, brand new for the 2014-15 academic year, is designed to teach students how to create a digital project with archival materials.  The goal of the course is to teach students the importance of the creation of a digital text; to think about the design of data that stems from that digital text; to make intelligent decisions about the presentation of that digital text on the web; to teach students how to mark up a text in TEI lite and beyond; to begin to think about how to add geo-spatial elements to the analysis; and also how that text can be mined to build up a database of people and places (at the least) that can then be used to create a network analysis of the text. That is a lot to learn, and from my experience last semester I can say that some students wanted to stop at, say, transcription of the text, or mark-up.

Using Spiderscribe for Humanities 150

This evening, in the HUMN 150 class, Art Nature Knowledge, we will use our first mind map, created using Spiderscribe. I hope it works.  I will try to add notes as the “scribe” during the class, so that tomorrow we will be able to use this mind map for discussions of the texts.

Here’s the map for the course Introduction!

http://www.spiderscribe.net/app/?970b8945a280d230b7bdee6e6d8cd359

I will report back on how this goes!  Wish me, John Hunter, and Nick Kupensky luck!!