2015 • UC Berkeley School of Information

I joined the Wordseer project, led by Berkeley professor Marti Hearst and developed by recent PhD Aditi Muralidharan, in summer 2014. Wordseer is an open-source, web-based tool that leverages the techniques of natural language processing and data visualization for deep textual analysis in literary scholarship. It's intended to fill a particular analytical niche, what Marti calls the "middle distance" — broader than a close reading of a single text, but narrower than statistical techniques such as stylometrics. (We are currently redeveloping the previous version as a self-contained Python app that can run without a web server; release is expected sometime in 2015. The interface is built with ExtJS and D3, along with various plugin libraries.)

sentence list image

Because the primary use case of Wordseer is a researcher reading scores of sentences plucked from various documents in a corpus to find connections and develop theories, the lack of readability in the main workspace was a problem. I redesigned the tabular format of the "Sentence List" view to promote the sentence as the primary unit of analysis, increasing its readability without sacrificing any of the sorting or filtering functionality afforded by the tabular view.

related words image

One of Wordseer's most powerful features, the ability to leap from your current query to a variety of words that co-occur frequently in the same context, has also been one of its most vexing from a design standpoint. It's a challenge to fit all this information on the screen while still letting the user refer back to the original context easily, and the display has changed form several times. I reimplemented a very wide, multi-table overlay as a collapsible accordion menu, freeing up valuable screen real estate and bringing its size more in line with the other menus in the interface.
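To make the "related words" idea concrete, here is a minimal Python sketch of co-occurrence counting. It is not Wordseer's actual code: the function name `related_words`, the simple regex tokenizer, and the choice of the sentence as the unit of "context" are all assumptions made for illustration.

```python
import re
from collections import Counter


def related_words(sentences, query, top_n=5, stopwords=frozenset()):
    """Rank words that appear in the same sentence as `query`.

    Hypothetical helper: treats each sentence as one context and counts a
    co-occurring word at most once per sentence.
    """
    query = query.lower()
    counts = Counter()
    for sentence in sentences:
        # Naive tokenizer (an assumption): lowercase alphabetic runs only.
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        if query in tokens:
            counts.update(t for t in tokens if t != query and t not in stopwords)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


corpus = [
    "The king loved the queen.",
    "The queen left the castle.",
    "The queen saw the king.",
]
print(related_words(corpus, "queen", top_n=2, stopwords={"the"}))
```

A ranked list like this is the kind of data the accordion menu has to present compactly: each entry is a candidate for the next query, so it needs to stay visible next to the original context.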

metadata image

Because it ingests documents from structured XML files, Wordseer collects a wealth of metadata along with the text: everything from structural information about the documents (e.g., chapter numbers, line numbers in poetry, speakers and stage directions in plays) to publication data (e.g., publication dates, authors, editions) and any other arbitrary information attached by compilers or researchers (e.g., tags, categorization schemes). I redesigned and reimplemented the original aggregate view of this metadata to make its data visualizations more uniform, predictable, and readable, and to hew more closely to the core principles of effective chart design for exploratory analysis.
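As a rough illustration of how structural and publication metadata can travel with each unit of text, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The XML layout, element and attribute names, sample document, and the `extract_lines` helper are all invented for this example; Wordseer's real ingestion schema may look quite different.

```python
import xml.etree.ElementTree as ET

# Made-up sample document; the schema is an assumption, not Wordseer's format.
SAMPLE = """<play title="Example Play" author="A. Author" year="1600">
  <act n="1">
    <scene n="1">
      <speech speaker="FIRST"><line n="1">Who goes there?</line></speech>
      <speech speaker="SECOND"><line n="2">A friend, and no stranger.</line></speech>
    </scene>
  </act>
</play>"""


def extract_lines(xml_text):
    """Return one record per line, tagged with document and structural metadata."""
    root = ET.fromstring(xml_text)
    doc_meta = dict(root.attrib)  # publication data: title, author, year
    records = []
    for act in root.iter("act"):
        for scene in act.iter("scene"):
            for speech in scene.iter("speech"):
                for line in speech.iter("line"):
                    records.append({
                        **doc_meta,
                        "act": act.get("n"),
                        "scene": scene.get("n"),
                        "speaker": speech.get("speaker"),
                        "line": line.get("n"),
                        "text": (line.text or "").strip(),
                    })
    return records


for record in extract_lines(SAMPLE):
    print(record["speaker"], record["line"], record["text"])
```

Flattening the hierarchy into per-line records like these is one way to make every metadata field available for the sorting, filtering, and aggregate charts described above.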

matrix image

Work on Wordseer is ongoing. Designs I'm currently developing include a matrix visualization for comparing co-occurring words across metadata dimensions and a "grammatical explorer" that visualizes the grammatical dependency relationships for an active query, providing sample sentences for context and action prompts such as searching and filtering.

gramm explorer image