2014 • UC Berkeley School of Information

With a classmate, I designed and prototyped DataFramer, an interactive tool for exploring and annotating overviews of high-dimensional datasets, inspired by the seminal principles of exploratory data analysis (EDA) and implemented with flexible web technologies. 


Since the term was coined by statistician John Tukey in the 1960s, exploratory data analysis has been discussed as an "approach" or an "attitude," but its methodology has been hard to pin down. With commercial data analysis software such as Tableau, researchers are prone to diving down rabbit holes when a data point piques their interest, leaving the rest of the dataset unexplored and potentially wasting hours on dead-end investigations. Our solution is "question-driven design," a visualization approach that gives analysts only as much information and functionality as they need to build a mental model of their dataset and formulate potential research questions.

[Image: DataFramer visualization overview]

DataFramer helps users resist the temptation of data-driven distraction by algorithmically generating a “first look” at a dataset, with an appropriate descriptive chart for each variable's data type. Users can also annotate individual variables and track possible research questions. The focused, minimal design enables rapid comprehension of the landscape of a dataset and helps users produce an actionable analysis plan.
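
To give a rough sense of what choosing a descriptive chart per data type can look like, here is a minimal Python sketch. It is not DataFramer's code (the prototype was built with web technologies), and the type categories, the chart names, the date format, and the helper names `infer_type` and `first_look` are all assumptions made for illustration.

```python
from datetime import datetime

# Assumed mapping from a coarse data type to a default overview chart.
CHART_FOR_TYPE = {
    "numeric": "histogram",
    "date": "timeline",
    "categorical": "bar chart",
    "empty": "none",
}


def _is_number(value):
    """Return True if the raw string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_date(value):
    """Return True if the raw string is an ISO-style date (assumed format)."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def infer_type(values):
    """Guess a coarse type for one column of raw string values."""
    present = [v for v in values if v != ""]
    if not present:
        return "empty"
    if all(_is_number(v) for v in present):
        return "numeric"
    if all(_is_date(v) for v in present):
        return "date"
    return "categorical"


def first_look(columns):
    """Build a per-variable overview: inferred type, default chart, missing count."""
    overview = {}
    for name, values in columns.items():
        kind = infer_type(values)
        overview[name] = {
            "type": kind,
            "chart": CHART_FOR_TYPE[kind],
            "missing": sum(1 for v in values if v == ""),
        }
    return overview


# Hypothetical example data, one list of raw strings per variable.
example = {
    "age": ["31", "45", "", "28"],
    "joined": ["2014-01-05", "2013-11-30"],
    "city": ["Oakland", "Berkeley", "Oakland"],
}
overview = first_look(example)
```

In this sketch each variable gets exactly one default chart and a count of missing values, which is in the spirit of giving analysts only as much information as they need before they start asking questions.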

Our prototype of DataFramer was built with Meteor, AngularJS, and D3. The code is on GitHub, and you can try out a (functional but insecure) demo at