Harvard Forest Symposium Abstract 2011

Title: The Analytic Web
Primary Author: Emery Boose (Harvard Forest)
Additional Authors: Lori Clarke (University of Massachusetts Amherst); Aaron Ellison (Independent); Barbara Lerner (Mount Holyoke College); Lee Osterweil (University of Massachusetts Amherst)
Abstract:
This long-term project brings together computer scientists and ecologists to investigate a critical problem in science: how to ensure that scientific data analyses are reproducible. The solution appears to lie in the use of “provenance metadata” to document rigorously how data are transformed in each step of an analysis from start to finish. In our current formulation, this provenance metadata takes the form of two mathematical graphs: a process definition graph (PDG) that specifies the various ways in which a process might unfold; and a data derivation graph (DDG) that describes exactly how a process did unfold in a particular execution.
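
As a rough illustration of the DDG idea, the Python sketch below records which step produced each data item during one execution and then traces the lineage of a final result. It is a toy under stated assumptions, not the project's implementation: the class `DDG`, its methods, and the step and data names (`calibrate`, `rating_curve`, `gap_fill`, and so on) are all hypothetical and are not taken from Little-JIL or from the actual hydrological analysis.

```python
class DDG:
    """Toy data derivation graph (hypothetical): one record per derived data item."""

    def __init__(self):
        # Maps each derived data item to (step name, list of input data items).
        self.produced_by = {}

    def record(self, step, inputs, output):
        """Record that `step` read `inputs` and wrote `output` in this execution."""
        self.produced_by[output] = (step, list(inputs))

    def lineage(self, item):
        """Return the (step, inputs, output) records behind `item`, upstream first."""
        seen, order = set(), []

        def visit(node):
            # Original inputs (never derived) have no record, so traversal stops there.
            if node in seen or node not in self.produced_by:
                return
            seen.add(node)
            step, inputs = self.produced_by[node]
            for parent in inputs:
                visit(parent)
            order.append((step, inputs, node))

        visit(item)
        return order


# Hypothetical execution of a small sensor-data analysis.
ddg = DDG()
ddg.record("calibrate", ["raw_stage"], "stage_m")
ddg.record("rating_curve", ["stage_m"], "discharge")
ddg.record("gap_fill", ["discharge", "precip"], "discharge_filled")

steps = [step for step, _, _ in ddg.lineage("discharge_filled")]
```

Here `steps` lists the processing steps that contributed to `discharge_filled`, in the order they were applied. A PDG would differ in that it describes every path the process could take, whereas this structure holds only the path one execution did take.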

We test these abstract concepts from computer science by applying them to an ongoing project in a domain science, currently the analysis of streaming data from a hydrological sensor network. Recent efforts have focused on defining and executing this analysis in Little-JIL (a high-level graphical process language) and on creating a DDG in memory as the process executes. Work for 2011 will focus on creating a persistent form of the DDG (using database technologies) and on developing methods for querying and analyzing DDGs.
Research Category: Ecological Informatics and Modelling