Harvard Forest Symposium Abstract 2013

  • Title: The Analytic Web
  • Primary Author: Emery Boose (Harvard Forest)
  • Additional Authors: Lori Clarke (University of Massachusetts Amherst); Aaron Ellison (Harvard University); Barbara Lerner (Mount Holyoke College); Lee Osterweil (University of Massachusetts Amherst)
  • Abstract:

    This long-term project brings together computer scientists and ecologists to investigate a critical problem in science: how to ensure that scientific data analyses are reproducible. The solution appears to lie in using “provenance metadata” to document rigorously how data are transformed at each step of an analysis, from start to finish. In our current work, this provenance metadata takes the form of two mathematical graphs: a process definition graph (PDG), which specifies the various ways in which a process might unfold, and a data derivation graph (DDG), which records exactly how a process did unfold in a particular execution.

    These abstract concepts from computer science are being tested by applying them to an ongoing domain-science project: the analysis of streaming data from meteorological and hydrological sensors at Harvard Forest. Recent efforts have focused on (1) developing methods for quality control of streaming data using R and Little-JIL (a high-level graphical process-definition language); (2) developing methods for storing and querying DDGs in an RDF (Resource Description Framework) database; and (3) developing graphical tools in Java for visually exploring DDGs.

    Work in 2013 will focus on creating user-friendly tools that allow scientists to ask questions about how their data (or data from other scientists) have been processed: in effect, not only to analyze data but also to analyze the analysis of data. Such tools have the potential to facilitate troubleshooting and greatly improve quality control, especially for large and complex datasets.

  • Research Categories: Ecological Informatics and Modelling; Watershed Ecology
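To make the data derivation graph (DDG) idea from the abstract concrete, here is a minimal sketch, assuming a DDG can be reduced to two mappings: each executed procedure to the data items it read, and each data item to the procedure that produced it. The class `DDG`, its methods `record_step` and `lineage`, and the step and file names are all hypothetical illustrations, not the project's R, Little-JIL, RDF, or Java tools, and the process definition graph (PDG) is not modeled at all. The `lineage` query walks the graph backwards from one result to recover every step and original input it depends on, which is one simple form of "analyzing the analysis of data."

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class DDG:
    """Hypothetical minimal data derivation graph for ONE execution.

    Assumes every procedure name is unique within the run (one node per
    executed step); a repeated name would overwrite the earlier step.
    """
    # procedure name -> data items it consumed
    inputs: dict[str, list[str]] = field(default_factory=dict)
    # data item -> procedure that produced it
    producer: dict[str, str] = field(default_factory=dict)

    def record_step(self, procedure: str, used: list[str], produced: list[str]) -> None:
        """Record that `procedure` read `used` and wrote `produced`."""
        self.inputs[procedure] = list(used)
        for item in produced:
            self.producer[item] = procedure

    def lineage(self, item: str) -> tuple[list[str], list[str]]:
        """Return (procedures, original inputs) upstream of `item`, each sorted."""
        procedures: set[str] = set()
        sources: set[str] = set()
        seen = {item}
        queue = deque([item])
        while queue:  # breadth-first search backwards along derivation edges
            data = queue.popleft()
            proc = self.producer.get(data)
            if proc is None:
                # Nothing in this run produced `data`, so it is an original input.
                sources.add(data)
                continue
            procedures.add(proc)
            for used in self.inputs[proc]:
                if used not in seen:
                    seen.add(used)
                    queue.append(used)
        return sorted(procedures), sorted(sources)


# Hypothetical quality-control run over one sensor file.
ddg = DDG()
ddg.record_step("read_sensor_file", ["met_raw.csv"], ["air_temp_raw"])
ddg.record_step("range_check", ["air_temp_raw", "qc_limits"], ["air_temp_flagged"])
ddg.record_step("gap_fill", ["air_temp_flagged"], ["air_temp_clean"])
ddg.record_step("plot_daily", ["air_temp_clean"], ["daily_plot.png"])

print(ddg.lineage("air_temp_clean"))
# → (['gap_fill', 'range_check', 'read_sensor_file'], ['met_raw.csv', 'qc_limits'])
```

The query ignores `plot_daily`, because that step ran downstream of the cleaned series rather than contributing to it. A production system would also need per-execution node identifiers, timestamps, and persistent storage, which is where a database such as the RDF store mentioned in the abstract would come in.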