Harvard Forest Symposium Abstract 2012

  • Title: The Analytic Web
  • Primary Author: Emery Boose (Harvard Forest)
  • Additional Authors: Lori Clarke (University of Massachusetts Amherst); Aaron Ellison (Harvard University); Barbara Lerner (Mount Holyoke College); Lee Osterweil (University of Massachusetts Amherst)
  • Abstract:

    This long-term project brings together computer scientists and ecologists to investigate a critical problem in science: how to ensure that scientific data analyses are reproducible. The solution appears to lie in the use of “provenance metadata” to document rigorously how data are transformed in each step of an analysis from start to finish. In our current work, this provenance metadata takes the form of two mathematical graphs: a process definition graph (PDG) that specifies the various ways in which a process might unfold; and a data derivation graph (DDG) that describes exactly how a process did unfold in a particular execution.



    These abstract concepts from computer science are tested through application to an ongoing project in a domain science: currently, the analysis of streaming data from meteorological and hydrological sensors in the field at Harvard Forest. Recent efforts have focused on defining and executing such analyses using Little-JIL (a high-level graphical process language); developing methods for creating persistent DDGs as Little-JIL processes execute; exploring strategies for storing, querying, and visualizing DDGs; and comparing Little-JIL to other commonly used workflow programs such as Kepler and Taverna.



    Work for 2012 will focus on the use of database technologies to store, query, and visualize DDGs as well as the implementation of more complex analytical processes (including quality control and modeling of streaming data) in Little-JIL and other workflow programs.

  • Research Category: Ecological Informatics and Modelling
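
As a rough illustration of the data derivation graph (DDG) idea described in the abstract, the Python sketch below records which step produced each data item during one execution and then traces a result back to its sources. It is not the project's Little-JIL implementation. The class `DerivationGraph`, its methods, and the step and data names are all hypothetical and chosen only for illustration. It also assumes that each step name is recorded only once per execution.

```python
from collections import defaultdict


class DerivationGraph:
    """Toy DDG: remembers which step produced each data item in one run.

    Hypothetical structure for illustration; not the Little-JIL DDG format.
    """

    def __init__(self):
        self.produced_by = {}                # data name -> step that created it
        self.inputs_of = defaultdict(list)   # step name -> data names it read

    def record(self, step, inputs, outputs):
        """Log one executed step together with the data it read and wrote."""
        self.inputs_of[step].extend(inputs)
        for out in outputs:
            self.produced_by[out] = step

    def lineage(self, data):
        """Return the set of all steps and data items that `data` derives from."""
        seen = set()
        stack = [data]
        while stack:
            item = stack.pop()
            step = self.produced_by.get(item)
            if step is None:
                continue  # raw input: no recorded step produced it
            seen.add(step)
            for src in self.inputs_of[step]:
                if src not in seen:
                    seen.add(src)
                    stack.append(src)
        return seen


# Invented two-step analysis of a sensor reading.
ddg = DerivationGraph()
ddg.record("calibrate", ["raw_temperature"], ["calibrated_temperature"])
ddg.record("quality_control", ["calibrated_temperature"], ["checked_temperature"])

ancestors = ddg.lineage("checked_temperature")
```

Here `ancestors` contains both steps and both upstream data items, which is the kind of "how was this value derived?" query a stored DDG is meant to answer. A process definition graph would instead describe every path the analysis could take, not just the one that ran.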