Summer Research Project 2020

  • Title: Provenance and Computer Modeling: Provenance tools to support modeling
  • Group Project Leader: Emery Boose
  • Mentors: Emery Boose; Barbara Lerner; Jonathan Thompson
  • Collaborators: Aaron Ellison; Margo Seltzer
  • Project Description:

    Please note: This project is part of a group project and will be carried out in collaboration with the other projects listed under "Provenance and Computer Modeling."

    Over-arching Intellectual Theme
    This project will bring together two important applications of computer technology in science: (1) the use of simulation models to better understand physical and biological processes in the real world, and (2) the use of provenance to record and better understand how analytical tools such as models are used by scientists. The modeling sub-project will explore the impacts of future hurricanes in New England under various climate change scenarios. The provenance sub-project will develop software tools that use provenance and evaluate how effectively these and existing tools support scientists in their work, with a focus on computer modeling.

    Provenance tools to support modeling
    Mentors: Lerner & Boose

    The software tools that scientists use to process and analyze data are typically optimized for performance and ease of use. Few, if any, are designed to capture and record the details of what happens as the tool performs its task(s). This detailed information, and more generally the history of an item of data from its creation to its present state, is known as provenance. The Provenance Project at Harvard Forest has developed tools to collect provenance for the R statistical language, along with applications that use provenance to support useful tasks such as script debugging and cleaning.

    This sub-project will provide an opportunity to develop new applications that use provenance (for example, a lint-like utility that flags common problems for R programmers, such as unintended type changes or variables bound before script execution; or a utility to compile and archive the provenance from multiple model runs). It will also provide an opportunity to conduct a user study to evaluate the ease with which provenance tools are installed and used and the extent to which they support scientists in their work.

    Desired Skills: Students must have strong software engineering skills, an interest in conducting user studies, and experience with (or willingness to learn) R.

    Provenance Project website: http://end-to-end-provenance.github.io/

  • Readings:

    Boose, E. R., Chamberlin, K. E., Foster, D. R. 2001. Landscape and regional impacts of hurricanes in New England. Ecological Monographs 71: 27-48.

    Duveneck, M. J., Thompson, J. R., Gustafson, E. J., Liang, Y., de Bruijn, A. M. G. 2017. Recovery dynamics and climate change effects to future New England forests. Landscape Ecology 32: 1385-1397.

    Ellison, A. M. 2010. Repeatability and transparency in ecological research. Ecology 91: 2536-2539.

    Lerner, B. S., Boose, E. R., Perez, L. 2018. Using introspection to collect provenance in R. Informatics 5: 12.

    Liang, Y., Duveneck, M., Gustafson, E., Serra-Diaz, J., Thompson, J. R. 2017. How disturbance, competition and dispersal interact to prevent tree range boundaries from keeping pace with climate change. Global Change Biology. DOI: 10.1111/gcb.13847

    Pasquier, T., Lau, M. K., Trisovic, A., Boose, E. R., Couturier, B., Crosas, M., Ellison, A. M., Gibson, V., Jones, C. R., Seltzer, M. 2017. If these data could talk. Nature Scientific Data 4.

    Thompson, J. R., Simons-Legaard, E., Legaard, K., Domingo, J. B. 2016. A LANDIS-II extension for incorporating land use and other disturbances. Environmental Modelling & Software 75: 202-205.

  • Research Category: Regional Studies, Group Projects, Ecological Informatics and Modelling, Conservation and Management