Harvard Forest Summer Research in Ecology projects 2011
Harvard Forest research focuses on the effects of natural and human disturbances on forest ecosystems, including global warming, hurricanes, forest harvesting, and invasive organisms. Researchers come from many disciplines, and specific projects center on population and community ecology, paleoecology, land-use history, aquatic ecology, biochemistry, soil science, ecophysiology, and atmosphere-biosphere exchanges.
Beginning in 2010, a number of the research projects are group projects. These group projects reflect the collaborative spirit and enterprise in which contemporary ecological and environmental research takes place. Each group will consist of 2 or 3 students working collaboratively with 2 or 3 mentors as a single team of 4-6 researchers. Each group project has an overarching interdisciplinary theme, and because of that interdisciplinary nature, we don't expect either students or mentors to have expertise in all aspects of the project. All group projects have a data-fusion component and will involve linking data collected in the field with statistical analysis and environmental forecasting models.
Within each group project, students will work with mentors on, and take ownership of, clearly identified sub-projects. At the same time, we intend that students and mentors work together on all aspects of the group project. To ensure that sub-projects remain integrated in the larger context, all members of a project team will meet together at least once each week.
Title: Group project: Data processing for a real-time hydrological sensor network
Description: The role of forests in maintaining drinking water quality is keenly felt in Massachusetts, where the Quabbin Reservoir (whose headwaters lie partly in the Harvard Forest, 10 km to the north) depends on the surrounding forests to supply clean, untreated water to the Boston metropolitan area. How does the forest provide this “ecosystem service” and how might this change in future years? The basic elements of the water cycle are easily described: water moves from rain and snow into soils and streams and out of the forest via evapotranspiration, stream discharge, and groundwater flow; each step involves changes in dissolved C, N, and other chemical components, most of which are mediated by biological processes. But the details of this cycle are far more complex and are the object of study by scientists at Harvard Forest using an array of long-term meteorological, hydrological, and eddy flux stations and a field wireless network.
A critical problem for such studies is how to process streams of sensor data in such a way that the results are reliable and reproducible. The solution appears to lie in the use of “provenance metadata” to document rigorously how data are transformed from start to finish. In this project a team of students and mentors will address various aspects of generating and managing the provenance metadata for a real-time hydrological sensor network. Three students will work on three closely related projects:
(1) Provenance metadata querying: The Data Derivation Graphs (DDGs) that hold provenance metadata are generally large and complex. Scientists will wish to extract information from these graphs to perform post hoc analyses of the processing results. For example, if output values are surprising, the scientist may want to explore the provenance metadata to determine the cause. Was the equipment faulty? Was a different version of software installed? The first student will work with scientists to determine the types of queries scientists would like to ask and then implement a tool that enables scientists to ask those queries and that presents the query results in a meaningful way. The goal is to demonstrate the usefulness of the DDGs for direct exploration by a scientist, without requiring the scientist to learn a query language.
(2) Provenance interoperability: There are many software technologies available for defining and executing scientific processes, each with its own strengths and limitations. For example, Little-JIL has excellent support for exception handling, while Kepler has excellent support for streaming large volumes of data among tools external to the workflow. Each of these systems collects and stores provenance metadata, but each does so in a different way. Our second student will work on techniques to integrate these tools by allowing a process written with one technology to (a) use results and provenance metadata from an earlier process written in the same technology and (b) use results and provenance metadata from an earlier process written in a different technology. The goal is to build provenance metadata structures that accurately reflect the ongoing processing of scientific data as new tools are applied, without restricting scientists to a single process or workflow system.
(3) Hydrological process modeling: To demonstrate the feasibility of this approach, Little-JIL has been used to describe some basic processes for calibrating and error-checking hydrological sensor data in (near) real time. The third student will expand on this initial work by building more complex processes that combine real-time and retrospective data processing, where the latter may include (for example) subsequent correction of data values for sensor drift or replacement of questionable values with data from another source. Challenges will include (a) how to represent actual data values and (b) how to build on existing metadata. The goal will be to demonstrate the ability to represent and execute more complex scientific processes.
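As a rough illustration of the ideas in (1) and (3), the sketch below records a retrospective drift correction in a small provenance graph and then asks which processing steps lie behind an output. It is not Little-JIL, Kepler, or the actual DDG format used at Harvard Forest. The class and function names, the sensor name, the linear drift model, and all data values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    kind: str  # "data" or "process"
    attrs: dict = field(default_factory=dict)


class ProvenanceGraph:
    """Toy provenance graph: each node lists the nodes it was derived from."""

    def __init__(self):
        self.nodes = {}
        self.parents = {}

    def add(self, node_id, kind, derived_from=(), **attrs):
        self.nodes[node_id] = Node(node_id, kind, attrs)
        self.parents[node_id] = list(derived_from)

    def lineage(self, node_id):
        """Return every upstream node id, each listed once."""
        seen = []
        stack = list(self.parents[node_id])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.append(current)
                stack.extend(self.parents[current])
        return seen

    def processes_behind(self, node_id):
        """Return the process nodes that contributed to a data node."""
        return [self.nodes[n] for n in self.lineage(node_id)
                if self.nodes[n].kind == "process"]


def correct_linear_drift(values, total_drift):
    """Assumed model: drift grows linearly from 0 to total_drift over the series."""
    n = len(values)
    if n < 2:
        return list(values)
    return [v - total_drift * i / (n - 1) for i, v in enumerate(values)]


# Hypothetical stream-stage readings and drift estimate.
raw = [10.0, 10.5, 11.0]
corrected = correct_linear_drift(raw, total_drift=1.0)

graph = ProvenanceGraph()
graph.add("raw_stage", "data", sensor="hypothetical-gauge-1", values=raw)
graph.add("drift_correction", "process", derived_from=["raw_stage"],
          software_version="0.1", total_drift=1.0)
graph.add("corrected_stage", "data", derived_from=["drift_correction"],
          values=corrected)

# A scientist's question: what processing produced the corrected values?
for step in graph.processes_behind("corrected_stage"):
    print(step.node_id, step.attrs)
```

A query such as `processes_behind` addresses the kind of question raised in (1), for example whether a different software version was used, without requiring the scientist to write a query in a dedicated language.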
Students will spend approximately one day per week in the field collecting hydrological data and working on short-term hydrological projects.
Desired skills: All three students should have strong software development skills, including solid Java programming experience. Familiarity with database and querying technologies and interest in exploring techniques for visualizing data are important for the first student. The second student will be expected to learn how to program in Little-JIL, a process programming language, and Kepler, a scientific workflow language. The third student will be expected to learn how to program in Little-JIL and R, a statistical programming language.
Location: Harvard Forest
Capturing Data Provenance with Little-JIL
Scientific Workflow: The Analytic Web
Prospect Hill Hydrological Stations (http://harvardforest.fas.harvard.edu:8080/exist/xquery/data.xq?id=hf070)
Lerner, B., Boose, E., Osterweil, L. J., Ellison, A. M., Clarke, L. A. 2011. Provenance and quality control in sensor networks. Proceedings of the Environmental Information Management (EIM) 2011 Conference, Santa Barbara, California.
Osterweil, L. J., Clarke, L. A., Ellison, A. M., Podorozhny, R., Wise, A., Boose, E. R., Hadley, J. L. 2008. Experience in using a process language to define scientific workflow and generate dataset provenance. Proceedings of the 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (ACM SIGSOFT 2008 / FSE 16).
Boose, E. R., Ellison, A. M., Osterweil, L. J., Podorozhny, R., Clarke, L., Wise, A., Hadley, J. L., Foster, D. R. 2007. Ensuring Reliable Datasets for Environmental Models and Forecasts. Ecological Informatics 2: 237-247.
Ellison, A. M., Osterweil, L. J., Hadley, J. L., Wise, A., Boose, E. R., Clarke, L., Foster, D. R., Hanson, A., Jensen, D., Kuzeja, P. S., Riseman, E., Schultz, H. 2006. Analytic Webs Support the Synthesis of Ecological Data Sets. Ecology 87: 1345-1358.
Category(ies): Watershed Ecology; Ecological Informatics and Modelling