Comprehensive Databases to Link Molecular Survey and Environmental Data

 

Organizer: Bradley Stevenson (KBS), Michigan State University

Key participants: Tim Hollibaugh (GCE), University of Georgia

Jorge Rodrigues (KBS), Michigan State University

Wade Sheldon (GCE), University of Georgia

Abstract

Molecular surveys of microbial communities are a first step in understanding an ecosystem, and the data generated can be of great value when they are linked to information about the environment from which they came. Molecular surveys at many LTER sites represent a large amount of sequence data linked to many of the best-studied biomes on the planet. This workshop will facilitate a discussion of the value of such a database, its structure, and the types of data that could and should be linked to molecular sequence data. A phylogenetically linked database would also be a useful place to store genomic data from large-insert clone libraries (e.g., BAC and fosmid libraries) derived from microbial communities. A standardized database would provide the structure to make molecular survey data from each LTER site readily available and would also facilitate cross-site investigations of community structure and function.

 

The goal of this workshop was to provide a forum for discussing the interest in, and the possibility of, assembling an LTER network-wide database linking molecular survey data with other data from the same sampled environment (e.g., environment description, pH, salinity, C:N:P ratio, and geochemical fluxes).

 

Presentations

 

Bradley Stevenson provided an introduction to the workshop.

Microbial ecology research at the Kellogg Biological Station LTER in Gull Lake, MI focuses on two major areas of interest: 1) Microbial Diversity and Abundance, and 2) Linking Community Structure with Ecosystem Functions.

Molecular surveys of the KBS ecosystems focus on small subunit ribosomal RNA genes, as well as functional genes, to provide information about which microorganisms make up the microbial community, how abundant they are, and what metabolic functions the community carries out.  This type of approach generates vast quantities of sequence data.  If these data were also linked to data about the sampled environments, patterns of abundance and diversity could begin to emerge.

Cultivation and characterization of microorganisms allow for a better understanding of their metabolic capabilities.  Although previously uncultivated microbes are being isolated every day, this approach is limited because the vast majority of microorganisms remain uncultivated.  One way around this issue is to analyze metabolic diversity through cultivation-independent studies of genetic diversity.  Sequencing of large-insert libraries, such as bacterial artificial chromosome (BAC) and fosmid libraries, allows us to investigate the potential metabolic capabilities of a microbial assemblage directly, without cultivation.  This approach also generates copious amounts of sequence data that would be well suited to a database environment.

The LTER network and Microbial Observatories represent some of the best-studied environments in the biosphere.  A database containing metadata linking sequence data with other environmental characteristics would allow us to ask questions at local, regional, and global scales.  Several hurdles need to be cleared for this to happen.  To correlate data from diverse environments and studies, the types of data included, the experimental methodology, and the data themselves need to be standardized.  These databases would also need to be connected across all participating LTER and Microbial Observatory sites.

Tim Hollibaugh (GCE) provided the group with a firsthand perspective on this initiative, drawing on a similar discussion he led with John Priscu (MCM) at the previous LTER All Scientists Meeting (2000, Snowbird, UT) (1) and on his association with the Sapelo Island Microbial Observatory (http://simo.marsci.uga.edu/).  Principal investigators associated with the Microbial Observatories are also interested in addressing the same questions by linking ecosystem metadata with bacterial gene sequences.  The first fruit of this effort is the Sapelo Island Microbial Observatory (SIMO) Database, which links all information about samples, clones, 16S gene sequences, and other SIMO research products in a relational database management system (2).

Jorge Rodrigues (KBS) presented his work on cloning large fragments of genomic DNA directly from environmental samples.  These libraries provide a wealth of functional genetic data that should also be included in a comprehensive database.  With libraries of functional genes from a studied ecosystem held in the same database as molecular survey and environmental data, links between community structure, environmental characteristics, and ecosystem functions can be explored as never before.

Wade Sheldon (GCE) provided an inside look at the structure of the SIMO database and how it could be expanded to include additional types of data.  The database is designed primarily to store, query, and distribute data from SIMO research, and no analytical processing capabilities are currently planned. However, information derived from independent analytical programs, such as FASTA and BLAST, will be stored in the database and sequence formatting options will be provided to facilitate use of SIMO data with other databases and applications (2).
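To make the idea of a relational database linking samples, clones, sequences, and environmental data more concrete, the following is a minimal sketch in Python using the standard sqlite3 module. It is not the SIMO schema: every table name, column name, and value here is a hypothetical placeholder chosen for illustration, and the rows are made-up demonstration data rather than real measurements or sequences.

```python
import sqlite3

# Hypothetical schema for illustration only; NOT the actual SIMO design.
SCHEMA = """
CREATE TABLE sample (
    sample_id   INTEGER PRIMARY KEY,
    lter_site   TEXT NOT NULL,       -- site code such as 'GCE' or 'KBS'
    description TEXT                 -- free-text environment description
);
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    sample_id      INTEGER NOT NULL REFERENCES sample(sample_id),
    variable       TEXT NOT NULL,    -- e.g. 'pH', 'salinity'
    value          REAL NOT NULL,
    units          TEXT
);
CREATE TABLE clone (
    clone_id  INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
    library   TEXT NOT NULL          -- e.g. '16S', 'BAC', 'fosmid'
);
CREATE TABLE dna_sequence (
    sequence_id INTEGER PRIMARY KEY,
    clone_id    INTEGER NOT NULL REFERENCES clone(clone_id),
    gene        TEXT NOT NULL,
    residues    TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database filled with made-up placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO sample VALUES (?, ?, ?)",
        [(1, "GCE", "placeholder marsh sample"),
         (2, "KBS", "placeholder soil sample")],
    )
    conn.executemany(
        "INSERT INTO measurement (sample_id, variable, value, units) "
        "VALUES (?, ?, ?, ?)",
        [(1, "pH", 7.8, None), (1, "salinity", 28.0, "PSU"), (2, "pH", 6.1, None)],
    )
    conn.executemany(
        "INSERT INTO clone VALUES (?, ?, ?)",
        [(10, 1, "16S"), (11, 2, "16S")],
    )
    conn.executemany(
        "INSERT INTO dna_sequence VALUES (?, ?, ?, ?)",
        [(100, 10, "16S rRNA", "ACGTACGT"), (101, 11, "16S rRNA", "TTGACCGA")],
    )
    return conn


def sequences_by_variable(conn, variable, low, high):
    """Return (site, sequence_id, value) for sequences whose source sample
    has the named environmental variable within [low, high]."""
    query = """
        SELECT s.lter_site, q.sequence_id, m.value
        FROM dna_sequence AS q
        JOIN clone AS c       ON c.clone_id = q.clone_id
        JOIN sample AS s      ON s.sample_id = c.sample_id
        JOIN measurement AS m ON m.sample_id = s.sample_id
        WHERE m.variable = ? AND m.value BETWEEN ? AND ?
        ORDER BY q.sequence_id
    """
    return conn.execute(query, (variable, low, high)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(sequences_by_variable(db, "pH", 7.0, 9.0))
```

Storing environmental variables as rows in a generic measurement table, rather than as fixed columns, is one possible design choice; it would let sites add new variables without changing the schema, at the cost of requiring agreed variable names and units across sites.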

 

Discussion Points

 

Discussion included topics such as:

·       Where would/should funding and oversight for such a database come from? 

o      The emphasis on cross-site collaboration and data dissemination within the LTER network makes the network an obvious choice.  Additionally, the large amount of environmental data available and the long-term commitment to maintaining the LTER sites make them ideal.

·       What would be the incentive for researchers to submit their data, a time-consuming process?

o      It was suggested that deposit of data might be required in exchange for use of an LTER site for research.

·       How would quality control of data be handled? 

o      Quality control is a major source of effort for large databases such as NCBI’s GenBank, so this is of great concern.  If individual LTER sites were responsible for the data relating to their own site, this effort could be spread among many different members of the LTER scientific community.

o      Managers of these databases at LTER sites could have this responsibility, but standards would have to be agreed upon.

·       Standardization of data content and quality

o      Managers of individual LTER site databases might coordinate standards agreed upon by the scientific community.

o      Standards could adhere to, or be modeled after, the Ecological Metadata Language (EML, http://knb.ecoinformatics.org/software/eml/) adopted by the LTER.

 

 

This workshop was very productive in several regards.  A show of hands indicated that the majority of participants had not been present at the previous discussions held at the last LTER All Scientists Meeting in 2000.  Bringing the past, current, and future issues associated with building such a comprehensive database to researchers from at least nine of the 24 LTER sites (see list of attendees below) will help this initiative progress from the “talking about it” stage to the “doing something about it” stage.  We were able to illustrate how such a database might further the ability to ask ecological questions at local, regional, and global scales.  Furthermore, data that might not otherwise be published on their own could, once disseminated, be combined with similar data collected around the globe, potentially fostering a broader understanding of the environmental and biological forces that act upon our ecosystems.

 

List of Attendees

 

Name                LTER site affiliation  Institution                 Email
Kate Beard          -                      Univ. of Maine              beard@spatial.maine.edu
Jill Mikucki        MCM                    Montana State Univ.         jmikucki@montana.edu
Heather Adams       ARC                    Univ. of Michigan           hea@umich.edu
Stephanie Eichorst  KBS                    Michigan State Univ.        eichors3@msu.edu
Kristin Huizinga    KBS                    Michigan State Univ.        huizin9@msu.edu
Pat Schloss         BNZ                    Univ. of Wisconsin-Madison  pds@plantpath.wisc.edu
Brian Rash          CAP                    Louisiana State Univ.       brash1@lsu.edu
Justin Brant        HJA                    Oregon State Univ.          justin.brant@oregonstate.edu
Erin Biers          GCE                    Univ. of Georgia            ejbiers@uga.edu
Greg Bonito         CWT                    Duke Univ.                  gmb2@duke.edu
Justin Lyons        GCE                    Univ. of Georgia            jlyons@uga.edu
Tim Hollibaugh      GCE                    Univ. of Georgia            aquadoc@uga.edu
Wade Sheldon        GCE                    Univ. of Georgia            sheldon@uga.edu
Jorge Rodrigues     KBS                    Michigan State Univ.        rodrig76@msu.edu
Bradley Stevenson   KBS                    Michigan State Univ.        steven77@msu.edu

 

 

 

References

 

1. Hollibaugh, J. T., and J. S. Priscu. 2000. Presented at the LTER All Scientists Meeting, Snowbird, Utah, USA.

2. Sheldon, W. M., M. A. Moran, and J. T. Hollibaugh. 2002. Presented at the 6th World Multiconference on Systemics, Cybernetics, and Informatics: Information Systems Development II.