Comprehensive Databases to Link Molecular Survey and Environmental Data
Organizer: Bradley Stevenson (KBS)
Key participants: Tim Hollibaugh (GCE), Jorge Rodrigues (KBS), Wade Sheldon (GCE)
Abstract
Molecular surveys of microbial communities are a first step in understanding an ecosystem, and the data they generate are of great value when linked to information about the environment from which they came. Molecular surveys at many LTER sites have produced a large body of sequence data from some of the best-studied biomes on the planet. This workshop will facilitate a discussion of the value of a database linking these sequences to environmental data, its structure, and the types of data that could and should be linked to molecular sequence data. A phylogenetically linked database would also be a useful place to store genomic data from large-insert clone libraries (e.g., BAC and fosmid libraries) derived from microbial communities. A standardized database would provide the structure to make molecular survey data from each LTER site readily available and would also facilitate cross-site investigations of community structure and function.
The goal of this workshop was to provide a forum for discussing interest in, and the feasibility of, assembling an LTER network-wide database linking molecular survey data with other data from the same sampled environment (e.g., environment description, pH, salinity, C:N:P ratio, and geochemical fluxes).
Presentations
Bradley Stevenson provided an introduction to the workshop.
Microbial ecology research at the Kellogg Biological Station LTER in Gull Lake, MI focuses on two major areas of interest: 1) Microbial Diversity and Abundance, and 2) Linking Community Structure with Ecosystem Functions.
Molecular surveys of the KBS ecosystems target small subunit ribosomal RNA genes as well as functional genes, providing information on which microorganisms make up the microbial community, how abundant they are, and what metabolic functions the community carries out. This approach generates vast quantities of sequence data. If these data were also linked to data about the sampled environments, patterns of abundance and diversity could begin to emerge.
Cultivation and characterization of microorganisms allow a better understanding of their metabolic capabilities. Although previously uncultivated microbes are being isolated every day, this approach is limited because the vast majority of microorganisms remain uncultivated. One way around this limitation is to analyze metabolic diversity through cultivation-independent studies of genetic diversity. Sequencing large-insert libraries, such as bacterial artificial chromosome (BAC) and fosmid libraries, allows us to investigate directly the potential metabolic capabilities of a microbial assemblage without cultivation. This approach also generates copious amounts of sequence data that lend themselves to storage in a database.
The LTER network and the Microbial Observatories include some of the best-studied environments in the biosphere. A database containing metadata that link sequence data with other environmental characteristics would allow us to ask questions at local, regional, and global scales. Several hurdles must be overcome for this to happen. To correlate data from diverse environments and studies, the types of data included, the experimental methodology, and the data themselves need to be standardized. In addition, the databases would need to be connected across all participating LTER and Microbial Observatory sites.
Tim Hollibaugh (GCE) gave the group a firsthand perspective on this initiative, drawing on a similar discussion he led with John Priscu (MCM) at the previous LTER All Scientists Meeting (2000, Snowbird, UT) (1) and on his involvement with the Sapelo Island Microbial Observatory (SIMO; http://simo.marsci.uga.edu/). Principal investigators associated with the Microbial Observatories are also interested in addressing the same questions by linking ecosystem metadata with bacterial gene sequences. The first product of this effort is the Sapelo Island Microbial Observatory Database, a relational database management system that links all information about samples, clones, 16S rRNA gene sequences, and other SIMO research products (2).
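To make the idea of a relational database linking samples, clones, and sequences more concrete, the sketch below uses Python's built-in sqlite3 module. It is a minimal illustration under stated assumptions, not the SIMO design: the table names (sample, clone, sequence), their columns, and the function sequences_with_environment are all hypothetical. It shows how a single join can return every sequence of a given gene together with the environmental measurements of the sample it came from.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the actual SIMO database design.
SCHEMA = """
CREATE TABLE sample (
    sample_id   TEXT PRIMARY KEY,
    site        TEXT NOT NULL,      -- site code (assumed field)
    description TEXT,
    ph          REAL,
    salinity    REAL                -- units assumed to be PSU
);
CREATE TABLE clone (
    clone_id    TEXT PRIMARY KEY,
    sample_id   TEXT NOT NULL REFERENCES sample(sample_id),
    library     TEXT                -- e.g. '16S', 'BAC', 'fosmid'
);
CREATE TABLE sequence (
    clone_id    TEXT PRIMARY KEY REFERENCES clone(clone_id),
    gene        TEXT NOT NULL,      -- e.g. '16S rRNA'
    residues    TEXT NOT NULL
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def sequences_with_environment(conn: sqlite3.Connection, gene: str) -> list:
    """Join each sequence of `gene` to the environmental data of its sample."""
    query = """
        SELECT s.site, s.sample_id, s.ph, s.salinity, c.clone_id, q.residues
        FROM sequence AS q
        JOIN clone  AS c ON c.clone_id  = q.clone_id
        JOIN sample AS s ON s.sample_id = c.sample_id
        WHERE q.gene = ?
        ORDER BY c.clone_id
    """
    return conn.execute(query, (gene,)).fetchall()


if __name__ == "__main__":
    conn = build_db()
    # Placeholder records for illustration only, not measured data.
    conn.execute("INSERT INTO sample VALUES ('S1', 'SITE_A', 'example sample', 7.8, 25.0)")
    conn.execute("INSERT INTO clone VALUES ('C1', 'S1', '16S')")
    conn.execute("INSERT INTO sequence VALUES ('C1', '16S rRNA', 'ACGTACGT')")
    for row in sequences_with_environment(conn, "16S rRNA"):
        print(row)
```

Keeping samples, clones, and sequences in separate tables means environmental data are stored once per sample and shared by every clone derived from it; large-insert (BAC or fosmid) clones could be added to the same clone table with a different library value.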
Jorge Rodrigues (KBS) presented his work on cloning large fragments of genomic DNA directly from environmental samples. These libraries provide a wealth of functional genetic data that should also be included in a comprehensive database. If libraries of functional genes from an ecosystem were stored in the same database as its molecular survey and environmental data, links between community structure, environmental characteristics, and ecosystem functions could be explored in ways not previously possible.
Wade Sheldon (GCE) described the structure of the SIMO database and how it could be expanded to include additional types of data. The database is designed primarily to store, query, and distribute data from SIMO research, and no analytical processing capabilities are currently planned. However, information derived from independent analytical programs, such as FASTA and BLAST, will be stored in the database, and sequence formatting options will be provided to facilitate use of SIMO data with other databases and applications (2).
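As an illustration of what a sequence formatting option might involve, the sketch below writes stored records as FASTA text, the plain format read by FASTA, BLAST, and most other sequence tools. The function name to_fasta, the header layout ">identifier description", and the 60-residue line width are assumptions chosen for illustration; they are common conventions, not the SIMO database's actual export format.

```python
from typing import Iterable, Tuple


def to_fasta(records: Iterable[Tuple[str, str, str]], width: int = 60) -> str:
    """Format (identifier, description, sequence) tuples as FASTA text.

    Hypothetical helper: the header layout and line width are assumed
    conventions, not a documented SIMO export format.
    """
    lines = []
    for ident, desc, seq in records:
        # Header line; rstrip() drops the trailing space when desc is empty.
        lines.append(f">{ident} {desc}".rstrip())
        # Remove any embedded whitespace and normalize to upper case.
        seq = "".join(seq.split()).upper()
        # Wrap the sequence into fixed-width lines.
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    # Placeholder record; metadata in the description line is illustrative.
    demo = [("C1", "sample=S1 site=SITE_A gene=16S_rRNA", "acgt" * 20)]
    print(to_fasta(demo), end="")
```

Carrying sample and site identifiers in the description line is one way to keep a sequence traceable to its environmental context after it leaves the database.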
Discussion Points
Discussion included topics such as:
· Where would/should funding and oversight for such a database come from?
o The LTER network's emphasis on cross-site collaboration and data dissemination makes it an obvious choice. Additionally, the large amount of environmental data available and the long-term commitment to maintaining the LTER sites make them ideal hosts for such a database.
· What would be the incentive for researchers to submit their data, given that submission is a time-consuming process?
o It was suggested that depositing data might be required in exchange for using an LTER site for research.
· How would quality control of data be handled?
o Quality control is a major effort for large databases such as NCBI's GenBank, so this is a serious concern. If individual LTER sites were responsible for the data relating to their own site, this effort could be spread among many members of the LTER scientific community.
o Database managers at individual LTER sites could take on this responsibility, but standards would have to be agreed upon first.
· Standardization of data content and quality
o Managers of individual LTER site databases might coordinate standards agreed upon by the scientific community.
o Standards could adhere to, or be modeled on, the Ecological Metadata Language (EML, http://knb.ecoinformatics.org/software/eml/) adopted by the LTER network.
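One way a site data manager might apply an agreed standard during quality control is an automated check on each submitted sample record before it enters the database. The sketch below is hypothetical throughout: the field names, the function validate_sample, and the numeric bounds are placeholders for whatever the community actually agrees on, and they are not taken from EML or any adopted LTER standard. The salinity bound in particular is only an example and would exclude hypersaline environments.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical community-agreed standard: field names and bounds are
# illustrative placeholders, not an adopted LTER or EML specification.
REQUIRED_FIELDS: Dict[str, Optional[Tuple[float, float]]] = {
    "sample_id": None,
    "site": None,
    "collection_date": None,
    "ph": (0.0, 14.0),
    "salinity_psu": (0.0, 50.0),  # placeholder range; a site would set its own
}


def validate_sample(record: Dict[str, object]) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field, bounds in REQUIRED_FIELDS.items():
        value = record.get(field)
        if value is None or value == "":
            problems.append(f"missing {field}")
        elif bounds is not None:
            low, high = bounds
            # Numeric fields must be numbers inside the agreed range.
            if not isinstance(value, (int, float)) or not low <= value <= high:
                problems.append(f"{field}={value!r} outside {low}-{high}")
    return problems


if __name__ == "__main__":
    # Placeholder record missing two fields and with an out-of-range pH.
    print(validate_sample({"sample_id": "S2", "site": "SITE_A", "ph": 15}))
```

Returning a list of problems, rather than stopping at the first one, lets a data manager send a contributor a single complete report of what needs fixing.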
This workshop was productive in several regards. A show of hands revealed that most participants had not attended the earlier discussion at the 2000 LTER All Scientists Meeting. Presenting the past, current, and future issues involved in building such a comprehensive database to researchers from at least nine of the 24 LTER sites (see the list of attendees below) can only help this initiative progress from the "talking about it" stage to the "doing something about it" stage. We were able to illustrate how such a database might further the ability to ask ecological questions at local, regional, and global scales. Furthermore, data that might not be used on their own in a publication could, once disseminated, be combined with similar data collected around the globe, potentially fostering a broader understanding of the environmental and biological forces that act upon our ecosystems.
List of Attendees
| Name | LTER site affiliation | Institution | Email |
|---|---|---|---|
| Kate Beard | - | | |
| Jill Mikucki | MCM | Montana State Univ. | jmikucki@montana.edu |
| Heather Adams | ARC | | hea@umich.edu |
| Stephanie Eichorst | KBS | Michigan State Univ. | |
| Kristin Huizinga | KBS | Michigan State Univ. | |
| Pat Schloss | BNZ | Univ. of Wisconsin-Madison | pds@plantpath.wisc.edu |
| Brian Rash | CAP | Louisiana State Univ. | |
| Justin Brant | HJA | Oregon State Univ. | |
| | GCE | | |
| Greg Bonito | CWT | | gmb2@duke.edu |
| Justin Lyons | GCE | | |
| Tim Hollibaugh | GCE | | |
| Wade Sheldon | GCE | | sheldon@uga.edu |
| Jorge Rodrigues | KBS | Michigan State Univ. | |
| Bradley Stevenson | KBS | Michigan State Univ. | |
References
1. Hollibaugh, J. T., and J. S. Priscu. 2000. Presented at the LTER All Scientists Meeting, Snowbird, Utah, USA.
2. Sheldon, W. M., M. A. Moran, and J. T. Hollibaugh. 2002. Presented at the 6th World Multiconference on Systemics, Cybernetics and Informatics, Information Systems Developments II.