Develop improved open access platforms for computational analysis of large genomic datasets.
A major challenge in this data-rich age of biology is integrating large sets of heterogeneous, distributed, and error-prone primary and derived data into predictive models of biological function ranging from a single gene to entire organisms and their ecologies. Nowhere are the barriers to discovery, characterization, and prediction more formidable than in efforts to understand the complex interplay between biological and abiotic processes that influence soil, water, and climate dynamics and impact the productivity of our biosphere. A bewildering diversity of plants, microbes, animals, and their interactions needs to be discovered and characterized to mechanistically understand ecological function and thereby facilitate interventions to improve outcomes.
To develop models of these biological processes, organisms, and their interactions, scientists need to apply a variety of sophisticated computational tools to their own complex and heterogeneous datasets and then integrate their data and results effectively with the work of others. The DOE Systems Biology Knowledgebase (KBase) is an open-source software platform designed to make it easier for scientists to create, execute, collaborate on, and share sophisticated, reproducible analyses of their own biological data in the context of public reference data and data shared by other users. KBase is open access and free for anyone to use.
KBase supports a growing and extensible set of integrated applications (“apps”) for genome assembly, annotation, metabolic model reconstruction, flux balance analysis, expression analysis, and comparative genomics (see figure at right). In addition to these tools, the KBase platform provides data integration and search, along with easy access to shared user analyses of public plant and microbial reference data from a number of external resources. KBase users have already applied the system to address a range of scientific problems, including comparative genomics of plants, prediction of microbiome interactions, and deep metabolic modeling of environmental and engineered microbes.
Q1: Report on the latest capabilities for annotating and assembling genome-based metabolic models of microbial metabolism.
Q2: Report on the new capabilities to perform reproducible genomics analyses on large datasets and share the results with other researchers.
Q3: Describe the latest developments for working with and analyzing plant genomes.
Q4: Report on new capabilities and approaches for analyzing metagenomics datasets.
End-of-Year Report: Summary of development of improved open access platforms for computational analysis of large genomic datasets.