Generated November 1, 2021

A KBase Case Study on Genome-wide Transcriptomics and Plant Primary Metabolism in response to Drought Stress in Sorghum.

Abstract

A better understanding of the genetic and metabolic mechanisms that confer stress resistance and tolerance in plants is key to engineering new crops through advanced breeding technologies. This requires a systems biology approach that builds on a genome-wide understanding of the regulation of gene expression, plant metabolism, physiology and growth. In this study, we examine the response to drought stress in Sorghum, leveraging the tools for transcriptomics and plant metabolic modeling that we have implemented at the U.S. Department of Energy Systems Biology Knowledgebase (KBase). KBase enables researchers worldwide to collaborate and advance research by allowing them to upload private or public data into the KBase Narrative Interface, analyze these data using a rich, extensible array of computational and data-analytics tools, and securely share scientific workflows and conclusions. We demonstrate how to use the current RNA-seq tools in KBase, applicable to both plants and microbes, to assemble and quantify long transcripts and identify differentially expressed genes effectively. More specifically, we demonstrate the utility of the platform by identifying key genes that are differentially expressed during drought stress in Sorghum bicolor, an important crop plant for sustainable production. We then show how to use KBase tools to predict the membership of genes in metabolic pathways and examine expression data in the context of metabolic subsystems. We demonstrate the power of the platform by making the data analysis and interpretation available to biologists in the reproducible, reusable, point-and-click format of a KBase Narrative, thus promoting the FAIR (Findable, Accessible, Interoperable and Reusable) guiding principles for scientific data management and stewardship.

Case study as an example to run the KBase workflow

This narrative demonstrates the gene expression profiling and its downstream metabolic mapping analysis in KBase [1] using a subset of the published data set [2] as a use case to investigate the genes and pathways in leaves and roots of Sorghum that are differentially impacted by drought stress.

This narrative targets the short-read dataset from leaf and root samples of a published study [2] of the Sorghum RTx430 genotype after eight weeks of growth under water limitation, which simulated drought conditions in the field. For gene expression profiling, the recently released RTx430 genome from JGI Phytozome is used to guide the transcript assembly [Sorghum bicolor RTx430 v2.1, DOE-JGI, http://phytozome.jgi.doe.gov/]. Because this genome does not have any functional annotation, we annotate it with plant enzymes before running the RNA-seq analysis so that the differentially expressed genes can be characterized based on the currently available enzyme information. Next, the annotated genome is used to reconstruct Sorghum primary metabolism. Finally, the expression abundance is integrated with the annotated enzymes in the metabolic reconstruction to identify and visualize primary metabolic pathways of interest.

Data Description

This narrative uses a subset of the Sorghum RTx430 reads dataset published by the EPICON drought study (Epigenetic Control of Drought Response in Sorghum bicolor), funded by the US Department of Energy [2]. Sorghum plants of genotype RTx430 were subjected to drought stress, and samples were collected 56 days after sowing in the field. From this field experiment, the 3rd and 4th leaves and the top 30 cm of the root systems were taken from 10 plants per plot for each sample. The plants were grown in a field (Parlier, CA, USA; 36.6008°N, 119.5109°W) with sandy loam soil containing silky substratum (pH 7.37). Field plots were watered before planting; weekly irrigation started from day 16 [2]. The complete data set is available at NCBI and can be accessed using the Project/GEO/SRA accession numbers [Project: PRJNA527782; GEO: GSE128441; SRA: SRP188707]. The published study [2] used the Sorghum reference genome BTx623 for reference-guided gene profiling of the raw sequencing reads obtained from Sorghum RTx430 samples.

For this case study, only the data set for the week 8 time point (56 days after sowing in the field) from Sorghum genotype RTx430 is selected. It includes sequencing reads obtained from leaf and root tissues under two conditions, well-watered (ww) as control and drought-stressed (dr) as pre-flowering drought treatment, with three replicates per condition. Therefore, a total of 12 samples (2 tissues × 2 conditions × 3 replicates) are used for transcriptome analysis (Table S2).
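The sample count follows from the design of two tissues, two conditions and three replicates. A minimal Python sketch that enumerates the samples, using the object-naming pattern that appears in the import step of this Narrative (the variable names are illustrative):

```python
from itertools import product

tissues = ["leaves", "roots"]
conditions = ["ww", "dr"]  # ww = well-watered control, dr = drought-stressed
replicates = [1, 2, 3]

# One name per tissue x condition x replicate combination.
sample_names = [
    f"RTx430_{tissue}_{cond}_r{rep}"
    for tissue, cond, rep in product(tissues, conditions, replicates)
]

print(len(sample_names))  # 12
for name in sample_names:
    print(name)
```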

Workflow steps

The KBase gene expression profiling and metabolic mapping with primary metabolism workflow is organized in three modules:

  1. Import and preparation of genome and sequencing reads
  2. Alignment, assembly and quantification
  3. Metabolic reconstruction, integration and visualization of abundance data

Figure 1. A schematic overview of KBase gene expression profiling and metabolic mapping with primary metabolism workflow organized in three modules: Module 1 - import and preparation of genome and sequencing reads; Module 2 - Alignment, assembly and quantification; Module 3 - Metabolic reconstruction, integration and visualization of abundance data.

Module 1 starts with the import of the reference genome and its annotation with enzymatic functions of plant primary metabolism using OrthoFinder [3]. The next step is the import of raw sequencing reads, followed by read processing with quality-check tools such as FastQC [4] and read-filtering tools such as Trimmomatic [5] and Cutadapt [6].

Module 2 focuses on reference-genome-guided gene expression profiling based on the Tuxedo suite [7,8]. It allows users to map mRNA short reads to the annotated genome using HISAT2 [9,10], assemble the transcripts using StringTie [8,11], and identify count-based differential gene expression between drought and well-watered control conditions using DESeq2 [12-14]. Note that while gene expression profiling in KBase can be done on any reference genome, here the PlantSEED/OrthoFinder annotation App is run on the genome before the RNA-seq analysis so that the differentially expressed genes can be evaluated in a metabolic context.

Module 3 supports metabolic mapping capabilities and starts with the construction of a primary metabolism model using the annotated genome to identify chemical reactions curated for the metabolic network [15]. The expression abundance generated via Module 2 is readily interoperable with the Module 3 metabolic mapping to characterize the primary metabolic reactions specific to the experimental treatment. Finally, metabolic pathways can be visualized with the Escher Pathway Viewer [16] App, which is adapted in KBase for visualizing metabolic models and related data.

Here are the specific steps to run this use case in this Narrative.

  1. Import of Sorghum RTx430 Genome

    This step imports the S. bicolor RTx430 genome as the reference genome that serves as the basis for RNA-seq alignment (S. bicolor RTx430 genome). The genome was taken from JGI Phytozome V13.

  2. Annotation of Sorghum RTx430 Genome with Metabolic Enzymes

    This step annotates the S. bicolor RTx430 genome with metabolic enzymes based on PlantSEED models using the "Annotate Plant Enzymes with OrthoFinder" App. It produces the annotated genome as the output object "SbiRTx430v2.1_annotated", shown in the Data Pane (Table S3).

  3. Import of 12 SingleEndLibrary Reads from NCBI

    This step imports 12 SRA files as SingleEndLibrary Objects from NCBI in the Data Pane using "Import of SRA reads from Web" App.

  4. Group of 12 SingleEndLibrary objects as SampleSet

    This step groups all 12 imported SingleEndLibrary objects by condition and tissue type and creates the SampleSet "RTx430_sampleset". The SampleSet is used as input for FastQC, HISAT2 alignment and DESeq2 differential expression to simplify the process.

  5. Quality Assessment of SampleSet using FastQC

    This step provides the reads quality assessment for each individual sample of "RTx430_sampleset."

  6. Align Reads of SampleSet using HISAT2

    This step aligns the reads of each sample in the SampleSet "RTx430_sampleset" to the SbiRTx430 v2.1 annotated reference genome using HISAT2 and provides position-level coverage for each sample. It also generates a merged alignment set object, labelled "RTx430_sampleset_alignment_set" in the Narrative, which can be run in batch mode in the subsequent StringTie step.

  7. Assemble Transcripts using StringTie

    This step generates gene expression abundance for each sample and an expression set object “RTx430_sampleset_expression_set” that wraps the results for multiple samples into a single expression set for downstream analysis. It also provides the normalized expression matrix for all samples in both FPKM and TPM units for easy download from the Data Pane, and generates interactive histogram plots of expression abundance in FPKM and TPM. The normalized expression matrix "RTx430_sampleset_TPM_ExpressionMatrix" is also available as a supplementary file for those interested in comparing relative expression across multiple samples (Table S5).

  8. Create Average Expression Matrix

    To obtain the average abundance of each gene in each condition across the biological replicates, the “Create Average Expression Matrix” App is run on the normalized expression matrix “RTx430_sampleset_TPM_ExpressionMatrix”, producing the expression matrix “RTx430_sampleset_TPM_ExpressionMatrix_average” as output. This average expression matrix is used later in the workflow to assign reaction-level expression scores for studying plant primary metabolism.

  9. Differential Expression using DESeq2

    This step calculates gene-annotation expression-level differences by condition using DESeq2, producing the all-by-all differential expression matrix. The “Create Differential Expression Matrix using DESeq2” App is run on the multi-sample expression set “RTx430_sampleset_expression_set”, resulting in the “DifferentialExpressionMatrixSet” object “RTx430_DESeq2_all”. This object groups the six “DifferentialExpressionMatrix” objects that represent all possible pairwise combinations of the four sample groups (two tissues × two conditions). The complete lists of differential expression analysis of leaf and root samples of SbiRTx430 genome under drought stress are provided as Table S6 and Table S7, respectively.

  10. Create up/down Regulated Feature Set and Differential Expression Matrix

    This step runs the “Create up/down regulated FeatureSet and ExpressionMatrix” App to select significantly up- and down-regulated genes, with the normalized expression “RTx430_sampleset_TPM_ExpressionMatrix_average” and the differential expression “RTx430_DESeq2_all” as input. The App selects genes that exhibit differential expression (either upregulated or downregulated) for drought vs well-watered conditions based on the given thresholds for the FDR-adjusted p-value (alpha cutoff) and log2 fold change. It is run separately for leaf and root tissues, with an adjusted p-value below 0.05 and an absolute log2 fold change greater than or equal to 1 for leaves and 2 for roots.

  11. Metabolic Reconstruction

    This step reconstructs a genome-scale metabolic network of plant primary metabolism from the RTx430 genome that was previously annotated with plant enzymes using OrthoFinder. It produces the "SbiRTx430v2.1_reconstructed" object as output in the Data Pane.

  12. Integration of Abundance with Metabolism

    This step integrates the averaged normalized expression matrix derived from the StringTie output, “RTx430_sampleset_TPM_ExpressionMatrix_average”, with the reconstructed FBA model "SbiRTx430v2.1_reconstructed" from the previous step to generate the reaction matrix. The complete reaction matrices generated by this study for each experimental condition (drought and well-watered control) in leaves and roots are given as supplementary Tables S8 and S9, respectively.

  13. Visualization of Metabolic Pathways

    This step uses the “Escher Pathway Viewer” App to display the highly differentially expressed genes identified by DESeq2 on a global metabolic map of plant metabolism, based on the gene-reaction associations in the PlantSEED reconstructed metabolic model.

Import of Sorghum RTx430 Genome

The Sorghum bicolor RTx430 (v2.1) genome was imported directly from the JGI Phytozome site (S. bicolor RTx430 v2.1, DOE-JGI, http://phytozome.jgi.doe.gov/) through Globus into the KBase staging area. It includes the JGI v2.0 assembly of Sorghum bicolor and the JGI v2.1 annotation as the files “SbicolorRTx430_552_v2.fa.gz” and “SbicolorRTx430_552_v2.1_gene.gff3.gz”, respectively.

App - Import GFF3/FASTA file as Genome from Staging Area

Inputs:

  • “SbicolorRTx430_552_v2.fa.gz” - JGI v2.0 assembly of Sorghum bicolor
  • “SbicolorRTx430_552_v2.1_gene.gff3.gz” - JGI v2.1 annotation

Output:

  • The FASTA and GFF annotations are imported into the RTx430 Transcriptomics Analysis Narrative as the genome object “SbicolorRTx430_v2.1”

Analysis:

  • This genome has 34,601 protein-coding genes and 46,881 protein-coding transcripts. The genome object is annotated with functional information in the next step, and the annotated genome is used to align the reads from the HISAT2 alignment onwards.

Note that this genome has no functional annotation after import because the imported GFF file contained only structural annotations. If a GFF file does contain functional annotations, they are added to the genome during import. The next step of the workflow is to functionally annotate this genome in KBase before it is subjected to RNA-seq analysis.

Import a GFF3 and FASTA file from your staging area into your Narrative as a Genome data object
This app completed without errors in 10m 55s.
Objects
  Created Object Name    Type      Description
  SbicolorRTx430_v2.1    Genome    Imported Genome
Links
Output from Import GFF3/FASTA file as Genome from Staging Area
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/101788

Annotation of the Sorghum RTx430 Genome with Metabolic Enzymes

The SbicolorRTx430_v2.1 genome is annotated with metabolic enzymes based on PlantSEED models using the "Annotate Plant Enzymes with OrthoFinder" App.

Input:

  • SbicolorRTx430_v2.1 genome object

Output:

  • SbiRTx430v2.1_annotated

Analysis

  • This App produces a report containing a figure and a table. The table lists all predicted enzymes in the KBase Genome alongside the curated Arabidopsis enzymes in the PlantSEED database (Table S3). It can be downloaded in CSV format and allows the user to find which genes were annotated with which enzymatic function.
  • From this table, for the RTx430v2.1 genome, the Plant OrthoFinder app clustered 1,766 protein sequences with 1,419 PlantSEED-curated genes.
  • The App also provides the annotated genome “SbiRTx430v2.1_annotated” as an output in the Data Pane.

Annotates transcripts in a Genome object with metabolic functions using OrthoFinder.
This app completed without errors in 5h 44m 10s.
Objects
  Created Object Name        Type      Description
  SbiRTx430v2.1_annotated    Genome    Plant genome SbicolorRTx430_v2.1 annotated with metabolic functions
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/101788
  • OrthoFinder_Output.txt - Output text generated by OrthoFinder

Import of 12 SingleEndLibrary Reads from NCBI

Input:

  • The SRA read files of leaves and roots of the Sorghum RTx430 genotype at the 56-day time point under control and pre-flowering drought conditions, with three replicates per condition, are imported from the NCBI SRA site via the “Direct Download Link”, using the SRA accession numbers given in Table S2. The SRA link for each read file and the object name are given as input to the “Import SRA Reads from Web” App.

Table S2. SRA read files of leaves and roots of RTx430 at 56 days timepoint under control and preflowering drought conditions

Output:

  • Each SRA file is imported to KBase narrative as a SingleEndLibrary object with the reads statistics as an output.
  • The output of this App for each sample provides an overview of the reads metadata and its statistics summary such as number of reads, mean read length, number of duplicate reads, mean quality score (Phred scale), and total number of bases.
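The mean quality score in these statistics is on the Phred scale, where a score Q corresponds to a base-call error probability of 10^(-Q/10). A small sketch of that conversion follows; the function name is illustrative and is not part of any KBase App.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to the probability that a base call is wrong."""
    return 10 ** (-q / 10)


# Higher scores mean exponentially fewer expected errors.
for q in (10, 20, 30, 40):
    print(f"Q{q}: error probability {phred_to_error_probability(q):.4g}")
```

For example, Q20 corresponds to roughly one expected error per 100 bases and Q30 to roughly one per 1,000.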

Analysis

  • For this case study, we use the data sets for the week 8 time point (56 days from sowing) from Sorghum genotype RTx430. They comprise 12 single-end read files obtained from two conditions, well-watered (control) and drought-stressed (dr treatment), with three replicates per condition in leaf and root tissue. Therefore, a total of 12 samples are used for transcriptome analysis.

Import an SRA file from a web URL into your Narrative as a Reads data object.
This app completed without errors in 1h 13m 5s.
Objects
  Created Object Name    Type                Description
  RTx430_leaves_ww_r1    SingleEndLibrary    Imported Reads
  RTx430_leaves_ww_r2    SingleEndLibrary    Imported Reads
  RTx430_leaves_ww_r3    SingleEndLibrary    Imported Reads
  RTx430_leaves_dr_r1    SingleEndLibrary    Imported Reads
  RTx430_leaves_dr_r2    SingleEndLibrary    Imported Reads
  RTx430_leaves_dr_r3    SingleEndLibrary    Imported Reads
Import an SRA file from a web URL into your Narrative as a Reads data object.
This app completed without errors in 2h 25m 33s.
Objects
  Created Object Name    Type                Description
  RTx430_roots_ww_r1     SingleEndLibrary    Imported Reads
  RTx430_roots_ww_r2     SingleEndLibrary    Imported Reads
  RTx430_roots_ww_r3     SingleEndLibrary    Imported Reads
  RTx430_roots_dr_r1     SingleEndLibrary    Imported Reads
  RTx430_roots_dr_r2     SingleEndLibrary    Imported Reads
  RTx430_roots_dr_r3     SingleEndLibrary    Imported Reads

Group of 12 SingleEndLibrary Objects as SampleSet

This step groups the reads (SingleEndLibrary objects) into an RNASeqSampleSet.

Input:

  • The three replicates for each condition, well-watered control (ww) and drought treatment (dr), in each tissue (leaves and roots) are first grouped under their respective sample labels (RTx430_leaves_ww, RTx430_leaves_dr, RTx430_roots_ww, and RTx430_roots_dr) as input parameters.
  • These four groups, along with experimental metadata such as SampleSet Description, Platform, Library Type, Domain, Source and Publication Details constitute the SampleSet object.

Output:

  • RTx430_sampleset

Analysis

  • Merging all samples into the set "RTx430_sampleset" makes it possible to run downstream operations (such as FastQC, HISAT2, and StringTie) on all samples in batch mode.
  • The App also associates experimental metadata with the samples, which makes it easier to run differential expression based on the experimental conditions downstream.

Allows users to provide RNA-seq reads and the corresponding metadata to create an RNASeqSampleSet data object.
This app completed without errors in 4s.
No output found.
Output from Create RNA-seq SampleSet
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/101788
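The grouping of replicates under condition labels can be pictured as a simple mapping. The sketch below derives the four labels from the read object names, assuming the replicate is encoded as a trailing "_r<N>" suffix as in the imported objects; in KBase the labels are supplied as App parameters rather than parsed from names, so this is only an illustration.

```python
import re
from collections import defaultdict

# The 12 read object names created by the import step.
read_objects = [
    f"RTx430_{tissue}_{cond}_r{rep}"
    for tissue in ("leaves", "roots")
    for cond in ("ww", "dr")
    for rep in (1, 2, 3)
]


def condition_label(name: str) -> str:
    """Strip the trailing replicate suffix, e.g. '_r2', to get the condition label."""
    return re.sub(r"_r\d+$", "", name)


# Map each condition label to its replicate read objects.
sample_set = defaultdict(list)
for name in read_objects:
    sample_set[condition_label(name)].append(name)

for label, members in sample_set.items():
    print(label, members)
```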

Quality assessment of SampleSet using FastQC

This step assesses the quality of the reads in the "RTx430_sampleset" object using FastQC and reports the results for each sample.

Input:

  • RTx430_sampleset

Output:

  • FastQC generates a comprehensive multi-page report in HTML format on the composition and quality of the reads, with one page for each read set (e.g. Single End, Paired End: forward, Paired End: reverse). The report can be viewed inside the Narrative or opened as a new web page, and it can also be downloaded.

  • The HTML report includes results from the multiple modules run by FastQC and provides a quick quality assessment, with each result labeled as normal (green checkmark), slightly abnormal (orange triangle), or very unusual (red cross).

Analysis

  • Based on the summary of the FastQC report, the reads were found to be of good quality, so there was no need to trim them.

A quality control application for high throughput sequence data.
This app completed without errors in 54m 30s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/101788
  • RTx430_roots_dr_r1_101788_15_1.single_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • RTx430_roots_ww_r2_101788_9_1.single_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • RTx430_roots_ww_r1_101788_7_1.single_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • RTx430_leaves_ww_r2_101788_6_1.single_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • RTx430_leaves_dr_r2_101788_11_1.single_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • RTx430_leaves_dr_r1_101788_10_1.single_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • RTx430_leaves_ww_r3_101788_8_1.single_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • RTx430_roots_dr_r2_101788_16_1.single_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • RTx430_roots_ww_r3_101788_12_1.single_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • RTx430_leaves_ww_r1_101788_5_1.single_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • RTx430_roots_dr_r3_101788_17_1.single_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • RTx430_leaves_dr_r3_101788_13_1.single_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report

Align Reads of SampleSet using HISAT2

The reads in the RTx430_sampleset object are aligned to the SbiRTx430 v2.1 annotated reference genome using HISAT2 with default parameters.

Input:

  • RTx430_sampleset object
  • SbiRTx430 v2.1 annotated reference genome

Output:

  • 12 BAM alignment objects, one per sample, and the merged alignment set "RTx430_sampleset_alignment_set"
  • QualiMap report

Analysis

  • Clicking on "RTx430_sampleset_alignment_set" under the "Result" tab shows a table of alignment statistics, such as total reads, unmapped and mapped reads, multiple alignments, and singletons, for the alignment object of each sample.
  • The alignment results show a very high percentage of mapped reads (between 91.33% and 99.1%).

  • Only a very low percentage of reads (between 2.40% and 3.92%) mapped to multiple locations in the RTx430 genome.

  • QualiMap generates a comprehensive HTML report as well on the quality of the BAM alignments. The mean mapping quality of all the samples is 58.33.

  • PCA plot shows that samples can be grouped based on tissue.

  • The BAM alignment objects can be downloaded to visualize the aligned reads outside KBase using JBrowse and Integrative Genomics Viewer (IGV).
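The mapping percentages quoted above are the ratio of mapped reads to total reads for each sample. A minimal sketch of that arithmetic follows; the function name and the read counts are hypothetical and are not values from this study.

```python
def mapping_rate(total_reads: int, mapped_reads: int) -> float:
    """Return the percentage of reads that aligned to the reference."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


# Hypothetical counts, for illustration only.
total, mapped = 20_000_000, 19_000_000
print(f"{mapping_rate(total, mapped):.2f}% of reads mapped")
```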

Align sequencing reads to long reference sequences using HISAT2.
This app completed without errors in 3h 37m 49s.
Objects
  Created Object Name               Type                 Description
  RTx430_leaves_dr_r2_alignment     RNASeqAlignment      Reads 101788/22/1;101788/11/1 aligned to Genome 101788/19/2
  RTx430_leaves_ww_r1_alignment     RNASeqAlignment      Reads 101788/22/1;101788/5/1 aligned to Genome 101788/19/2
  RTx430_leaves_dr_r1_alignment     RNASeqAlignment      Reads 101788/22/1;101788/10/1 aligned to Genome 101788/19/2
  RTx430_roots_dr_r1_alignment      RNASeqAlignment      Reads 101788/22/1;101788/15/1 aligned to Genome 101788/19/2
  RTx430_leaves_ww_r3_alignment     RNASeqAlignment      Reads 101788/22/1;101788/8/1 aligned to Genome 101788/19/2
  RTx430_leaves_dr_r3_alignment     RNASeqAlignment      Reads 101788/22/1;101788/13/1 aligned to Genome 101788/19/2
  RTx430_leaves_ww_r2_alignment     RNASeqAlignment      Reads 101788/22/1;101788/6/1 aligned to Genome 101788/19/2
  RTx430_roots_dr_r3_alignment      RNASeqAlignment      Reads 101788/22/1;101788/17/1 aligned to Genome 101788/19/2
  RTx430_roots_ww_r2_alignment      RNASeqAlignment      Reads 101788/22/1;101788/9/1 aligned to Genome 101788/19/2
  RTx430_roots_ww_r1_alignment      RNASeqAlignment      Reads 101788/22/1;101788/7/1 aligned to Genome 101788/19/2
  RTx430_roots_ww_r3_alignment      RNASeqAlignment      Reads 101788/22/1;101788/12/1 aligned to Genome 101788/19/2
  RTx430_roots_dr_r2_alignment      RNASeqAlignment      Reads 101788/22/1;101788/16/1 aligned to Genome 101788/19/2
  RTx430_sampleset_alignment_set    ReadsAlignmentSet    Set of all new alignments
Summary
Created 12 alignments from the given alignment set.

Assemble Transcripts using StringTie

RNA-seq alignments are assembled into transcripts with the “Assemble Transcripts using StringTie” App, using the "RTx430_sampleset_alignment_set" object as input and default values for all other parameters. In KBase, the StringTie App is configured to use known transcript models (following the reference annotation guided assembly process) for the expression analysis.

Input:

  • RTx430_sampleset_alignment_set

Output:

  • 12 Expression objects, one per sample, and a merged expression set, RTx430_sampleset_expression_set
  • Normalized expression matrix in FPKM units as RTx430_sampleset_FPKM_ExpressionMatrix
  • Normalized expression matrix in TPM units as RTx430_sampleset_TPM_ExpressionMatrix

Analysis

  • The StringTie App generates gene expression abundance for each sample and an expression set object “RTx430_sampleset_expression_set” that wraps the results for multiple samples into a single expression set for downstream analysis.
  • Clicking on "RTx430_sampleset_expression_set" under the Results tab displays interactive histogram plots of expression abundance in FPKM and TPM.
  • The App also provides the normalized expression matrix for all samples in both FPKM and TPM units for easy download. We have provided the normalized expression matrix in TPM as a supplementary file for those interested in comparing relative expression across multiple samples (Table S5).

Assemble the transcripts from RNA-seq read alignments using StringTie.
This app completed without errors in 52m 41s.
Objects
  Created Object Name                       Type                Description
  RTx430_sampleset_expression_set           ExpressionSet       ExpressionSet generated by StringTie
  RTx430_leaves_ww_r1_expression            RNASeqExpression    Expression generated by StringTie
  RTx430_leaves_ww_r2_expression            RNASeqExpression    Expression generated by StringTie
  RTx430_leaves_ww_r3_expression            RNASeqExpression    Expression generated by StringTie
  RTx430_leaves_dr_r1_expression            RNASeqExpression    Expression generated by StringTie
  RTx430_leaves_dr_r2_expression            RNASeqExpression    Expression generated by StringTie
  RTx430_leaves_dr_r3_expression            RNASeqExpression    Expression generated by StringTie
  RTx430_roots_ww_r1_expression             RNASeqExpression    Expression generated by StringTie
  RTx430_roots_ww_r2_expression             RNASeqExpression    Expression generated by StringTie
  RTx430_roots_ww_r3_expression             RNASeqExpression    Expression generated by StringTie
  RTx430_roots_dr_r1_expression             RNASeqExpression    Expression generated by StringTie
  RTx430_roots_dr_r2_expression             RNASeqExpression    Expression generated by StringTie
  RTx430_roots_dr_r3_expression             RNASeqExpression    Expression generated by StringTie
  RTx430_sampleset_FPKM_ExpressionMatrix    ExpressionMatrix    FPKM ExpressionMatrix generated by StringTie
  RTx430_sampleset_TPM_ExpressionMatrix     ExpressionMatrix    TPM ExpressionMatrix generated by StringTie
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/101788
  • stringtie_result.zip - File(s) generated by StringTie App
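FPKM and TPM are two normalizations of the same abundance estimates: within one sample, a gene's TPM is its FPKM divided by the sum of all FPKM values, multiplied by one million. StringTie reports both directly, so the sketch below only illustrates the relationship; the gene IDs and values are hypothetical.

```python
def fpkm_to_tpm(fpkm):
    """Rescale per-gene FPKM values for one sample so that they sum to one million."""
    total = sum(fpkm.values())
    if total == 0:
        return {gene: 0.0 for gene in fpkm}
    return {gene: value / total * 1_000_000 for gene, value in fpkm.items()}


# Hypothetical gene IDs and FPKM values for a single sample.
sample_fpkm = {"geneA": 10.0, "geneB": 30.0, "geneC": 60.0}
sample_tpm = fpkm_to_tpm(sample_fpkm)
for gene, value in sample_tpm.items():
    print(gene, round(value, 1))
```

Because TPM values sum to the same total in every sample, they are the more convenient unit for comparing relative expression across samples.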

Create Average Expression Matrix

To find the average abundances for each gene in each condition, the normalized expression matrix is averaged across the biological replicates for each condition.

Input:

  • RTx430_sampleset_TPM_ExpressionMatrix

Output:

  • RTx430_sampleset_TPM_ExpressionMatrix_average

Analysis

  • This average expression matrix “RTx430_sampleset_TPM_ExpressionMatrix_average” is used in a later step to assign reaction-level expression scores to study plant primary metabolism.

Create an average ExpressionMatrix data object with one column per condition.
This app completed without errors in 37s.
Objects
  Created Object Name                              Type                Description
  RTx430_sampleset_TPM_ExpressionMatrix_average    ExpressionMatrix    Average ExpressionMatrix
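Averaging across biological replicates reduces the per-sample matrix to one column per condition. A minimal sketch of that operation in plain Python follows, assuming the condition label is the sample name without its "_r<N>" suffix; the gene ID and TPM values are hypothetical, and this is not the KBase App's own code.

```python
from collections import defaultdict
from statistics import mean


def average_by_condition(matrix):
    """Average replicate columns: {gene: {sample: value}} -> {gene: {condition: mean}}."""
    averaged = {}
    for gene, row in matrix.items():
        by_condition = defaultdict(list)
        for sample, value in row.items():
            # Assumption: the condition is the sample name without its "_r<N>" suffix.
            condition = sample.rsplit("_r", 1)[0]
            by_condition[condition].append(value)
        averaged[gene] = {cond: mean(vals) for cond, vals in by_condition.items()}
    return averaged


# Hypothetical gene ID and TPM values, for illustration only.
tpm_matrix = {
    "geneA": {
        "RTx430_leaves_ww_r1": 2.0,
        "RTx430_leaves_ww_r2": 4.0,
        "RTx430_leaves_ww_r3": 6.0,
        "RTx430_leaves_dr_r1": 8.0,
        "RTx430_leaves_dr_r2": 10.0,
        "RTx430_leaves_dr_r3": 12.0,
    }
}
average_matrix = average_by_condition(tpm_matrix)
print(average_matrix["geneA"])
```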

Differential expression using DESeq2

Perform differential expression calculations using DESeq2. This step calculates gene-annotation expression-level differences by condition.

Input:

  • RTx430_sampleset_expression_set

Output:

  • RTx430_DESeq2_all

Analysis

  • This step generates the six “DifferentialExpressionMatrix” objects, which represent all possible pairwise combinations of the four sample groups (two tissues × two conditions), and groups these six objects in the set object "RTx430_DESeq2_all".

  • Clicking on the DifferentialExpressionMatrixSet object "RTx430_DESeq2_all" displays the set viewer and provides an interactive volcano plot for each pairwise comparison. The volcano plot allows visual identification of significant genes based on the chosen thresholds, with significance (-log10 P value) along the y-axis and log2 fold change along the x-axis. Among the genes that pass the P value and fold change thresholds, the most upregulated genes appear towards the right, the most downregulated genes towards the left, and the most statistically significant genes towards the top. For the chosen thresholds, the Gene Table tab also lists the genes with their p-value, q-value, significance (-log10) and fold change (log2), and this list can be exported.

  • In addition, DESeq2 generates the Dispersion plot and PCA plot to help understand the inter-sample expression variability.

  • The dispersion plot can be seen as a scatter plot with log2 fold change along the y-axis and normalized mean expression along the x-axis. It is essentially a measure of expression variance for a given mean expression for all genes. As expected, the variability in fold changes is relatively higher for lowly expressed genes.

  • Based on the PCA plot, the first principal component (PC1) accounts for 91% of the variation among the samples and separates them by tissue (leaf vs root). The second principal component (PC2) accounts for only 6% of the variation and separates the samples by treatment (drought vs control).

  • The complete lists of differential expression analysis of leaf and root samples of SbiRTx430 genome under drought stress are provided as Table S6 and Table S7 respectively.
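The count of six matrices follows from choosing two groups out of four. The sketch below enumerates the pairs with the group labels used in this Narrative; the order in which each pair is printed is arbitrary and may differ from the "-VS-" orientation in the KBase object names.

```python
from itertools import combinations

groups = [
    "RTx430_leaves_ww",
    "RTx430_leaves_dr",
    "RTx430_roots_ww",
    "RTx430_roots_dr",
]

# Every unordered pair of distinct groups: C(4, 2) = 6 comparisons.
pairs = list(combinations(groups, 2))
for first, second in pairs:
    print(f"{first} vs {second}")
print(len(pairs))  # 6
```

Only two of these six comparisons, drought versus well-watered within leaves and within roots, address the drought response directly; the others mix tissue and treatment effects.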

Create differential expression matrix based on a given threshold cutoff
This app completed without errors in 31m 20s.
Objects
  Created Object Name                                       Type                               Description
  RTx430_DESeq2_all                                         DifferentialExpressionMatrixSet    DifferentialExpressionMatrixSet generated by DESeq2
  RTx430_DESeq2_all-RTx430_roots_dr-VS-RTx430_leaves_ww     DifferentialExpressionMatrix       DifferentialExpressionMatrix generated by DESeq2
  RTx430_DESeq2_all-RTx430_roots_dr-VS-RTx430_leaves_dr     DifferentialExpressionMatrix       DifferentialExpressionMatrix generated by DESeq2
  RTx430_DESeq2_all-RTx430_roots_ww-VS-RTx430_leaves_dr     DifferentialExpressionMatrix       DifferentialExpressionMatrix generated by DESeq2
  RTx430_DESeq2_all-RTx430_roots_ww-VS-RTx430_leaves_ww     DifferentialExpressionMatrix       DifferentialExpressionMatrix generated by DESeq2
  RTx430_DESeq2_all-RTx430_leaves_dr-VS-RTx430_leaves_ww    DifferentialExpressionMatrix       DifferentialExpressionMatrix generated by DESeq2
  RTx430_DESeq2_all-RTx430_roots_dr-VS-RTx430_roots_ww      DifferentialExpressionMatrix       DifferentialExpressionMatrix generated by DESeq2
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/101788
  • DESeq2_result.zip - File(s) generated by DESeq2 App
  • DESeq2_plot.zip - Visualization plots by DESeq2 App

Create up/down Regulated Feature Set and Differential Expression Matrix

The “Create up/down regulated FeatureSet and ExpressionMatrix” App extracts the significantly up- and downregulated genes, together with the corresponding differential expression matrix, at different fold change levels.

We ran this App twice on the “selected pairwise conditions” of leaves drought versus well-watered control and roots drought versus well-watered control: once with a log2 fold change cutoff of 1 and once with a cutoff of 2, in both cases with an adjusted P-value below 0.05.

Input:

  • RTx430_DESeq2_all
  • RTx430_sampleset_TPM_ExpressionMatrix_average
  • Alpha cut off (0.05)
  • Log2 fold change cut off (1 and 2)

  • Specific Pairwise Conditions

    • Leaves drought vs leaves control
    • Roots drought vs roots control

Output:

  • Upregulated genes - two Upper FeatureSets
  • Downregulated genes - two Lower FeatureSets
  • Two filtered ExpressionMatrix objects

Analysis

  • This App selects genes that exhibit differential expression (either upregulated or downregulated) for drought vs well-watered conditions based on the given statistical thresholds of FDR-adjusted p-value (alpha cut off) and log2 fold change. The App was run twice, each run covering both the leaf and the root comparison, with an adjusted p-value below 0.05 and an absolute log2 fold change greater than or equal to 1 in the first run and 2 in the second.
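The selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the App's implementation: the function name, the input layout (gene mapped to a pair of log2 fold change and adjusted p-value) and the gene identifiers are hypothetical.

```python
def split_up_down(results, alpha=0.05, log2fc_cutoff=1.0):
    """Partition genes into upregulated and downregulated sets.

    results maps gene id -> (log2 fold change, adjusted p-value).
    A gene is kept when its adjusted p-value is below alpha and its
    absolute log2 fold change is at least log2fc_cutoff.
    """
    up, down = set(), set()
    for gene, (log2fc, q_value) in results.items():
        # Genes without an adjusted p-value are skipped.
        if q_value is None or q_value >= alpha:
            continue
        if log2fc >= log2fc_cutoff:
            up.add(gene)
        elif log2fc <= -log2fc_cutoff:
            down.add(gene)
    return up, down

# Hypothetical values for illustration only.
results = {
    "g1": (2.4, 0.001),
    "g2": (-1.3, 0.01),
    "g3": (1.1, 0.2),
    "g4": (-3.0, 0.0004),
    "g5": (0.4, 0.001),
}
up_1fc, down_1fc = split_up_down(results, alpha=0.05, log2fc_cutoff=1.0)
up_2fc, down_2fc = split_up_down(results, alpha=0.05, log2fc_cutoff=2.0)
```

Raising the cutoff from 1 to 2 can only shrink the sets, which is why the second run reports fewer genes than the first.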
Create up/down regulated FeatureSet and ExpressionMatrix from differential expression data based on given cutoffs.
This app completed without errors in 2m 23s.
Objects
Created Object Name Type Description
RTx430_DESeq2_all_RTx430_leaves_dr-RTx430_leaves_ww_up_0.05q_1fc_fs FeatureSet Upper FeatureSet Object
RTx430_DESeq2_all_RTx430_roots_dr-RTx430_roots_ww_up_0.05q_1fc_fs FeatureSet Upper FeatureSet Object
RTx430_DESeq2_all_RTx430_leaves_dr-RTx430_leaves_ww_down_0.05q_1fc_fs FeatureSet Lower FeatureSet Object
RTx430_DESeq2_all_RTx430_roots_dr-RTx430_roots_ww_down_0.05q_1fc_fs FeatureSet Lower FeatureSet Object
RTx430_leaves_dr-RTx430_leaves_ww_0.05q_1fc_exp ExpressionMatrix Filtered ExpressionMatrix Object
RTx430_roots_dr-RTx430_roots_ww_0.05q_1fc_exp ExpressionMatrix Filtered ExpressionMatrix Object
Create up/down regulated FeatureSet and ExpressionMatrix from differential expression data based on given cutoffs.
This app completed without errors in 2m 17s.
Objects
Created Object Name Type Description
RTx430_DESeq2_all_RTx430_leaves_dr-RTx430_leaves_ww_up_0.05q_2fc_fs FeatureSet Upper FeatureSet Object
RTx430_DESeq2_all_RTx430_roots_dr-RTx430_roots_ww_up_0.05q_2fc_fs FeatureSet Upper FeatureSet Object
RTx430_DESeq2_all_RTx430_leaves_dr-RTx430_leaves_ww_down_0.05q_2fc_fs FeatureSet Lower FeatureSet Object
RTx430_DESeq2_all_RTx430_roots_dr-RTx430_roots_ww_down_0.05q_2fc_fs FeatureSet Lower FeatureSet Object
RTx430_leaves_dr-RTx430_leaves_ww_0.05q_2fc_exp ExpressionMatrix Filtered ExpressionMatrix Object
RTx430_roots_dr-RTx430_roots_ww_0.05q_2fc_exp ExpressionMatrix Filtered ExpressionMatrix Object

Differential Expression Results and Conclusion

  • The differential expression analysis reported that 86.4% of the total genes are differentially expressed, indicating a massive response to drought. Of these, a total of 12,375 genes (41.39% of the total differentially expressed) show differential expression at a log2 fold change greater than 1 and an FDR-adjusted q-value of less than 0.05.

  • With a more stringent cut off (a log2 fold change greater than 2) and the same FDR-adjusted q-value of less than 0.05, a total of 5,159 genes (17.25% of the total differentially expressed) are significantly affected by the drought stress.

  • Though both tissues (leaf and root samples) exhibit a widespread response to the drought, we observed that the root samples show a larger number of differentially expressed genes than the leaf samples.

  • We also observed that roots undergo more down-regulation than up-regulation of genes in response to drought, presumably to triage and distribute resources efficiently above ground.

  • We observed that only 177 of the 5,159 genes that are significantly differentially expressed in leaves and roots (log2 fold change of 2 or above) are annotated with enzymes that have homology to known biological functions. A large number of the differentially expressed genes therefore remain completely uncharacterized.

  • Based on the functionally annotated genes, we found 24 genes are differentially expressed in both leaves and roots. There are 23 genes that are differentially expressed only in leaves and 105 genes differentially expressed only in roots.
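The split into shared and tissue-specific genes can be reproduced with plain set operations once the leaf and root gene lists are exported. The sketch below is illustrative only; the function name and the gene identifiers are hypothetical and do not come from this analysis.

```python
def tissue_overlap(leaf_genes, root_genes):
    """Group differentially expressed genes by the tissue(s) they occur in."""
    leaf, root = set(leaf_genes), set(root_genes)
    return {
        "both": leaf & root,         # differentially expressed in both tissues
        "leaves_only": leaf - root,  # differentially expressed only in leaves
        "roots_only": root - leaf,   # differentially expressed only in roots
    }

# Hypothetical gene lists for illustration.
overlap = tissue_overlap(["g1", "g2", "g3"], ["g2", "g3", "g4", "g5"])
counts = {name: len(genes) for name, genes in overlap.items()}
```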

Metabolic roles of the expressed genes

Mapping the gene expression data to individual reactions in a model, as described in the next section, identifies the implicated pathways and their associated metabolic reactions and, more specifically, the metabolic roles of the differentially expressed genes.

For example, the differentially expressed genes SBIRTX430.09G168000 and SBIRTX430.03G383900, upregulated in roots and leaves and annotated as Gamma-glutamyl phosphate reductase (EC 1.2.1.41) / Glutamate 5-kinase (EC 2.7.2.11) in the cytosol and plastid, are involved in proline metabolism. Another example is SBIRTX430.07G059900, downregulated in both leaves and roots and annotated as Chalcone synthase (EC 2.3.1.74) in the nucleus and endoplasm, which is involved in lignin biosynthesis.

Metabolic Reconstruction

This step reconstructs a genome-scale metabolic network of plant primary metabolism from the SbiRTx430v2.1_annotated genome, which was annotated earlier with plant enzymes using OrthoFinder.

Input:

  • SbiRTx430v2.1_annotated

Output:

  • SbiRTx430v2.1_reconstructed

Analysis

  • It generates the table of compartmentalized reactions in the metabolic reconstruction that can be downloaded.
  • Metabolic reconstructions provide information that is not available at the locus level, linking multiple isoforms and enzyme subunits to the reactions that they perform.
  • By integrating the RNA-seq data with a metabolic reconstruction, researchers can quickly find targets for engineering plant metabolism to improve growth during drought.
Reconstruct the metabolic network of a plant based on an annotated genome.
This app completed without errors in 5m 55s.
Objects
Created Object Name Type Description
SbiRTx430v2.1_reconstructed FBAModel FBAModel: SbiRTx430v2.1_reconstructed
Links
Output from Reconstruct Plant Metabolism
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/101788

Integration of Abundance with Metabolism

This step integrates the normalized expression matrix generated by StringTie with the reconstructed FBA model from the previous step to generate the reaction matrix.

Input:

  • RTx430_sampleset_TPM_ExpressionMatrix_average
  • SbiRTx430v2.1_reconstructed

  • Expression Conditions

    • Leaves drought vs leaves control
    • Roots drought vs roots control

Output:

  • Reaction Matrix, containing a set of computed reaction expression scores, for each of the same experimental conditions.

Analysis

  • The app generally applies the "best" gene abundance to a reaction, where there are multiple genes, and where there are genes encoding protein complexes. The underlying concept is that a reaction is only as "active" in its catalysis as allowed by the formation of the enzymes that catalyze the reaction. In turn, the results here can be used as a proxy for the activity of different primary metabolic pathways, and how they may respond to different experimental conditions.

  • The resulting reaction matrices for leaves and roots are provided as Supplementary Tables S7 and S8 respectively.
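One way to picture how gene abundances become a reaction score is sketched below. It assumes a common convention that is not confirmed here to be the App's exact rule: a protein complex is limited by its least abundant subunit (minimum), and a reaction with several alternative enzymes takes the best alternative (maximum). The function name, input layout and gene identifiers are hypothetical.

```python
def reaction_score(alternatives, abundance):
    """Score one reaction from gene abundances.

    alternatives is a list of enzyme alternatives; each alternative is a
    list of subunit gene ids. Assumed rule: min over subunits of a
    complex, then max over the alternatives.
    Returns (score, genes of the alternative that sets the score).
    """
    best_score, best_genes = 0.0, None
    for subunits in alternatives:
        if not subunits:
            continue
        # A complex is only as abundant as its scarcest subunit.
        score = min(abundance.get(gene, 0.0) for gene in subunits)
        if best_genes is None or score > best_score:
            best_score, best_genes = score, tuple(subunits)
    return best_score, best_genes

# Hypothetical reaction: a single-gene enzyme or a two-subunit complex.
rule = [["geneA"], ["geneB", "geneC"]]
leaf = {"geneA": 40.0, "geneB": 5.0, "geneC": 9.0}
root = {"geneA": 3.0, "geneB": 25.0, "geneC": 18.0}
leaf_result = reaction_score(rule, leaf)
root_result = reaction_score(rule, root)
```

Returning the genes alongside the score makes it easy to see when the same reaction is driven by different genes in different tissues.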

Integrate gene abundances with a plant primary metabolic network
This app completed without errors in 2m 26s.
Objects
Created Object Name Type Description
SbiRTx430v2.1_leaves_reactionmatrix ReactionMatrix Reaction matrix: SbiRTx430v2.1_leaves_reactionmatrix
Integrate gene abundances with a plant primary metabolic network
This app completed without errors in 1m 7s.
Objects
Created Object Name Type Description
SbiRTx430v2.1_roots_reactionmatrix ReactionMatrix Reaction matrix: SbiRTx430v2.1_roots_reactionmatrix

Visualization of Metabolic Pathway

This step displays the highly differentially expressed genes identified by DESeq2 on a global metabolic map of plant metabolism, based on the gene-reaction associations in the PlantSEED reconstructed metabolic model.

Input:

  • RTx430_leaves_dr-RTx430_leaves_ww_0.05q_1fc_exp
  • RTx430_roots_dr-RTx430_roots_ww_0.05q_1fc_exp
  • SbiRTx430v2.1_reconstructed
  • Expression Conditions
    • Leaves drought vs leaves control
    • Roots drought vs roots control
  • FullPlantMap

Output:

  • Visualization of highly differentially expressed reactions (shown in green) on a global plant metabolism map.

Analysis

  • This app shows where the differentially expressed reactions fall among the reactions included in the global plant map, which does not presently include the lignin pathway. The map does not yet directly support visualization of differential gene expression generated by DESeq2, but the differential expression data may be downloaded as TSV, converted to CSV in Excel, and then uploaded into the map viewer for visualization (this is how Figure 5 in the text was created).
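The TSV-to-CSV conversion step can also be scripted instead of done by hand in a spreadsheet. The sketch below uses only the Python standard library; the function name and the example column names are hypothetical, and the column layout that the map viewer expects is not described here, so the columns are passed through unchanged.

```python
import csv
import io

def tsv_to_csv(tsv_text):
    """Convert tab-separated text to comma-separated text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    # The csv writer quotes any field that itself contains a comma.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()

# Hypothetical two-column example.
example_tsv = "gene\tlog2_fold_change\nG1\t2.5\nG2\t-1.8\n"
example_csv = tsv_to_csv(example_tsv)
```

For a downloaded file, the same function can be applied to the text read from disk and the result written to a new `.csv` file.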

An excerpted and modified version of the map is displayed in Figure 5 in the text.

Display Metabolic Pathways
This app completed without errors in 60s.
Display Metabolic Pathways
This app completed without errors in 2m 9s.

Metabolic Results and Conclusion

Mapping transcript abundances to a metabolic model provides an opportunity to compare metabolism between plant tissues. In the model, ten chalcone synthase (CHS) genes in S. bicolor RTx430 are linked to the same reaction. However, different chalcone synthase genes drive the reaction expression scores in roots and leaves. In roots, the transcript abundance is highest for an isozyme on chromosome 5; in leaves, it is highest for a copy on chromosome 7.

In addition, the set of lignin biosynthesis reactions showing drought-altered transcription patterns differs between root and leaf samples. Leaf samples have lower transcript abundances for caffeoyl-CoA 3-O-methyltransferase (CCoAOMT) in drought than in control conditions. In contrast, roots have lower transcript abundances for ferulate 5-hydroxylase (F5H) in drought conditions, potentially pointing to different changes in the levels of lignin precursors and in lignin composition between the tissues.

The genes and metabolites linked to these reactions in the model can guide the design of hypothesis tests with targeted expression and metabolite measurements.

References

  1. A.P. Arkin, R.W. Cottingham, C.S. Henry, N.L. Harris, R.L. Stevens, S. Maslov, P. Dehal, D. Ware, F. Perez, S. Canon, M.W. Sneddon, M.L. Henderson, W.J. Riehl, D. Murphy-Olson, S.Y. Chan, R.T. Kamimura, S. Kumari, M.M. Drake, T.S. Brettin, E.M. Glass, D. Chivian, D. Gunter, D.J. Weston, B.H. Allen, J. Baumohl, A.A. Best, B. Bowen, S.E. Brenner, C.C. Bun, J.-M. Chandonia, J.-M. Chia, R. Colasanti, N. Conrad, J.J. Davis, B.H. Davison, M. DeJongh, S. Devoid, E. Dietrich, I. Dubchak, J.N. Edirisinghe, G. Fang, J.P. Faria, P.M. Frybarger, W. Gerlach, M. Gerstein, A. Greiner, J. Gurtowski, H.L. Haun, F. He, R. Jain, M.P. Joachimiak, K.P. Keegan, S. Kondo, V. Kumar, M.L. Land, F. Meyer, M. Mills, P.S. Novichkov, T. Oh, G.J. Olsen, R. Olson, B. Parrello, S. Pasternak, E. Pearson, S.S. Poon, G.A. Price, S. Ramakrishnan, P. Ranjan, P.C. Ronald, M.C. Schatz, S.M.D. Seaver, M. Shukla, R.A. Sutormin, M.H. Syed, J. Thomason, N.L. Tintle, D. Wang, F. Xia, H. Yoo, S. Yoo, D. Yu, KBase: The United States Department of Energy Systems Biology Knowledgebase, Nat. Biotechnol. 36 (2018) 566–569. https://doi.org/10.1038/nbt.4163.
  2. N. Varoquaux, B. Cole, C. Gao, G. Pierroz, C.R. Baker, D. Patel, M. Madera, T. Jeffers, J. Hollingsworth, J. Sievert, Y. Yoshinaga, J.A. Owiti, V.R. Singan, S. DeGraaf, L. Xu, M.J. Blow, M.J. Harrison, A. Visel, C. Jansson, K.K. Niyogi, R. Hutmacher, D. Coleman-Derr, R.C. O’Malley, J.W. Taylor, J. Dahlberg, J.P. Vogel, P.G. Lemaux, E. Purdom, Transcriptomic analysis of field-droughted sorghum from seedling to maturity reveals biotic and metabolic responses, Proc. Natl. Acad. Sci. U. S. A. (2019). https://doi.org/10.1073/pnas.1907500116.
  3. D.M. Emms, S. Kelly, OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy, Genome Biol. 16 (2015) 157. https://doi.org/10.1186/s13059-015-0721-2.
  4. S. Andrews, Others, FastQC: a quality control tool for high throughput sequence data. 2010, (2017).
  5. A.M. Bolger, M. Lohse, B. Usadel, Trimmomatic: a flexible trimmer for Illumina sequence data, Bioinformatics. 30 (2014) 2114–2120. https://doi.org/10.1093/bioinformatics/btu170.
  6. M. Martin, Cutadapt removes adapter sequences from high-throughput sequencing reads, EMBnet.journal. 17 (2011) 10–12. https://doi.org/10.14806/ej.17.1.200.
  7. C. Trapnell, A. Roberts, L. Goff, G. Pertea, D. Kim, D.R. Kelley, H. Pimentel, S.L. Salzberg, J.L. Rinn, L. Pachter, Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks, Nat. Protoc. 7 (2012) 562–578. https://doi.org/10.1038/nprot.2012.016.
  8. M. Pertea, D. Kim, G.M. Pertea, J.T. Leek, S.L. Salzberg, Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown, Nat. Protoc. 11 (2016) 1650–1667. https://doi.org/10.1038/nprot.2016.095.
  9. D. Kim, B. Langmead, S.L. Salzberg, HISAT: a fast spliced aligner with low memory requirements, Nat. Methods. 12 (2015) 357–360. https://doi.org/10.1038/nmeth.3317.
  10. D. Kim, J.M. Paggi, C. Park, C. Bennett, S.L. Salzberg, Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype, Nat. Biotechnol. 37 (2019) 907–915. https://doi.org/10.1038/s41587-019-0201-4.
  11. M. Pertea, G.M. Pertea, C.M. Antonescu, T.-C. Chang, J.T. Mendell, S.L. Salzberg, StringTie enables improved reconstruction of a transcriptome from RNA-seq reads, Nat. Biotechnol. 33 (2015) 290–295. https://doi.org/10.1038/nbt.3122.
  12. S. Anders, W. Huber, Differential expression analysis for sequence count data, Genome Biology. 11 (2010). https://doi.org/10.1186/gb-2010-11-10-r106.
  13. M. Love, S. Anders, W. Huber, Differential analysis of RNA-Seq data at the gene level using the DESeq2 package, Heidelberg: European Molecular Biology Laboratory (EMBL). (2013). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.477.9514&rep=rep1&type=pdf.
  14. M.I. Love, W. Huber, S. Anders, Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2, Genome Biol. 15 (2014) 550. https://doi.org/10.1186/s13059-014-0550-8.
  15. S.M.D. Seaver, C. Lerma-Ortiz, N. Conrad, A. Mikaili, A. Sreedasyam, A.D. Hanson, C.S. Henry, PlantSEED enables automated annotation and reconstruction of plant primary metabolism with improved compartmentalization and comparative consistency, Plant J. 95 (2018) 1102–1113. https://doi.org/10.1111/tpj.14003.
  16. Z.A. King, A. Dräger, A. Ebrahim, N. Sonnenschein, N.E. Lewis, B.O. Palsson, Escher: A Web Application for Building, Sharing, and Embedding Data-Rich Visualizations of Biological Pathways, PLoS Comput. Biol. 11 (2015) e1004321. https://doi.org/10.1371/journal.pcbi.1004321.

Released Apps

  1. Align Reads using HISAT2 - v2.1.0
    • Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nature Methods. 2015;12: 357–360. doi:10.1038/nmeth.3317
    • Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology. 2013;14: R36. doi:10.1186/gb-2013-14-4-r36
  2. Annotate Plant Enzymes with OrthoFinder
    • Emms DM, Kelly S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 2015;16. doi:10.1186/s13059-015-0721-2
    • OrthoFinder GitHub source:
    • PlantSEED Github source:
  3. Assemble Transcripts using StringTie - v2.1.5
    • Frazee AC, Pertea G, Jaffe AE, Langmead B, Salzberg SL, Leek JT. Ballgown bridges the gap between transcriptome assembly and expression analysis. Nat Biotechnol. 2015;33: 243–246. doi:10.1038/nbt.3172
    • https://www.nature.com/articles/nmeth.3317
    • Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology. 2013;14: R36. doi:10.1186/gb-2013-14-4-r36
    • Pertea M, Pertea GM, Antonescu CM, Chang T-C, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology. 2015;33: 290–295. doi:10.1038/nbt.3122
    • Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25: 1105–1111. doi:10.1093/bioinformatics/btp120
    • Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012;7: 562–578. doi:10.1038/nprot.2012.016
  4. Assess Read Quality with FastQC - v0.11.9
    • FastQC source: Bioinformatics Group at the Babraham Institute, UK.
  5. Create Average ExpressionMatrix
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163
  6. Create Differential Expression Matrix using DESeq2 - v1.20.0
    • Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology. 2014;15: 550. doi:10.1186/s13059-014-0550-8
  7. Create RNA-seq SampleSet
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163
  8. Create Up/Down Regulated FeatureSet and ExpressionMatrix
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163
  9. Import GFF3/FASTA file as Genome from Staging Area
    no citations
  10. Import SRA File as Reads From Web - v1.0.7
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163
  11. Integrate Abundances with Metabolism
    • [1] Seaver SMD, Bradbury LM, Frelin O, Zarecki R, Ruppin E, Hanson AD, Henry CS Improved evidence-based genome-scale metabolic models for maize leaf, embryo, and endosperm. Front Plant Sci. 2015;6: 142. doi: 10.3389/fpls.2015.00142
  12. Reconstruct Plant Metabolism
    • [1] Seaver SMD, Lerma-Ortiz C, Conrad N, Mikaili A, Sreedasyam A, Hanson AD, et al. PlantSEED enables automated annotation and reconstruction of plant primary metabolism with improved compartmentalization and comparative consistency. Plant J. 2018;95: 1102–1113. doi:10.1111/tpj.14003
    • [2] Seaver SMD, Gerdes S, Frelin O, Lerma-Ortiz C, Bradbury LMT, Zallot R, et al. High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource. Proc Natl Acad Sci USA. 2014;111: 9645–9650. doi:10.1073/pnas.1401329111
    • [3] GitHub source:
    • [4] Emms DM, Kelly S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 2015;16. doi:10.1186/s13059-015-0721-2
    • [5] Henry CS, DeJongh M, Best AA, Frybarger PM, Linsay B, Stevens RL. High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat Biotechnol. 2010;28: 977–982. doi:10.1038/nbt.1672
    • [6] Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42: D206–D214. doi:10.1093/nar/gkt1226
    • [7] Latendresse M. Efficiently gap-filling reaction networks. BMC Bioinformatics. 2014;15: 225. doi:10.1186/1471-2105-15-225
    • [8] Dreyfuss JM, Zucker JD, Hood HM, Ocasio LR, Sachs MS, Galagan JE. Reconstruction and Validation of a Genome-Scale Metabolic Model for the Filamentous Fungus Neurospora crassa Using FARM. PLOS Computational Biology. 2013;9: e1003126. doi:10.1371/journal.pcbi.1003126
    • [9] Mahadevan R, Schilling CH. The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metab Eng. 2003;5: 264–276.

Apps in Beta

  1. Escher Pathway Viewer
    • [1] King, Z. A., Dräger, A., Ebrahim, A., Sonnenschein, N., Lewis, N. E., & Palsson, B. O. (2015). Escher: A Web Application for Building, Sharing, and Embedding Data-Rich Visualizations of Biological Pathways. PLOS Computational Biology, 11(8), e1004321.
    • [2] Rowe, E., Palsson, B. O., & King, Z. A. (2018). Escher-FBA: a web application for interactive flux balance analysis. BMC Systems Biology, 12(1), 84.