A KBase Case Study on Genome-wide Transcriptomics and Plant Primary Metabolism in response to Drought Stress in Sorghum.¶

Abstract¶

A better understanding of the genetic and metabolic mechanisms that confer stress resistance and tolerance in plants is key to engineering new crops through advanced breeding technologies. This requires a systems biology approach that builds on a genome-wide understanding of the regulation of gene expression, plant metabolism, physiology and growth. In this study, we examine the response to drought stress in Sorghum, leveraging the tools for transcriptomics and plant metabolic modeling we have implemented at the U.S. Department of Energy Systems Biology Knowledgebase (KBase). KBase enables researchers worldwide to collaborate and advance research by allowing them to upload private or public data into the KBase Narrative Interface, analyze these data using a rich, extensible array of computational and data-analytics tools, and securely share scientific workflows and conclusions. We demonstrate how to use the current RNA-seq tools in KBase, applicable to both plants and microbes, to assemble and quantify long transcripts and identify differentially expressed genes effectively. More specifically, we demonstrate the utility of the platform by identifying key genes that are differentially expressed during drought stress in Sorghum bicolor, an important sustainable production crop plant. We then show how to use KBase tools to predict the membership of genes in metabolic pathways and examine expression data in the context of metabolic subsystems. We demonstrate the power of the platform by making the data analysis and interpretation available to biologists in the reproducible, re-usable, point-and-click format of a KBase Narrative, thus promoting the FAIR (Findable, Accessible, Interoperable and Reusable) guiding principles for scientific data management and stewardship.

Case study as an example to run the KBase workflow¶

This narrative demonstrates the gene expression profiling and its downstream metabolic mapping analysis in KBase [1] using a subset of the published data set [2] as a use case to investigate the genes and pathways in leaves and roots of Sorghum that are differentially impacted by drought stress.

This narrative targets the short-read dataset from leaf and root samples of a published study [2] of the Sorghum RTx430 genotype after eight weeks of growth under water limitation to simulate drought conditions in the field. For gene expression profiling, the recently released RTx430 genome from JGI Phytozome is used to guide the transcript assembly [Sorghum bicolor RTx430 v2.1, DOE-JGI, http://phytozome.jgi.doe.gov/]. Because this genome does not have any functional annotation, we annotated it with plant enzymes prior to running the RNA-seq analysis so that the differentially expressed genes can be characterized based on the currently available enzyme information. Next, the annotated genome is used to reconstruct Sorghum primary metabolism. Finally, the expression abundance is integrated with the annotated enzymes in the metabolic reconstruction to identify and visualize primary metabolic pathways of interest.

Data Description¶

This narrative uses a subset of the Sorghum RTx430 reads dataset published by a drought study, EPICON (Epigenetic Control of Drought Response in Sorghum bicolor), funded by the US Department of Energy [2]. Sorghum plants of genotype RTx430 were subjected to drought stress and samples were collected 56 days after sowing in the field. From this field experiment, the 3rd and 4th leaves and the top 30 cm of the root systems from 10 plants per plot were taken for each sample. The plants were grown in a field (Parlier, CA, USA; 36.6008°N, 119.5109°W) with sandy loam soil containing silky substratum (pH 7.37). Field plots were watered before planting; weekly irrigation started from day 16 [2]. The complete data set is available at NCBI and can be accessed by using Project/GEO/SRA accession numbers [Project: PRJNA527782; GEO: GSE128441; SRA: SRP188707]. The published study [2] used the Sorghum reference genome BTx623 for reference-guided gene profiling of the raw sequencing reads obtained from Sorghum RTx430 samples.

For this case study, only the data set for the week 8 time point (56 days after sowing in the field) from Sorghum genotype RTx430 is selected. It includes sequencing reads obtained from leaf and root tissues under two conditions, well-watered (ww) as control and drought-stressed (dr) as pre-flowering drought treatment, with three replicates per condition. Therefore, a total of 12 samples are used for transcriptome analysis (Table S2).

Workflow steps¶

The KBase gene expression profiling and metabolic mapping with primary metabolism workflow is organized in three modules:

Import and preparation of genome and sequencing reads
Alignment, assembly and quantification
Metabolic reconstruction, integration and visualization of abundance data

Figure 1. A schematic overview of KBase gene expression profiling and metabolic mapping with primary metabolism workflow organized in three modules: Module 1 - import and preparation of genome and sequencing reads; Module 2 - Alignment, assembly and quantification; Module 3 - Metabolic reconstruction, integration and visualization of abundance data.

Module 1 starts with the import of the reference genome and its annotation with enzymatic functions of plant primary metabolism using OrthoFinder [3]. The next step is the import of raw sequencing reads, followed by reads processing using quality check tools such as FastQC [4] and reads filtering tools such as Trimmomatic [5] and CutAdapt [6].

Module 2 focuses on reference-genome-guided gene expression profiling based on the Tuxedo suite [7,8] and allows users to map mRNA short reads to the annotated genome using HISAT2 [9,10], assemble the transcripts using StringTie [8,11], and identify counts-based differential gene expression between drought and well-watered control conditions using DESeq2 [12-14]. Note that while gene expression profiling in KBase can be done on any reference genome, here the PlantSEED/OrthoFinder annotation App is run on the genome prior to running the RNA-seq analysis so that the differentially expressed genes can be evaluated in a metabolic context.

Module 3 supports metabolic mapping capabilities and starts with the construction of a primary metabolism model using the annotated genome to identify chemical reactions curated for the metabolic network [15]. The expression abundance generated via Module 2 is readily interoperable with the Module 3 metabolic mapping to characterize the primary metabolic reactions specific to the experimental treatment. Finally, metabolic pathways can be visualized by using the Escher Pathway Viewer [16] App, which is adapted in KBase for visualizing metabolic models and related data.

Here are the specific steps to run this use case in this Narrative.

Import of Sorghum RTx430 Genome
This step imports the S. bicolor RTx430 genome as a reference genome serving as the basis for RNA-Seq alignment (S. bicolor RTx430 genome). The genome was taken from JGI Phytozome V13.
Annotation of Sorghum RTx430 Genome with Metabolic Enzymes
This step annotates the S. bicolor RTx430 genome with metabolic enzymes based on PlantSEED models using the "Annotate Plant Enzymes with OrthoFinder" App. It produces the annotated genome as an output object "SbiRTx430v2.1_annotated", as given in the Data Pane (Table S3).
Import of 12 SingleEndLibrary Reads from NCBI
This step imports 12 SRA files as SingleEndLibrary Objects from NCBI in the Data Pane using "Import of SRA reads from Web" App.
Group of 12 SingleEndLibrary objects as SampleSet
This step groups all the 12 imported reads SingleEndLibrary Objects based on the condition and tissue type and creates a Sample Set "RTx430_sampleset". This is used as an input for FastQC, HISAT2 Alignment and DESeq2 differential expression to simplify the process.
Quality Assessment of SampleSet using FastQC
This step provides the reads quality assessment for each individual sample of "RTx430_sampleset."
Align Reads of SampleSet using HISAT2
This step aligns the SRA reads of each individual sample from the SampleSet "RTx430_sampleset" to the SbiRTx430 v2.1 annotated reference genome using HISAT2 and provides position-level coverage for each sample. Additionally, it generates a merged alignment set object labelled as "RTx430_sampleset_alignment_set" in the Narrative, which can be run in batch mode in the next step of StringTie.
Assemble Transcripts using StringTie
This step generates gene expression abundance for each sample and an expression set object “RTx430_sampleset_gene_expression_set” that wraps multiple sample results into a single expression set for downstream analysis. In addition, it also provides the normalized expression matrix for all samples in both FPKM and TPM formats for easy download from data pane. This App also generates interactive histogram plots for expression abundance in FPKM and TPM. The normalized expression matrix "RTx430_sampleset_TPM_ExpressionMatrix" is also available as a supplementary file for those interested in comparing relative expression across multiple samples (Table S5).
Create Average Expression Matrix
In the next step, the “Create Average Expression Matrix” App is run on the normalized expression matrix “RTx430_sampleset_TPM_ExpressionMatrix” to get the abundance of each gene in each condition, averaged across the biological replicates. The output is the “RTx430_sampleset_TPM_ExpressionMatrix_average” expression matrix. This average expression matrix is later used in the workflow to assign reaction-level expression scores to study plant primary metabolism.
Differential Expression using DESeq2
This step calculates gene-annotation expression-level differences by condition using DESeq2, producing the all-by-all differential expression matrix. The “Create Differential Expression Matrix using DESeq2” App is run on the multi-sample expression set “RTx430_sampleset_gene_expression_set”, resulting in a “DifferentialExpressionMatrixSet” object “RTx430_DESeq2_all”. This object groups a set of six “DifferentialExpressionMatrix” objects that represent all possible pairwise combinations of the four treatments. The complete lists of differential expression analysis of leaf and root samples of the SbiRTx430 genome under drought stress are provided as Table S6 and Table S7 respectively.
Create up/down Regulated Feature Set and Differential Expression Matrix
This step runs the “Create up/down regulated FeatureSet and ExpressionMatrix” App to get the significantly up- and downregulated genes across all samples, with the normalized expression “RTx430_sampleset_TPM_ExpressionMatrix_average” and the differential expression “RTx430_DESeq2_all” as input. This App selects genes that exhibit differential expression (either upregulated or downregulated) for drought vs well-watered conditions based on the given statistical threshold of FDR-adjusted p-value (alpha cut off) and log2 fold change. This App is run separately for leaf and root tissues based on the adjusted p-value below 0.05 and absolute log2 fold change greater than or equal to 1 and 2 respectively.
Metabolic Reconstruction
This step reconstructs genome-scale metabolic networks of plant primary metabolism using RTx430 genome that has previously been annotated with plant enzymes using OrthoFinder. It gives "SbiRTx430v2.1_reconstructed" object as an output in the Data pane.
Integration of Abundance with Metabolism
This step integrates the normalized expression matrix generated by StringTie “RTx430_sampleset_TPM_ExpressionMatrix_average”, with the reconstructed FBA model "SbiRTx430v2.1_reconstructed" generated by the previous step to generate the reaction matrix. The complete list of reaction matrices generated by this study for each experimental condition (drought and well-watered as control) in leaves and roots are given as supplementary tables S8 and S9 respectively.
Visualization of Metabolic Pathways
This step uses the “Escher Pathway Viewer” App to display the highly differentially expressed genes identified by DESeq2 on a global metabolic map of plant metabolism based on the gene-reaction associations in the PlantSEED reconstructed metabolic model.

Import of Sorghum RTx430 Genome¶

The Sorghum bicolor RTx430 (v2.1) genome was imported directly from the JGI Phytozome site (S. bicolor RTx430 v2.1, DOE-JGI, http://phytozome.jgi.doe.gov/) through Globus into the KBase staging area. It includes the JGI v2.0 assembly of Sorghum bicolor and the JGI v2.1 annotation as the “SbicolorRTx430_552_v2.fa.gz” and “SbicolorRTx430_552_v2.1_gene.gff3.gz” files respectively.

App - Import GFF3/FASTA file as Genome from Staging Area¶

Inputs:¶

“SbicolorRTx430_552_v2.fa.gz” - JGI v2.0 assembly of Sorghum bicolor
“SbicolorRTx430_552_v2.1_gene.gff3.gz” - JGI v2.1 annotation

Output:¶

The FASTA and GFF annotations are imported into the RTx430 Transcriptomics Analysis Narrative as the genome object called “SbicolorRTx430v2.1”

Analysis:¶

This genome has 34,601 protein-coding genes and 46,881 protein-coding transcripts. This genome object was next annotated for functional information and is used to align the reads during HISAT2 alignment and onwards.

Note that this particular genome has no functional annotation information following import because the imported GFF file included only structural annotations. If a GFF file does contain functional annotations, those are added to the genome during import. The next step of the workflow is to functionally annotate this genome in KBase before it is subjected to RNA-seq analysis.
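The gene and transcript counts reported above can be cross-checked outside KBase by tallying feature types in the GFF3 file. The following is a minimal sketch, not part of the KBase import App: it assumes a standard nine-column, tab-separated GFF3 file, and the helper name `count_feature_types` and the tiny example records are hypothetical.

```python
import io
from collections import Counter


def count_feature_types(gff3_lines):
    """Count GFF3 features by type (column 3), skipping comments and blank lines."""
    counts = Counter()
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed GFF3 feature line
        counts[columns[2]] += 1
    return counts


# Tiny made-up GFF3 excerpt (hypothetical IDs), used only to show the call.
example = io.StringIO(
    "##gff-version 3\n"
    "Chr01\tphytozome\tgene\t100\t900\t.\t+\t.\tID=geneA\n"
    "Chr01\tphytozome\tmRNA\t100\t900\t.\t+\t.\tID=geneA.1;Parent=geneA\n"
    "Chr01\tphytozome\tmRNA\t100\t800\t.\t+\t.\tID=geneA.2;Parent=geneA\n"
)
feature_counts = count_feature_types(example)
print(feature_counts["gene"], feature_counts["mRNA"])
```

For the compressed annotation file, the same function could be given a handle opened with `gzip.open(path, "rt")`.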

Annotation of the Sorghum RTx430 Genome with Metabolic Enzymes¶

SbiRTx430v2.1 genome is annotated with metabolic enzymes using PlantSEED models using "Annotate Plant Enzymes with OrthoFinder" App.

Input:¶

SbiRTx430v2.1 genome object

Output:¶

SbiRTx430v2.1_annotated

Analysis¶

This App outputs a table of all predicted enzymes in the KBase Genome alongside the curated Arabidopsis enzymes in the PlantSEED database (Table S3). The App also shows a figure and a table as part of its report. The table can be downloaded in CSV format and lets the user find which genes were annotated with which enzymatic function.
From this table, for the RTx430v2.1 genome, the Plant OrthoFinder app clustered 1,766 protein sequences with 1,419 PlantSEED-curated genes.
The App saves the annotated genome as “SbiRTx430v2.1_annotated” in the Data Pane.

Import of 12 SingleEndLibrary Reads from NCBI¶

Input:¶

The SRA read files of leaves and roots of Sorghum RTx430 genotype at 56 days timepoint under control and pre-flowering drought conditions with three replicates per condition are imported by using the SRA accession number given in Table S2 from NCBI SRA site using “Direct Download Link”. The SRA link for each SRA read file and object name is given as input to “Import SRA Reads from Web” App.

Table S2. SRA read files of leaves and roots of RTx430 at 56 days timepoint under control and preflowering drought conditions

Output:¶

Each SRA file is imported to KBase narrative as a SingleEndLibrary object with the reads statistics as an output.
The output of this App for each sample provides an overview of the reads metadata and its statistics summary such as number of reads, mean read length, number of duplicate reads, mean quality score (Phred scale), and total number of bases.
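Some of the summary statistics listed above (number of reads, total bases, mean read length) are straightforward to reproduce from a FASTQ file. The sketch below is illustrative only and is not the code the KBase App runs; it assumes plain four-line FASTQ records without wrapped sequences, and the function name `fastq_stats` and the two example reads are made up.

```python
import io


def fastq_stats(handle):
    """Return read count, total bases, and mean read length for a FASTQ stream."""
    n_reads = 0
    total_bases = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the sequence is the second line of each 4-line record
            n_reads += 1
            total_bases += len(line.strip())
    mean_length = total_bases / n_reads if n_reads else 0.0
    return {"reads": n_reads, "bases": total_bases, "mean_length": mean_length}


# Two made-up reads, used only to show the call.
example = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nACGT\n+\nIIII\n"
)
stats = fastq_stats(example)
print(stats)
```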

Analysis¶

For this case study, we use data sets corresponding to the week 8 time point (56 days from sowing) from Sorghum genotype RTx430. They comprise 12 single-end read files obtained from two conditions, well-watered (control) and drought-stressed (dr treatment), with three replicates per condition in leaf and root tissue. Therefore, a total of 12 samples are used for transcriptome analysis.

Group of 12 SingleEndLibrary Objects as SampleSet¶

This step groups the reads (SingleEndLibrary objects) into an RNASeqSampleSet.

Input:¶

The three replicates for each condition, well-watered as control (ww) and drought treatment (dr), in each tissue (leaves and roots) are first grouped under their respective sample labels (RTx430_leaves_ww, RTx430_leaves_dr, RTx430_roots_ww, and RTx430_roots_dr) as input parameters.
These four groups, along with experimental metadata such as SampleSet Description, Platform, Library Type, Domain, Source and Publication Details constitute the SampleSet object.
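The grouping described above is a 2 tissues x 2 conditions x 3 replicates design. As a minimal sketch of that layout, the snippet below builds the four condition labels used in this Narrative and attaches three replicate names to each; the replicate names (`..._rep1` and so on) are hypothetical placeholders, not the object names used in the Narrative.

```python
from itertools import product

tissues = ["leaves", "roots"]
conditions = ["ww", "dr"]  # well-watered control, drought treatment
replicates = [1, 2, 3]

# Map each condition label to its (hypothetically named) replicate libraries.
sample_set = {
    f"RTx430_{tissue}_{condition}": [
        f"RTx430_{tissue}_{condition}_rep{rep}" for rep in replicates
    ]
    for tissue, condition in product(tissues, conditions)
}

n_libraries = sum(len(libraries) for libraries in sample_set.values())
print(len(sample_set), "groups,", n_libraries, "libraries")
```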

Output:¶

RTx430_sampleset

Analysis¶

Merging all samples into the set "RTx430_sampleset" makes it possible for users to run downstream operations (such as FastQC, HISAT2, and StringTie) on all samples in batch mode.
The App also associates experimental metadata that makes it easier to run differential expression based on the experimental conditions in the downstream process.

Quality assessment of SampleSet using FastQC¶

Quality assessment of the reads of the "RTx430_sampleset" object using FastQC. This step provides reads quality assessment for each sample.

Input:¶

RTx430_sampleset

Output:¶

FastQC generates an output of a comprehensive multi-page report on the composition and quality of reads in HTML format, with one page for each of the reads (e.g. Single End, Paired End: forward, Paired End: reverse). The report can be viewed inside the Narrative or as a new web page that can also be downloaded.
The HTML report includes results from multiple modules that were run by FastQC, and provides a quick assessment of the quality of the results labeled as normal (green checkmark), slightly abnormal (orange triangle), and very unusual (red cross) reads.

Analysis¶

Based on the summary of the FastQC report, the reads data is found to be of good quality, hence there was no need to trim the reads.

Align Reads of SampleSet using HISAT2¶

Align SRA reads in the RTx430_sampleset object to the SbiRTx430 v2.1 annotated reference genome using HISAT2 by selecting all the default parameters.

Input:¶

RTx430_sampleset object
SbiRTx430 v2.1 annotated reference genome

Output:¶

12 BAM alignment objects for each individual sample and merged alignment set as "RTx430_sampleset_alignmentset"
QualiMap report

Analysis¶

Clicking on the "RTx430_sampleset_alignmentset" under the "Result" tab provides the alignment statistics such as total reads, unmapped and mapped reads, multiple alignments, and singletons in the table format for each individual alignment object for each sample.
The alignment results show a very high percentage of mapped reads (between 91.33% and 99.1%).
Another interesting observation is that a very low percentage of reads (between 2.40% and 3.92%) mapped to multiple locations in the RTx430 genome.
QualiMap generates a comprehensive HTML report as well on the quality of the BAM alignments. The mean mapping quality of all the samples is 58.33.
The PCA plot shows that the samples group by tissue.
The BAM alignment objects can be downloaded to visualize the aligned reads outside KBase using JBrowse and Integrative Genomics Viewer (IGV).
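The percentages quoted above follow from the per-sample counts in the alignment statistics table. A minimal sketch of that arithmetic is given below; it assumes that all percentages are taken relative to the total number of reads, which is an assumption rather than something stated in the App report, and the function name and counts are made up for illustration.

```python
def alignment_rates(total_reads, mapped_reads, multi_mapped_reads):
    """Express mapped, unmapped and multi-mapped reads as percentages of all reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {
        "mapped_pct": 100 * mapped_reads / total_reads,
        "unmapped_pct": 100 * (total_reads - mapped_reads) / total_reads,
        "multi_mapped_pct": 100 * multi_mapped_reads / total_reads,
    }


# Made-up counts for a single sample.
rates = alignment_rates(total_reads=1000, mapped_reads=950, multi_mapped_reads=30)
print(rates)
```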

Assemble Transcripts using StringTie¶

Assemble RNA-seq alignments into transcripts with the “Assemble Transcripts using StringTie” App, using the "RTx430_sampleset_alignment_set" object as input and default values for all other parameters. In KBase, the StringTie App is configured to use known transcript models (following the reference annotation guided assembly process) for the expression analysis.

Input:¶

RTx430_sampleset_alignmentset

Output:¶

12 Expression objects for each individual sample and a merged expression set as RTx430_sampleset_expression_set
Normalized expression matrix in FPKM units as RTx430_sampleset_FPKM_ExpressionMatrix
Normalized expression matrix in TPM units as RTx430_sampleset_TPM_ExpressionMatrix

Analysis¶

The StringTie App generates gene expression abundance for each sample and an expression set object “RTx430_sampleset_gene_expression_set” that wraps multiple sample results into a single expression set for downstream analysis.
Clicking on the "RTx430_sampleset_expression_set" under Results tab generates the interactive histogram plots for expression abundance in FPKM and TPM.
In addition, it also provides the normalized expression matrix for all samples in both FPKM and TPM formats for easy download. We have provided the normalized expression matrix in TPM as a supplementary file for those interested in comparing relative expression across multiple samples (Table S5).
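FPKM and TPM are related by a simple rescaling within each sample: TPM_i = FPKM_i / (sum of FPKM over all genes) x 10^6, so TPM values in a sample always sum to one million, which is what makes them convenient for comparing relative expression across samples. The sketch below illustrates this relationship only; it is not how StringTie computes its values internally, and the gene names and numbers are made up.

```python
def fpkm_to_tpm(fpkm_by_gene):
    """Rescale one sample's FPKM values so that they sum to one million."""
    total = sum(fpkm_by_gene.values())
    if total == 0:
        return {gene: 0.0 for gene in fpkm_by_gene}
    return {gene: value * 1e6 / total for gene, value in fpkm_by_gene.items()}


# Made-up FPKM values for a three-gene sample.
fpkm = {"geneA": 10.0, "geneB": 30.0, "geneC": 60.0}
tpm = fpkm_to_tpm(fpkm)
print(tpm)
```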

Create Average Expression Matrix¶

To find the average abundances for each gene in each condition, the normalized expression matrix is averaged across the biological replicates for each condition.

Input:¶

RTx430_sampleset_TPM_ExpressionMatrix

Output:¶

RTx430_sampleset_TPM_ExpressionMatrix_average

Analysis¶

This average expression matrix “RTx430_sampleset_TPM_ExpressionMatrix_average” is used in a later step to assign reaction level expression scores to study plant primary metabolism.
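Conceptually, the averaging step takes the per-sample TPM values and computes one mean per gene and condition. A minimal sketch under that reading follows; the data layout (nested dictionaries), the function name `average_by_condition`, the gene ID and the replicate names are all hypothetical and do not reflect the KBase object format.

```python
from collections import defaultdict
from statistics import mean


def average_by_condition(expression, sample_to_condition):
    """Average each gene's values over the replicates of every condition.

    expression: {gene: {sample: value}}
    sample_to_condition: {sample: condition label}
    """
    averaged = {}
    for gene, by_sample in expression.items():
        grouped = defaultdict(list)
        for sample, value in by_sample.items():
            grouped[sample_to_condition[sample]].append(value)
        averaged[gene] = {cond: mean(values) for cond, values in grouped.items()}
    return averaged


# Made-up TPM values for one gene across six leaf libraries.
sample_to_condition = {
    "leaves_ww_rep1": "RTx430_leaves_ww",
    "leaves_ww_rep2": "RTx430_leaves_ww",
    "leaves_ww_rep3": "RTx430_leaves_ww",
    "leaves_dr_rep1": "RTx430_leaves_dr",
    "leaves_dr_rep2": "RTx430_leaves_dr",
    "leaves_dr_rep3": "RTx430_leaves_dr",
}
expression = {
    "gene_x": {
        "leaves_ww_rep1": 2.0, "leaves_ww_rep2": 4.0, "leaves_ww_rep3": 6.0,
        "leaves_dr_rep1": 10.0, "leaves_dr_rep2": 20.0, "leaves_dr_rep3": 30.0,
    }
}
average_matrix = average_by_condition(expression, sample_to_condition)
print(average_matrix)
```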

Differential expression using DESeq2¶

Perform differential expression calculations using DESeq2. This step calculates gene-annotation expression-level differences by condition.

Input:¶

RTx430_sampleset_expression_set

Output:¶

RTx430_DESeq2_all

Analysis¶

This step generates six “DifferentialExpressionMatrix” objects, which represent all possible pairwise combinations of the four treatments, and groups them in a set object "RTx430_DESeq2_all".
Clicking on the DifferentialExpressionMatrixSet object "RTx430_DESeq2_all" displays the set viewer and provides an interactive volcano plot for each sample. The volcano plot allows visual identification of significant genes based on the chosen cutoffs, with significance (-log10 P value) along the y-axis and log2 fold change along the x-axis. Among the genes that pass the P value and fold change thresholds, the most upregulated genes are displayed towards the right, the most downregulated genes towards the left, and the most statistically significant genes towards the top. For the given cutoffs, the Gene Table tab also lists the genes with their p-value, q-value, significance (-log10) and fold change (log2), and this list can be exported.
In addition, DESeq2 generates the Dispersion plot and PCA plot to help understand the inter-sample expression variability.
The dispersion plot can be seen as a scatter plot with log2 fold change along the y-axis and normalized mean expression along the x-axis. It is essentially a measure of expression variance for a given mean expression for all genes. As expected, the variability in fold changes is relatively higher for lowly expressed genes.
Based on the PCA plot, the first principal component (PC1) accounts for 91% variation among the samples based on tissue (leaf vs root). The second principal component (PC2) accounts for a minor variance of only 6% among the samples based on the treatment (drought vs control).
The complete lists of differential expression analysis of leaf and root samples of SbiRTx430 genome under drought stress are provided as Table S6 and Table S7 respectively.
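The count of six matrices follows from choosing 2 of the 4 condition labels: C(4, 2) = 6. The short sketch below enumerates these pairs and picks out the two within-tissue comparisons (drought vs well-watered) that the rest of the workflow focuses on. It only illustrates the combinatorics; the ordering of each pair here is arbitrary and is not meant to reflect how the App names its output objects.

```python
from itertools import combinations

conditions = [
    "RTx430_leaves_ww",
    "RTx430_leaves_dr",
    "RTx430_roots_ww",
    "RTx430_roots_dr",
]

# All unordered pairs of conditions.
pairs = list(combinations(conditions, 2))

# Keep only pairs that compare the two treatments within one tissue.
same_tissue_pairs = [
    (a, b) for a, b in pairs if a.split("_")[1] == b.split("_")[1]
]

print(len(pairs), "pairwise comparisons")
print(same_tissue_pairs)
```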

Create up/down Regulated Feature Set and Differential Expression Matrix¶

“Create up/down regulated FeatureSet and ExpressionMatrix” App is used to get the significant differential expression of up and down regulated genes and differential expression matrix based on different fold levels.

We ran this App twice to get the differentially expressed genes and the differential expression matrix for the “selected pairwise conditions” of leaves drought versus well-watered control and roots drought versus well-watered control, based on an adjusted P-value below 0.05 and log2 fold change above 1 and 2 respectively.

Input:¶

RTx430_DESeq2_all
RTx430_sampleset_TPM_ExpressionMatrix_average
Alpha cut off (0.05)
Log2Fold change
Specific Pairwise Conditions
- Leaves drought vs leaves control
- Roots drought vs roots control

Output:¶

Upregulated - Two Upper FeatureSet
Downregulated - Two Lower FeatureSet
Two Differential expression matrix

Analysis¶

This App selects genes that exhibit differential expression (either upregulated or downregulated) for drought vs well-watered conditions based on the given statistical threshold of FDR-adjusted p-value (alpha cut off) and log2 fold change. This App is run separately for leaf and root tissues based on the adjusted p-value below 0.05 and absolute log2 fold change greater than or equal to 1 and 2 respectively.
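The selection rule described above can be written as a simple filter: keep a gene if its adjusted p-value is below alpha and its absolute log2 fold change is at least the chosen cutoff, then assign it to the up or down set by the sign of the fold change. The following is a sketch of that rule only, not the App's implementation; the function name, the input layout and the example values are hypothetical.

```python
def split_up_down(results, alpha=0.05, min_abs_log2fc=1.0):
    """Split genes into up- and downregulated lists.

    results: {gene: (log2 fold change, FDR-adjusted p-value)}
    """
    up, down = [], []
    for gene, (log2fc, q_value) in results.items():
        if q_value < alpha and abs(log2fc) >= min_abs_log2fc:
            (up if log2fc > 0 else down).append(gene)
    return sorted(up), sorted(down)


# Made-up results for four genes (drought vs well-watered).
results = {
    "geneA": (2.5, 0.001),   # strongly up, significant
    "geneB": (-1.2, 0.01),   # moderately down, significant
    "geneC": (3.0, 0.2),     # large change but not significant
    "geneD": (0.4, 0.001),   # significant but small change
}
up_fc1, down_fc1 = split_up_down(results, alpha=0.05, min_abs_log2fc=1.0)
up_fc2, down_fc2 = split_up_down(results, alpha=0.05, min_abs_log2fc=2.0)
print(up_fc1, down_fc1)
print(up_fc2, down_fc2)
```

Raising the fold-change cutoff from 1 to 2 can only shrink the selected sets, which is consistent with the smaller gene count reported at the stricter threshold.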

Differential Expression Results and Conclusion¶

The differential expression analysis reported 86.4% of the total genes are differentially expressed, displaying a massive response to drought. Out of this, a total of 12,375 genes (41.39% of total differentially expressed) show differential expression based on log2-fold change greater than 1 and FDR-adjusted q value of less than 0.05.
With a higher threshold (a log2 fold change greater than 2) and the same FDR-adjusted q-value of less than 0.05, a total of 5,159 genes (17.25% of total differentially expressed) are significantly affected by the drought stress.
Though both tissues (leaves and roots samples) exhibit a widespread response to the drought, we observed that the root samples show a larger number of differentially expressed genes compared to the leaves samples.
We also observed that roots undergo more down-regulation than up-regulation of genes in response to drought for efficiently triaging and distributing natural resources above ground.
We observed that only 177 genes, out of a total of 5,159 significantly differentially expressed genes based on log2 fold change 2 or above, in leaves and roots, are annotated with enzymes that have homology to known biological functions. Therefore, a large number of differentially expressed genes are still completely uncharacterized.
Based on the functionally annotated genes, we found 24 genes are differentially expressed in both leaves and roots. There are 23 genes that are differentially expressed only in leaves and 105 genes differentially expressed only in roots.

Metabolic roles of the expressed genes¶

Mapping of the gene expression data to individual reactions in a model in the next section provides information on the functionally implicated pathways and associated metabolic reactions and more specifically about the metabolic roles of differentially expressed genes.

For example, the differentially expressed genes SBIRTX430.09G168000 and SBIRTX430.03G383900, upregulated in roots and leaves and annotated as Gamma-glutamyl phosphate reductase (EC 1.2.1.41) / Glutamate 5-kinase (EC 2.7.2.11) in the cytosol and plastid, are involved in proline metabolism. Another example is SBIRTX430.07G059900, downregulated in both leaves and roots and annotated as Chalcone synthase (EC 2.3.1.74) in the nucleus and endoplasm, which is involved in lignin biosynthesis.

Metabolic Reconstruction¶

This step reconstructs genome-scale metabolic networks of plant primary metabolism using RTx430V2.1_annotated genome that was annotated with plant enzymes using OrthoFinder earlier.

Input:¶

SbiRTx430v2.1_annotated

Output:¶

SbiRTx430v2.1_reconstructed

Analysis¶

It generates the table of compartmentalized reactions in the metabolic reconstruction that can be downloaded.
Metabolic reconstructions provide information that is not available at the locus level, linking multiple isoforms and enzyme subunits to the reactions that they perform.
By integrating the RNA-seq data with a metabolic reconstruction, researchers can quickly find targets for engineering plant metabolism to improve growth during drought.

Integration of Abundance with Metabolism¶

This step integrates the normalized expression matrix generated by StringTie, with the reconstructed FBA model generated by the previous step to generate the reaction matrix.

Input:¶

RTx430_sampleset_TPM_ExpressionMatrix_average
SbiRTx430v2.1_reconstructed
Expression Conditions
- Leaves drought vs leaves control
- Roots drought vs roots control

Output:¶

Reaction Matrix, containing a set of computed reaction expression scores, for each of the same experimental conditions.

Analysis¶

The app generally applies the "best" gene abundance to a reaction, where there are multiple genes, and where there are genes encoding protein complexes. The underlying concept is that a reaction is only as "active" in its catalysis as allowed by the formation of the enzymes that catalyze the reaction. In turn, the results here can be used as a proxy for the activity of different primary metabolic pathways, and how they may respond to different experimental conditions.
The output reaction matrices for leaves and roots are given in Supplementary Tables S8 and S9 respectively.
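One simple reading of the "best" gene rule is that each reaction receives the highest abundance among the genes associated with it. The sketch below implements only that simplified reading and also records which gene supplied the score. It is an assumption-laden illustration: the App's exact handling of protein complexes is not specified here and is not modeled, and the function name, reaction ID and gene IDs are made up.

```python
def reaction_scores(gene_abundance, reaction_genes):
    """Score each reaction by the highest abundance among its associated genes.

    gene_abundance: {gene: abundance in one condition}
    reaction_genes: {reaction: [genes linked to the reaction]}
    Returns {reaction: (score, gene that supplied the score)}.
    """
    scores = {}
    for reaction, genes in reaction_genes.items():
        measured = {g: gene_abundance[g] for g in genes if g in gene_abundance}
        if not measured:
            continue  # no expression data for any gene of this reaction
        best_gene = max(measured, key=measured.get)
        scores[reaction] = (measured[best_gene], best_gene)
    return scores


# Made-up abundances: three genes linked to one hypothetical reaction.
gene_abundance = {"gene1": 5.0, "gene2": 12.5}
reaction_genes = {"rxn_example": ["gene1", "gene2", "gene3"]}
scores = reaction_scores(gene_abundance, reaction_genes)
print(scores)
```

Running the same function on the averaged abundances of two tissues shows whether different gene copies supply the score for the same reaction in each tissue.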

Visualization of Metabolic Pathway¶

This step displays the highly differentially expressed genes identified by DESeq2 on a global metabolic map of plant metabolism based on the gene-reaction associations in the PlantSEED reconstructed metabolic model.

Input:¶

RTx430_leaves_dr-RTx430_leaves_ww_0.05q_1fc_exp
RTx430_roots_dr-RTx430_roots_ww_0.05q_1fc_exp
SbiRTx430v2.1_reconstructed
Expression Conditions
- Leaves drought vs leaves control
- Roots drought vs roots control
FullPlantMap

Output:¶

Visualization of highly differentially expressed reactions (shown in green) on a global plant metabolism map.

Analysis¶

This App shows where the differentially expressed reactions fall among the reactions included in the global plant map, which does not presently include the lignin pathway. Presently, the map does not support visualization of the differential gene expression generated by DESeq2, but this differential expression may be downloaded as TSV, converted to CSV in Excel, and then uploaded into the map viewer for visualization (this is how Figure 5 in the text was created).

An excerpted and modified version of the map is displayed in Figure 5 in the text.
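The TSV-to-CSV conversion mentioned above can also be scripted instead of done by hand in a spreadsheet. A minimal sketch using Python's standard `csv` module is given below; the column names in the example are hypothetical, and the exact columns expected by the map viewer are not specified here.

```python
import csv
import io


def tsv_to_csv(tsv_handle, csv_handle):
    """Copy tab-separated rows to a comma-separated file; return the row count."""
    reader = csv.reader(tsv_handle, delimiter="\t")
    writer = csv.writer(csv_handle, lineterminator="\n")
    n_rows = 0
    for row in reader:
        writer.writerow(row)
        n_rows += 1
    return n_rows


# Made-up two-line TSV, used only to show the call.
tsv_text = io.StringIO("gene_id\tlog2_fold_change\tq_value\ngeneA\t2.5\t0.001\n")
csv_text = io.StringIO()
n_rows = tsv_to_csv(tsv_text, csv_text)
print(csv_text.getvalue())
```

For files on disk, both handles would be opened with `newline=""`, as the `csv` module documentation recommends.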

Metabolic Results and Conclusion¶

Mapping transcript abundances to a metabolic model provides an opportunity to compare metabolism between plant tissues. In the model, ten chalcone synthase (CHS) genes in S. bicolor RTx430 are linked to the same reaction. However, different chalcone synthase genes drive the reaction expression scores in roots and leaves. In roots, the transcript abundance is highest for an isozyme on chromosome 5; in leaves, it is highest for a copy on chromosome 7.

In addition, the set of lignin biosynthesis reactions showing drought-altered transcription patterns differs between root and leaf samples. Leaf samples have lower transcript abundances for caffeoyl-CoA 3-O-methyltransferase (CCoAOMT) in drought than in control conditions. In contrast, roots have lower transcript abundances for ferulate 5-hydroxylase (F5H) in drought conditions, potentially pointing to different changes in the levels of lignin precursors and in lignin composition between the tissues.

The genes and metabolites linked to these reactions in the model can guide the design of hypothesis tests with targeted expression and metabolite measurements.

References¶

A.P. Arkin, R.W. Cottingham, C.S. Henry, N.L. Harris, R.L. Stevens, S. Maslov, P. Dehal, D. Ware, F. Perez, S. Canon, M.W. Sneddon, M.L. Henderson, W.J. Riehl, D. Murphy-Olson, S.Y. Chan, R.T. Kamimura, S. Kumari, M.M. Drake, T.S. Brettin, E.M. Glass, D. Chivian, D. Gunter, D.J. Weston, B.H. Allen, J. Baumohl, A.A. Best, B. Bowen, S.E. Brenner, C.C. Bun, J.-M. Chandonia, J.-M. Chia, R. Colasanti, N. Conrad, J.J. Davis, B.H. Davison, M. DeJongh, S. Devoid, E. Dietrich, I. Dubchak, J.N. Edirisinghe, G. Fang, J.P. Faria, P.M. Frybarger, W. Gerlach, M. Gerstein, A. Greiner, J. Gurtowski, H.L. Haun, F. He, R. Jain, M.P. Joachimiak, K.P. Keegan, S. Kondo, V. Kumar, M.L. Land, F. Meyer, M. Mills, P.S. Novichkov, T. Oh, G.J. Olsen, R. Olson, B. Parrello, S. Pasternak, E. Pearson, S.S. Poon, G.A. Price, S. Ramakrishnan, P. Ranjan, P.C. Ronald, M.C. Schatz, S.M.D. Seaver, M. Shukla, R.A. Sutormin, M.H. Syed, J. Thomason, N.L. Tintle, D. Wang, F. Xia, H. Yoo, S. Yoo, D. Yu, KBase: The United States Department of Energy Systems Biology Knowledgebase, Nat. Biotechnol. 36 (2018) 566–569. https://doi.org/10.1038/nbt.4163.
N. Varoquaux, B. Cole, C. Gao, G. Pierroz, C.R. Baker, D. Patel, M. Madera, T. Jeffers, J. Hollingsworth, J. Sievert, Y. Yoshinaga, J.A. Owiti, V.R. Singan, S. DeGraaf, L. Xu, M.J. Blow, M.J. Harrison, A. Visel, C. Jansson, K.K. Niyogi, R. Hutmacher, D. Coleman-Derr, R.C. O’Malley, J.W. Taylor, J. Dahlberg, J.P. Vogel, P.G. Lemaux, E. Purdom, Transcriptomic analysis of field-droughted sorghum from seedling to maturity reveals biotic and metabolic responses, Proc. Natl. Acad. Sci. U. S. A. (2019). https://doi.org/10.1073/pnas.1907500116.
D.M. Emms, S. Kelly, OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy, Genome Biol. 16 (2015) 157. https://doi.org/10.1186/s13059-015-0721-2.
S. Andrews et al., FastQC: a quality control tool for high throughput sequence data, 2010 (2017).
A.M. Bolger, M. Lohse, B. Usadel, Trimmomatic: a flexible trimmer for Illumina sequence data, Bioinformatics. 30 (2014) 2114–2120. https://doi.org/10.1093/bioinformatics/btu170.
M. Martin, Cutadapt removes adapter sequences from high-throughput sequencing reads, EMBnet.journal. 17 (2011) 10–12. https://doi.org/10.14806/ej.17.1.200.
C. Trapnell, A. Roberts, L. Goff, G. Pertea, D. Kim, D.R. Kelley, H. Pimentel, S.L. Salzberg, J.L. Rinn, L. Pachter, Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks, Nat. Protoc. 7 (2012) 562–578. https://doi.org/10.1038/nprot.2012.016.
M. Pertea, D. Kim, G.M. Pertea, J.T. Leek, S.L. Salzberg, Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown, Nat. Protoc. 11 (2016) 1650–1667. https://doi.org/10.1038/nprot.2016.095.
D. Kim, B. Langmead, S.L. Salzberg, HISAT: a fast spliced aligner with low memory requirements, Nat. Methods. 12 (2015) 357–360. https://doi.org/10.1038/nmeth.3317.
D. Kim, J.M. Paggi, C. Park, C. Bennett, S.L. Salzberg, Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype, Nat. Biotechnol. 37 (2019) 907–915. https://doi.org/10.1038/s41587-019-0201-4.
M. Pertea, G.M. Pertea, C.M. Antonescu, T.-C. Chang, J.T. Mendell, S.L. Salzberg, StringTie enables improved reconstruction of a transcriptome from RNA-seq reads, Nat. Biotechnol. 33 (2015) 290–295. https://doi.org/10.1038/nbt.3122.
S. Anders, W. Huber, Differential expression analysis for sequence count data, Genome Biology. 11 (2010). https://doi.org/10.1186/gb-2010-11-10-r106.
M. Love, S. Anders, W. Huber, Differential analysis of RNA-Seq data at the gene level using the DESeq2 package, Heidelberg: European Molecular Biology Laboratory (EMBL). (2013). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.477.9514&rep=rep1&type=pdf.
M.I. Love, W. Huber, S. Anders, Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2, Genome Biol. 15 (2014) 550. https://doi.org/10.1186/s13059-014-0550-8.
S.M.D. Seaver, C. Lerma-Ortiz, N. Conrad, A. Mikaili, A. Sreedasyam, A.D. Hanson, C.S. Henry, PlantSEED enables automated annotation and reconstruction of plant primary metabolism with improved compartmentalization and comparative consistency, Plant J. 95 (2018) 1102–1113. https://doi.org/10.1111/tpj.14003.
Z.A. King, A. Dräger, A. Ebrahim, N. Sonnenschein, N.E. Lewis, B.O. Palsson, Escher: A Web Application for Building, Sharing, and Embedding Data-Rich Visualizations of Biological Pathways, PLoS Comput. Biol. 11 (2015) e1004321. https://doi.org/10.1371/journal.pcbi.1004321.

Created Object Name	Type	Description
RTx430_leaves_ww_r1	SingleEndLibrary	Imported Reads
RTx430_leaves_ww_r2	SingleEndLibrary	Imported Reads
RTx430_leaves_ww_r3	SingleEndLibrary	Imported Reads
RTx430_leaves_dr_r1	SingleEndLibrary	Imported Reads
RTx430_leaves_dr_r2	SingleEndLibrary	Imported Reads
RTx430_leaves_dr_r3	SingleEndLibrary	Imported Reads

Created Object Name	Type	Description
RTx430_roots_ww_r1	SingleEndLibrary	Imported Reads
RTx430_roots_ww_r2	SingleEndLibrary	Imported Reads
RTx430_roots_ww_r3	SingleEndLibrary	Imported Reads
RTx430_roots_dr_r1	SingleEndLibrary	Imported Reads
RTx430_roots_dr_r2	SingleEndLibrary	Imported Reads
RTx430_roots_dr_r3	SingleEndLibrary	Imported Reads

Created Object Name	Type	Description
RTx430_leaves_dr_r2_alignment	RNASeqAlignment	Reads 101788/22/1;101788/11/1 aligned to Genome 101788/19/2
RTx430_leaves_ww_r1_alignment	RNASeqAlignment	Reads 101788/22/1;101788/5/1 aligned to Genome 101788/19/2
RTx430_leaves_dr_r1_alignment	RNASeqAlignment	Reads 101788/22/1;101788/10/1 aligned to Genome 101788/19/2
RTx430_roots_dr_r1_alignment	RNASeqAlignment	Reads 101788/22/1;101788/15/1 aligned to Genome 101788/19/2
RTx430_leaves_ww_r3_alignment	RNASeqAlignment	Reads 101788/22/1;101788/8/1 aligned to Genome 101788/19/2
RTx430_leaves_dr_r3_alignment	RNASeqAlignment	Reads 101788/22/1;101788/13/1 aligned to Genome 101788/19/2
RTx430_leaves_ww_r2_alignment	RNASeqAlignment	Reads 101788/22/1;101788/6/1 aligned to Genome 101788/19/2
RTx430_roots_dr_r3_alignment	RNASeqAlignment	Reads 101788/22/1;101788/17/1 aligned to Genome 101788/19/2
RTx430_roots_ww_r2_alignment	RNASeqAlignment	Reads 101788/22/1;101788/9/1 aligned to Genome 101788/19/2
RTx430_roots_ww_r1_alignment	RNASeqAlignment	Reads 101788/22/1;101788/7/1 aligned to Genome 101788/19/2
RTx430_roots_ww_r3_alignment	RNASeqAlignment	Reads 101788/22/1;101788/12/1 aligned to Genome 101788/19/2
RTx430_roots_dr_r2_alignment	RNASeqAlignment	Reads 101788/22/1;101788/16/1 aligned to Genome 101788/19/2
RTx430_sampleset_alignment_set	ReadsAlignmentSet	Set of all new alignments

Created Object Name	Type	Description
RTx430_sampleset_expression_set	ExpressionSet	ExpressionSet generated by StringTie
RTx430_leaves_ww_r1_expression	RNASeqExpression	Expression generated by StringTie
RTx430_leaves_ww_r2_expression	RNASeqExpression	Expression generated by StringTie
RTx430_leaves_ww_r3_expression	RNASeqExpression	Expression generated by StringTie
RTx430_leaves_dr_r1_expression	RNASeqExpression	Expression generated by StringTie
RTx430_leaves_dr_r2_expression	RNASeqExpression	Expression generated by StringTie
RTx430_leaves_dr_r3_expression	RNASeqExpression	Expression generated by StringTie
RTx430_roots_ww_r1_expression	RNASeqExpression	Expression generated by StringTie
RTx430_roots_ww_r2_expression	RNASeqExpression	Expression generated by StringTie
RTx430_roots_ww_r3_expression	RNASeqExpression	Expression generated by StringTie
RTx430_roots_dr_r1_expression	RNASeqExpression	Expression generated by StringTie
RTx430_roots_dr_r2_expression	RNASeqExpression	Expression generated by StringTie
RTx430_roots_dr_r3_expression	RNASeqExpression	Expression generated by StringTie
RTx430_sampleset_FPKM_ExpressionMatrix	ExpressionMatrix	FPKM ExpressionMatrix generated by StringTie
RTx430_sampleset_TPM_ExpressionMatrix	ExpressionMatrix	TPM ExpressionMatrix generated by StringTie

Created Object Name	Type	Description
RTx430_DESeq2_all	DifferentialExpressionMatrixSet	DifferentialExpressionMatrixSet generated by DESeq2
RTx430_DESeq2_all-RTx430_roots_dr-VS-RTx430_leaves_ww	DifferentialExpressionMatrix	DifferentialExpressionMatrix generated by DESeq2
RTx430_DESeq2_all-RTx430_roots_dr-VS-RTx430_leaves_dr	DifferentialExpressionMatrix	DifferentialExpressionMatrix generated by DESeq2
RTx430_DESeq2_all-RTx430_roots_ww-VS-RTx430_leaves_dr	DifferentialExpressionMatrix	DifferentialExpressionMatrix generated by DESeq2
RTx430_DESeq2_all-RTx430_roots_ww-VS-RTx430_leaves_ww	DifferentialExpressionMatrix	DifferentialExpressionMatrix generated by DESeq2
RTx430_DESeq2_all-RTx430_leaves_dr-VS-RTx430_leaves_ww	DifferentialExpressionMatrix	DifferentialExpressionMatrix generated by DESeq2
RTx430_DESeq2_all-RTx430_roots_dr-VS-RTx430_roots_ww	DifferentialExpressionMatrix	DifferentialExpressionMatrix generated by DESeq2

Created Object Name	Type	Description
RTx430_DESeq2_all_RTx430_leaves_dr-RTx430_leaves_ww_up_0.05q_1fc_fs	FeatureSet	Upper FeatureSet Object
RTx430_DESeq2_all_RTx430_roots_dr-RTx430_roots_ww_up_0.05q_1fc_fs	FeatureSet	Upper FeatureSet Object
RTx430_DESeq2_all_RTx430_leaves_dr-RTx430_leaves_ww_down_0.05q_1fc_fs	FeatureSet	Lower FeatureSet Object
RTx430_DESeq2_all_RTx430_roots_dr-RTx430_roots_ww_down_0.05q_1fc_fs	FeatureSet	Lower FeatureSet Object
RTx430_leaves_dr-RTx430_leaves_ww_0.05q_1fc_exp	ExpressionMatrix	Filtered ExpressionMatrix Object
RTx430_roots_dr-RTx430_roots_ww_0.05q_1fc_exp	ExpressionMatrix	Filtered ExpressionMatrix Object
