Intro and Resources¶

Intro¶

This Narrative was created for the Intro to KBase workshop at PAG, to be presented on January 10, 2025. It closely follows the KBase Case Study Narrative used in the Current Plant Biology publication "A KBase case study on genome-wide transcriptomics and plant primary metabolism in response to drought stress in Sorghum."

Case Study Background¶

This case study uses transcriptomic data from a Sorghum bicolor drought experiment. The original study, by Varoquaux et al., subjected the Sorghum RTx430 genotype to 8 weeks of water deprivation to simulate drought conditions.

The analysis uses 12 sets of RNA-seq reads. They cover two treatments, well-watered controls (ww) and drought-stressed plants (dr), in two tissues, leaves and roots, with 3 replicates for each combination (2 × 2 × 3 = 12).

For full details, please see the paper linked above.
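The object names in this Narrative (for example, RTx430_leaves_ww_r1_alignment in the output tables) encode this 2 × 2 × 3 design. As a minimal sketch, assuming each library follows that RTx430_<tissue>_<condition>_r<n> naming pattern, the 12 library names can be enumerated in Python:

```python
from itertools import product

TISSUES = ["leaves", "roots"]
CONDITIONS = ["ww", "dr"]  # well-watered controls, drought-stressed
REPLICATES = [1, 2, 3]

def library_names(prefix="RTx430"):
    """Return one name per reads library: RTx430_<tissue>_<condition>_r<n>."""
    return [
        f"{prefix}_{tissue}_{cond}_r{rep}"
        for tissue, cond, rep in product(TISSUES, CONDITIONS, REPLICATES)
    ]

names = library_names()
print(len(names))  # 12
print(names[0])    # RTx430_leaves_ww_r1
```

The naming pattern is taken from the output objects listed later; the reads objects themselves may be named differently in your own Narrative.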

Quick References¶

Some quick links to references are attached here.

Presentation slides

KBase Case Study Narrative

KBase Case Study Publication

Import Files¶

Upload Files¶

First, upload files to the "staging area," temporary storage where raw files are held until they are processed. Files in the staging area cannot be used directly by KBase apps; they must be imported first. The staging area is regularly purged of files older than 90 days.

Import Files¶

To convert the files into a usable format, they must be imported. You can import either by selecting the appropriate import type from the dropdown or by using the spreadsheet import specification option.

Both options produce the same import cell. In my experience, the spreadsheet (CSV) option is faster once you have more than 15 to 20 files.

Once you have verified the files and set the desired output object names, click "Run" to start the import.

This reads the uploaded files and saves their contents behind the scenes in KBase's internal representation.
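With many files, it can help to generate the spreadsheet rows programmatically rather than typing them. The sketch below is hypothetical: the column names "input_file" and "output_object_name" are placeholders, not KBase's template headers, and the "<stem>_reads" naming rule is an illustrative convention. Copy the real headers from the template KBase provides for your import type.

```python
import csv
import io

# Placeholder column names (hypothetical): replace them with the headers
# from the KBase template for your import type.
COLUMNS = ["input_file", "output_object_name"]

def build_import_rows(file_names):
    """Pair each staged file with an output object name derived from its stem."""
    rows = []
    for name in file_names:
        stem = name.split(".", 1)[0]  # "sample_a.fastq.gz" -> "sample_a"
        rows.append({"input_file": name, "output_object_name": f"{stem}_reads"})
    return rows

def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

staged = ["sample_a.fastq.gz", "sample_b.fastq.gz"]  # hypothetical file names
print(to_csv(build_import_rows(staged)))
```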

Import Genome and Reads¶

Here, we import the Genome from GFF and FASTA files: upload both to the staging area, then import them together.

For the reads, the case study this demo is based on used the app Import SRA File as Reads From Web. If your reads are publicly available, for example in NCBI's Sequence Read Archive, this app combines the upload and import steps "under the hood" in a single app.

In this demo, the reads are copied from the original Narrative.

Annotate Plant Enzymes with OrthoFinder¶

The GFF provided by JGI contains only structural annotation, not functional annotation. In the app below, we use OrthoFinder to add the functional annotation.

Currently, OrthoFinder is the only KBase app for plant annotation, and it requires existing structural annotations. KBase does not currently have an app for structural annotation.

Create RNA-Seq SampleSet¶

The KBase RNA-seq apps start from an RNA-seq SampleSet, an object that links the 12 reads libraries into groups that are processed together. We'll use 4 condition labels (leaves and roots, each well-watered or drought-stressed), each with 3 replicates.

Take care when creating SampleSets. Downstream apps build on the outputs of earlier apps, so an error at this stage may force you to redo the entire chain.
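Because mistakes here propagate, it can be worth checking the grouping offline before building the SampleSet in the app. This is a minimal sketch, not a KBase API: it assumes library names follow the RTx430_<tissue>_<condition>_r<n> pattern, and it derives condition labels such as "leaves_dr", which match the labels that appear later in the DESeq2 output names.

```python
from collections import defaultdict

def group_by_condition(library_names):
    """Group names like RTx430_leaves_dr_r2 under condition labels like 'leaves_dr'."""
    groups = defaultdict(list)
    for name in library_names:
        _prefix, tissue, condition, _rep = name.split("_")
        groups[f"{tissue}_{condition}"].append(name)
    return dict(groups)

def check_design(groups, expected_conditions=4, expected_replicates=3):
    """Raise ValueError if a condition is missing, extra, or has the wrong replicate count."""
    if len(groups) != expected_conditions:
        raise ValueError(f"expected {expected_conditions} conditions, found {sorted(groups)}")
    for label, members in groups.items():
        if len(members) != expected_replicates:
            raise ValueError(f"{label} has {len(members)} replicates: {members}")

names = [f"RTx430_{t}_{c}_r{r}"
         for t in ("leaves", "roots") for c in ("ww", "dr") for r in (1, 2, 3)]
groups = group_by_condition(names)
check_design(groups)   # passes silently for the full 2 x 2 x 3 design
print(sorted(groups))  # ['leaves_dr', 'leaves_ww', 'roots_dr', 'roots_ww']
```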

Assess Read Quality with FastQC¶

We can run FastQC to assess the quality of our reads. Since this is a demo dataset with pre-cleaned reads, we already know the quality is good.

If the reads needed cleaning, we could process them with apps such as Trimmomatic. To find every app that can operate on reads, use the "apps using this type as input" filter.
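FastQC's per-base quality plots summarize Phred quality scores stored in the FASTQ quality lines. As an illustration only (not FastQC's own code), and assuming the Phred+33 encoding used by modern Illumina data, each quality character decodes to a score Q, and Q corresponds to an error probability of 10^(-Q/10):

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality_line]

def mean_quality(quality_line, offset=33):
    """Mean Phred score of one read; 0.0 for an empty quality string."""
    scores = phred_scores(quality_line, offset)
    return sum(scores) / len(scores) if scores else 0.0

def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)

print(phred_scores("II5+"))   # [40, 40, 20, 10]
print(mean_quality("II5+"))   # 27.5
print(error_probability(30))  # roughly 0.001, i.e. 1 error in 1,000 bases
```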

Align Reads with HISAT2¶

The first step of the RNA-seq workflow is aligning the reads to the genome with HISAT2.

HISAT2 is normally the longest-running app in this pipeline.

HISAT2 produces one RNASeqAlignment per reads library (12 in this case) as well as 1 ReadsAlignmentSet. If you want to use these outputs in external apps or custom code, you can download the alignments as BAM or SAM files.

The app also produces a QualiMap report, which can be viewed in a separate tab or window.
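If you download the alignments for custom code, a common first check is the fraction of reads that aligned. A downloaded BAM can be converted to SAM text with `samtools view -h`; the sketch below then works on plain SAM lines using only the standard library. It relies on the SAM FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary); the example records are made up.

```python
def count_mapped(sam_lines):
    """Count primary mapped and unmapped records in SAM text lines.

    Secondary (0x100) and supplementary (0x800) records are skipped so that
    each read (or each mate, for paired-end data) is counted once.
    """
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second column
        if flag & 0x100 or flag & 0x800:
            continue
        if flag & 0x4:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped

# Made-up records with the 11 mandatory SAM columns.
sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "read3\t16\tchr2\t200\t60\t50M\t*\t0\t0\t*\t*",
    "read3\t256\tchr5\t900\t0\t50M\t*\t0\t0\t*\t*",
]
mapped, unmapped = count_mapped(sam)
print(mapped, unmapped, round(mapped / (mapped + unmapped), 3))  # 2 1 0.667
```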

Assemble Transcripts with StringTie¶

The ReadsAlignmentSet links all the RNASeqAlignments together for the next step.

In this step, StringTie assembles transcripts from the aligned reads and estimates their abundance.

StringTie produces an expression object for each alignment, which can be downloaded as a zip file containing several files. These files are described in the documentation for Ballgown, an R package designed to read StringTie output.

The app also produces two ExpressionMatrices: TPM (transcripts per million) and FPKM (fragments per kilobase of transcript per million fragments mapped). Both can be downloaded as Excel or CSV files.

As with alignment, StringTie also produces a set object, an ExpressionSet, which is the input for differential expression with DESeq2.
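The two normalizations differ in the order of operations. FPKM divides each count by transcript length (in kb) and by total mapped fragments (in millions); TPM first divides by length, then rescales so every sample sums to one million. The sketch below shows these textbook definitions from raw counts. StringTie itself estimates abundance from read coverage, so its values will not exactly match this calculation. The counts and lengths are hypothetical, and the sketch assumes every mapped fragment is assigned to one of the listed transcripts.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)  # assumes all mapped fragments are in `counts`
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """Length-normalize first, then scale so the sample sums to one million."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [r / scale * 1e6 for r in rates]

counts = [500, 1000, 1500]    # hypothetical fragment counts for three transcripts
lengths = [1000, 2000, 3000]  # hypothetical transcript lengths in base pairs
print(tpm(counts, lengths))   # all three equal: same fragments per base
print(fpkm(counts, lengths))
```

Because TPM always sums to one million within a sample, it expresses each transcript as a share of the sample's transcripts, which makes it convenient for the averaging step that follows.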

Create Average Expression Matrix¶

To find the average abundances for each gene in each condition, the normalized expression matrix is averaged across the biological replicates for each condition.

This average expression matrix, "RTx430_sampleset_TPM_ExpressionMatrix_average", is used in a later step to assign reaction-level expression scores for studying plant primary metabolism.
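The averaging itself is simple. As a sketch, assuming the matrix is held as a dictionary of gene ID to {sample name: TPM value} and that sample names follow the RTx430_<tissue>_<condition>_r<n> pattern (both assumptions, not the KBase object layout), each gene's replicate values are grouped by condition and averaged:

```python
from collections import defaultdict
from statistics import mean

def average_by_condition(matrix):
    """Average each gene's values across replicates of the same condition.

    `matrix` maps gene ID -> {sample name -> value}.
    """
    averaged = {}
    for gene, samples in matrix.items():
        by_condition = defaultdict(list)
        for sample, value in samples.items():
            _prefix, tissue, condition, _rep = sample.split("_")
            by_condition[f"{tissue}_{condition}"].append(value)
        averaged[gene] = {cond: mean(vals) for cond, vals in by_condition.items()}
    return averaged

# Hypothetical TPM values for one gene in the leaf samples.
tpm_matrix = {
    "gene_A": {
        "RTx430_leaves_ww_r1": 10.0, "RTx430_leaves_ww_r2": 12.0, "RTx430_leaves_ww_r3": 14.0,
        "RTx430_leaves_dr_r1": 30.0, "RTx430_leaves_dr_r2": 33.0, "RTx430_leaves_dr_r3": 36.0,
    },
}
print(average_by_condition(tpm_matrix))
# {'gene_A': {'leaves_ww': 12.0, 'leaves_dr': 33.0}}
```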

Differential Expression with DESeq2¶

The output ExpressionSet from StringTie feeds directly into DESeq2 for differential expression.

This app produces a DifferentialExpressionMatrix for each pairwise comparison of conditions, grouped in a DifferentialExpressionMatrixSet. Each matrix can be downloaded for further analysis.
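With 4 conditions there are C(4, 2) = 6 unordered pairs, which matches the six DifferentialExpressionMatrix objects listed in the output tables. The short check below enumerates the pairs and compares them with the comparison names reported by DESeq2 in this Narrative. Which condition appears first ("X-VS-Y") is decided by the app, so the comparison ignores orientation.

```python
from itertools import combinations
from math import comb

conditions = ["leaves_ww", "leaves_dr", "roots_ww", "roots_dr"]
pairs = list(combinations(conditions, 2))
print(len(pairs), comb(len(conditions), 2))  # 6 6

# Comparison suffixes copied from the DESeq2 output object names.
reported = [
    "roots_ww-VS-leaves_dr", "roots_dr-VS-roots_ww", "roots_dr-VS-leaves_dr",
    "roots_ww-VS-leaves_ww", "leaves_dr-VS-leaves_ww", "roots_dr-VS-leaves_ww",
]
reported_pairs = {frozenset(name.split("-VS-")) for name in reported}
print(reported_pairs == {frozenset(p) for p in pairs})  # True
```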

Create Up/Down Regulated FeatureSet and ExpressionMatrix¶

This app subsets the expression matrix to the features that are up- or down-regulated beyond a chosen threshold.

It performs no new analysis; it only filters the existing matrix down to a smaller set. It also creates FeatureSets, which are groups of genes or other features from the Genome object. Some apps, such as BLAST, accept a FeatureSet so that you can examine a smaller set of genes more closely.
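The filtering logic can be sketched as follows. This is an illustration, not the app's implementation: each row is assumed to be (gene ID, log2 fold change, adjusted p-value) as in a DESeq2 results table, and the cutoffs of |log2FC| ≥ 1 and adjusted p < 0.05 are illustrative defaults rather than the values the KBase app uses. The gene IDs and values are hypothetical.

```python
def split_regulated(rows, log2fc_cutoff=1.0, q_cutoff=0.05):
    """Split (gene_id, log2_fold_change, adjusted_p) rows into up- and down-regulated IDs."""
    up, down = [], []
    for gene, lfc, q in rows:
        if q is None or q >= q_cutoff:
            continue  # not significant, or no adjusted p-value reported
        if lfc >= log2fc_cutoff:
            up.append(gene)
        elif lfc <= -log2fc_cutoff:
            down.append(gene)
    return up, down

def subset_matrix(matrix, gene_ids):
    """Keep only the listed genes, mirroring a filtered ExpressionMatrix."""
    return {g: matrix[g] for g in gene_ids if g in matrix}

rows = [
    ("gene_A", 2.3, 0.001),   # strongly up, significant
    ("gene_B", -1.8, 0.01),   # strongly down, significant
    ("gene_C", 0.4, 0.0001),  # significant but small change
    ("gene_D", 3.1, 0.2),     # large change, not significant
    ("gene_E", -2.0, None),   # no adjusted p-value
]
up, down = split_regulated(rows)
print(up, down)  # ['gene_A'] ['gene_B']
```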

Metabolic Reconstruction¶

The Reconstruct Plant Metabolism app creates a plant metabolic model based on the genome annotations performed earlier.

This type of model behaves similarly to bacterial/fungal metabolic models, for which we have detailed tutorials and documentation on our YouTube channel and docs.kbase.us.

Integration of Abundance with Metabolism¶

This app maps the expression abundances calculated earlier in the expression matrix onto the model constructed above.

Below, I ran the app with all 3 drought-stressed leaf expression matrices.
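Gene-level expression is typically turned into a reaction-level score through each reaction's gene-protein-reaction (GPR) rule. A widely used convention, shown here as an assumption rather than the KBase app's documented method, takes the minimum over genes joined by AND (every subunit of a complex is needed) and the maximum over genes joined by OR (any isozyme suffices). The gene IDs and values below are hypothetical.

```python
import re

def tokenize(rule):
    """Split a GPR string into parentheses and gene/operator tokens."""
    return re.findall(r"\(|\)|[^\s()]+", rule)

def score_gpr(rule, expression):
    """Score a GPR rule: AND -> min, OR -> max; genes missing from `expression` score 0."""
    tokens = tokenize(rule)
    pos = 0

    def parse_or():
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos].lower() == "or":
            pos += 1
            value = max(value, parse_and())
        return value

    def parse_and():
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos].lower() == "and":
            pos += 1
            value = min(value, parse_atom())
        return value

    def parse_atom():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            pos += 1  # skip the closing ")"
            return value
        return expression.get(token, 0.0)

    return parse_or()

expr = {"g1": 5.0, "g2": 2.0, "g3": 4.0}
print(score_gpr("(g1 and g2) or g3", expr))  # 4.0
print(score_gpr("g1 and g2", expr))          # 2.0
print(score_gpr("g1 or g2", expr))           # 5.0
```

Some methods sum isozyme values for OR instead of taking the maximum; the choice changes scores for reactions with many isozymes.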

Visualization of Metabolic Pathway¶

The Escher Pathway Viewer lets us visualize the metabolic model as a pathway map.

We can combine the model that we produced above with the expression data to show the difference in expression on the map.
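To color a map by the drought effect, a common choice is the per-reaction log2 ratio of drought to control scores. The sketch below assumes reaction-level scores are available as dictionaries keyed by reaction ID (the IDs and values are illustrative) and writes the result as JSON, the kind of reaction-ID-to-value mapping that standalone Escher accepts as reaction data. The KBase viewer app takes the model and expression objects directly, so this is only relevant if you export scores for use outside KBase.

```python
import json
import math

def log2_ratio(drought, control, pseudocount=1.0):
    """Per-reaction log2((drought + p) / (control + p)); the pseudocount avoids division by zero."""
    return {
        rxn: math.log2((drought.get(rxn, 0.0) + pseudocount)
                       / (control.get(rxn, 0.0) + pseudocount))
        for rxn in set(drought) | set(control)
    }

drought_scores = {"rxn00001": 15.0, "rxn00002": 3.0}  # illustrative values
control_scores = {"rxn00001": 7.0, "rxn00002": 3.0}
ratios = log2_ratio(drought_scores, control_scores)
print(json.dumps(ratios, sort_keys=True))  # {"rxn00001": 1.0, "rxn00002": 0.0}
```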

References¶

Kumari S, Kumar V, Beilsmith K, Seaver SMD, Canon S, Dehal P, et al. A KBase case study on genome-wide transcriptomics and plant primary metabolism in response to drought stress in Sorghum. Current Plant Biology. 2021;28:100229. doi:10.1016/j.cpb.2021.100229
Varoquaux N, Cole B, Gao C, Pierroz G, Baker CR, Patel D, et al. Transcriptomic analysis of field-droughted sorghum from seedling to maturity reveals biotic and metabolic responses. Proceedings of the National Academy of Sciences. 2019;116:27124–27132. doi:10.1073/pnas.1907500116
Ballgown documentation. https://github.com/alyssafrazee/ballgown

Created Object Name	Type	Description
RTx430_leaves_ww_r2_alignment	RNASeqAlignment	Reads 203090/18/1;203090/13/1 aligned to Genome 203090/25/1
RTx430_roots_ww_r2_alignment	RNASeqAlignment	Reads 203090/18/1;203090/7/1 aligned to Genome 203090/25/1
RTx430_roots_dr_r2_alignment	RNASeqAlignment	Reads 203090/18/1;203090/10/1 aligned to Genome 203090/25/1
RTx430_roots_dr_r1_alignment	RNASeqAlignment	Reads 203090/18/1;203090/9/1 aligned to Genome 203090/25/1
RTx430_leaves_ww_r3_alignment	RNASeqAlignment	Reads 203090/18/1;203090/14/1 aligned to Genome 203090/25/1
RTx430_leaves_dr_r2_alignment	RNASeqAlignment	Reads 203090/18/1;203090/16/1 aligned to Genome 203090/25/1
RTx430_roots_dr_r3_alignment	RNASeqAlignment	Reads 203090/18/1;203090/11/1 aligned to Genome 203090/25/1
RTx430_roots_ww_r3_alignment	RNASeqAlignment	Reads 203090/18/1;203090/8/1 aligned to Genome 203090/25/1
RTx430_roots_ww_r1_alignment	RNASeqAlignment	Reads 203090/18/1;203090/6/1 aligned to Genome 203090/25/1
RTx430_leaves_dr_r1_alignment	RNASeqAlignment	Reads 203090/18/1;203090/15/1 aligned to Genome 203090/25/1
RTx430_leaves_dr_r3_alignment	RNASeqAlignment	Reads 203090/18/1;203090/17/1 aligned to Genome 203090/25/1
RTx430_leaves_ww_r1_alignment	RNASeqAlignment	Reads 203090/18/1;203090/12/1 aligned to Genome 203090/25/1
RTx430_sampleset_alignment_set	ReadsAlignmentSet	Set of all new alignments

Created Object Name	Type	Description
RTx430_sampleset_expression_set	ExpressionSet	ExpressionSet generated by StringTie
RTx430_leaves_ww_r1_expression	RNASeqExpression	Expression generated by StringTie
RTx430_leaves_ww_r2_expression	RNASeqExpression	Expression generated by StringTie
RTx430_leaves_ww_r3_expression	RNASeqExpression	Expression generated by StringTie
RTx430_leaves_dr_r1_expression	RNASeqExpression	Expression generated by StringTie
RTx430_leaves_dr_r2_expression	RNASeqExpression	Expression generated by StringTie
RTx430_leaves_dr_r3_expression	RNASeqExpression	Expression generated by StringTie
RTx430_roots_ww_r1_expression	RNASeqExpression	Expression generated by StringTie
RTx430_roots_ww_r2_expression	RNASeqExpression	Expression generated by StringTie
RTx430_roots_ww_r3_expression	RNASeqExpression	Expression generated by StringTie
RTx430_roots_dr_r1_expression	RNASeqExpression	Expression generated by StringTie
RTx430_roots_dr_r2_expression	RNASeqExpression	Expression generated by StringTie
RTx430_roots_dr_r3_expression	RNASeqExpression	Expression generated by StringTie
RTx430_sampleset_FPKM_ExpressionMatrix	ExpressionMatrix	FPKM ExpressionMatrix generated by StringTie
RTx430_sampleset_TPM_ExpressionMatrix	ExpressionMatrix	TPM ExpressionMatrix generated by StringTie

Created Object Name	Type	Description
RTx430_sampleset_expression_set_deseq	DifferentialExpressionMatrixSet	DifferentialExpressionMatrixSet generated by DESeq2
RTx430_sampleset_expression_set_deseq-roots_ww-VS-leaves_dr	DifferentialExpressionMatrix	DifferentialExpressionMatrix generated by DESeq2
RTx430_sampleset_expression_set_deseq-roots_dr-VS-roots_ww	DifferentialExpressionMatrix	DifferentialExpressionMatrix generated by DESeq2
RTx430_sampleset_expression_set_deseq-roots_dr-VS-leaves_dr	DifferentialExpressionMatrix	DifferentialExpressionMatrix generated by DESeq2
RTx430_sampleset_expression_set_deseq-roots_ww-VS-leaves_ww	DifferentialExpressionMatrix	DifferentialExpressionMatrix generated by DESeq2
RTx430_sampleset_expression_set_deseq-leaves_dr-VS-leaves_ww	DifferentialExpressionMatrix	DifferentialExpressionMatrix generated by DESeq2
RTx430_sampleset_expression_set_deseq-roots_dr-VS-leaves_ww	DifferentialExpressionMatrix	DifferentialExpressionMatrix generated by DESeq2

Created Object Name	Type	Description
RTx430_sampleset_expression_set_deseq_roots_ww-leaves_dr_up_feature_set	FeatureSet	Upper FeatureSet Object
RTx430_sampleset_expression_set_deseq_roots_dr-roots_ww_up_feature_set	FeatureSet	Upper FeatureSet Object
RTx430_sampleset_expression_set_deseq_roots_dr-leaves_dr_up_feature_set	FeatureSet	Upper FeatureSet Object
RTx430_sampleset_expression_set_deseq_roots_ww-leaves_ww_up_feature_set	FeatureSet	Upper FeatureSet Object
RTx430_sampleset_expression_set_deseq_leaves_dr-leaves_ww_up_feature_set	FeatureSet	Upper FeatureSet Object
RTx430_sampleset_expression_set_deseq_roots_dr-leaves_ww_up_feature_set	FeatureSet	Upper FeatureSet Object
RTx430_sampleset_expression_set_deseq_roots_ww-leaves_dr_down_feature_set	FeatureSet	Lower FeatureSet Object
RTx430_sampleset_expression_set_deseq_roots_dr-roots_ww_down_feature_set	FeatureSet	Lower FeatureSet Object
RTx430_sampleset_expression_set_deseq_roots_dr-leaves_dr_down_feature_set	FeatureSet	Lower FeatureSet Object
RTx430_sampleset_expression_set_deseq_roots_ww-leaves_ww_down_feature_set	FeatureSet	Lower FeatureSet Object
RTx430_sampleset_expression_set_deseq_leaves_dr-leaves_ww_down_feature_set	FeatureSet	Lower FeatureSet Object
RTx430_sampleset_expression_set_deseq_roots_dr-leaves_ww_down_feature_set	FeatureSet	Lower FeatureSet Object
roots_ww-leaves_dr_filtered_expression_matrix	ExpressionMatrix	Filtered ExpressionMatrix Object
roots_dr-roots_ww_filtered_expression_matrix	ExpressionMatrix	Filtered ExpressionMatrix Object
roots_dr-leaves_dr_filtered_expression_matrix	ExpressionMatrix	Filtered ExpressionMatrix Object
roots_ww-leaves_ww_filtered_expression_matrix	ExpressionMatrix	Filtered ExpressionMatrix Object
leaves_dr-leaves_ww_filtered_expression_matrix	ExpressionMatrix	Filtered ExpressionMatrix Object
roots_dr-leaves_ww_filtered_expression_matrix	ExpressionMatrix	Filtered ExpressionMatrix Object
