Generated July 16, 2024

Narrative for RWRtoolkit: multi-omic network analysis using random walks on multiplex networks in any species

Contributors: David Kainer [1,2,✝], Matthew Lane [3,✝], Kyle A. Sullivan [1], J. Izaak Miller [1], Mikaela Cashman [1,4], Mallory Morgan [1], Ashley Cliff [3], Jonathon Romero [3], Angelica Walker [3], D. Dakota Blair [5], Hari Chhetri [1], Yongqin Wang [5], Mirko Pavicic [1], Anna Furches [3], Jaclyn Noshay [1], Meghan Drake [1], Natalie Landry [1], AJ Ireland [4], Ali Missaoui [5], Yun Kang [7], John Sedbrook [8], Paramvir Dehal [4], Shane Canon [4], Daniel Jacobson [1,*]

  • [1] Computational and Predictive Biology Group, Oak Ridge National Laboratory, Oak Ridge, TN, USA
  • [2] Centre of Excellence for Plant Success in Nature and Agriculture, University of Queensland, QLD, Australia
  • [3] The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee Knoxville, Knoxville, TN, USA
  • [4] Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  • [5] Computational Science Initiative, Brookhaven National Laboratory, Upton, NY, USA
  • [6] Department of Crop and Soil Sciences, University of Georgia, Athens, GA, USA
  • [7] Noble Research Institute, Ardmore, OK, USA
  • [8] School of Biological Sciences, Illinois State University, Normal, IL, United States
  • [✝] These authors contributed equally
  • [*] Corresponding Author

Software DOI: https://doi.org/10.11578/dc.20220607.1

RWRtoolkit Apps Overview

An integral part of systems biology is the study of interactions within and across omics layers, from the genome to the phenome, in order to understand how interactions between cells and tissues give rise to phenotypes of interest. A Genome Wide Association Study (GWAS) identifies variation in genotypes across a population and associates it with variation in a phenotype in the same population. GWAS alone is limited to the genetic variation present within the population, so it gives a very specific but not comprehensive view. In addition, the statistical thresholds used in GWAS tend to be very stringent in order to screen out false positives, so interesting biology may be missed. We therefore present network biology tools that address these limitations by better filtering GWAS information and by filling in gaps from other omics layers.

Random Walk with Restart (RWR)

Random walk with restart (RWR) is a type of graph traversal algorithm. In this context, the graphs being traversed are biological networks where the nodes are genes and edges between genes represent a known biological relationship between those two genes. For example, one network may be constructed from protein-protein interaction data, another from metabolic pathway data, and another from co-expression data. A multiplex network may contain one or more biological networks where the topology of each individual layer is maintained separately from other network layers, but connections are added between shared genes among each network layer. RWR explores these networks by "walking" from gene to gene along edges. In a multiplex network, the walker will utilize all network layers by transporting between the layers based on common nodes (common genes). For more information on multiplex networks or graph traversal methods please see referenced work.
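The multiplex structure described above can be sketched in a few lines of Python. This is a minimal illustration and not the RWRtoolkit implementation: the function name `build_multiplex`, the layer names, and the gene names are all hypothetical, and the sketch assumes unweighted, undirected edges with one inter-layer link per gene shared between two layers.

```python
from itertools import combinations


def build_multiplex(layers):
    """Return an adjacency dict over (gene, layer) nodes.

    layers: dict mapping a layer name to an iterable of (gene_a, gene_b) edges.
    Hypothetical helper for illustration only.
    """
    adjacency = {}
    # Intra-layer edges: each layer keeps its own topology.
    for layer, edges in layers.items():
        for a, b in edges:
            adjacency.setdefault((a, layer), set()).add((b, layer))
            adjacency.setdefault((b, layer), set()).add((a, layer))
    # Record which layers each gene appears in.
    layers_by_gene = {}
    for gene, layer in adjacency:
        layers_by_gene.setdefault(gene, set()).add(layer)
    # Inter-layer edges: connect copies of the same gene across layers.
    for gene, present in layers_by_gene.items():
        for first, second in combinations(sorted(present), 2):
            adjacency[(gene, first)].add((gene, second))
            adjacency[(gene, second)].add((gene, first))
    return adjacency


# Toy layers with made-up gene names.
layers = {
    "ppi": [("geneA", "geneB"), ("geneB", "geneC")],
    "coexpression": [("geneB", "geneD")],
}
multiplex = build_multiplex(layers)
```

In this toy example geneB appears in both layers, so a walker standing on geneB in the protein-protein interaction layer can move to geneB in the co-expression layer and continue to geneD from there.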

RWRtools provides a set of useful methods based on the graph traversal algorithm "random walk with restart". The "restart" refers to the walker occasionally returning to a starting gene (a seed gene), which balances local and global exploration of the network.
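As a rough illustration of the restart idea, the following Python sketch iterates the standard RWR update, in which a fraction `restart` of the probability returns to the seeds at every step and the remainder spreads evenly over each node's neighbors. It is a sketch under stated assumptions rather than the RWRtoolkit code: the function name, the restart value of 0.7, and the toy graph are chosen for illustration, and every node is assumed to have at least one neighbor.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return a dict of steady-state visit probabilities for each node.

    adjacency: dict mapping node -> set of neighbor nodes (undirected, unweighted).
    seeds: set of nodes the walker restarts from (all must be keys of adjacency).
    """
    nodes = list(adjacency)
    # The restart distribution puts equal probability on every seed.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: a fixed share of probability jumps back to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        # Walk term: the rest moves one step along a uniformly chosen edge.
        for node in nodes:
            neighbors = adjacency[node]
            share = (1.0 - restart) * p[node] / len(neighbors)
            for neighbor in neighbors:
                nxt[neighbor] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy path graph A - B - C - D with a single seed gene A.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(graph, seeds={"A"})
# Rank the non-seed genes from most to least visited.
ranking = sorted((n for n in graph if n != "A"), key=lambda n: -scores[n])
```

Genes closer to the seed receive higher scores, and a larger `restart` value keeps the walker nearer to the seeds, which is the local-versus-global trade-off mentioned above.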

Gene sets of Interest

The RWRtools pipeline requires a gene set of interest, which designates the set of seed genes from which the walker will start. This set is created using the Build FeatureSet from Genome application in KBase. Some RWRtools methods can also take an additional gene set, called a query gene set, that filters the results of the walker. The gene set is chosen based on the context of the user's biological query. In this demo we present an example with shoot biomass and height genes in Arabidopsis thaliana.

Multiplex Networks (context data)

Another parameter of the RWRtools applications is the multiplex network, which sets the context for the random walker. RWRtools contains a series of pre-computed Arabidopsis thaliana multiplex networks; details of each multiplex can be found here. If you are interested in using RWRtools for your own organism or networks, please file a feature request ticket on the KBase help board to expand RWRtools.

RWRtools Methods

RWRtools currently provides two main methods:

  • RWRtools CV (Cross Validation) identifies interconnectivity of a single gene set by performing cross validation on a given gene set, finding the Random Walk with Restart (RWR) rank of the left-out genes. Cross validation methods include k-fold (default method, k = 5), leave-one-out (loo) (leave only one gene from the gene set out and use other genes to find its rank), or singletons (one gene is used to find the ranks of all other genes in the gene set). Descriptions of provided multiplexes can be found here.
  • RWRtools LOE (Lines of Evidence) finds functional context of a gene set by performing RWR starting from one gene set to rank other genes in the network using multiple lines of biological evidence. This app has two possible functions. Given one gene set of seeds, rankings for all other genes in the network will be returned. Given a second gene set of genes to be queried, rankings for just the genes in that gene set will be returned. This can be used to build multiple lines of evidence from the various input networks to relate the two gene sets. Descriptions of provided multiplexes can be found here.
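The three cross-validation schemes listed for RWRtools CV differ only in how the gene set is split into seeds and left-out genes. The Python sketch below shows one way such splits could be generated; the function name `cv_splits`, the shuffling, and the fold assignment are assumptions for illustration and may differ from what the app does internally. Gene names are assumed to be unique.

```python
import random


def cv_splits(genes, method="kfold", k=5, seed=0):
    """Yield (seed_genes, left_out_genes) pairs for one cross-validation scheme."""
    genes = list(genes)
    if method == "kfold":
        shuffled = genes[:]
        random.Random(seed).shuffle(shuffled)
        # Deal the shuffled genes into k folds; each fold is left out once.
        folds = [shuffled[i::k] for i in range(k)]
        for fold in folds:
            if not fold:
                continue
            held_out = set(fold)
            yield [g for g in shuffled if g not in held_out], fold
    elif method == "loo":
        # Leave one gene out and use all other genes as seeds.
        for gene in genes:
            yield [g for g in genes if g != gene], [gene]
    elif method == "singletons":
        # Use one gene as the only seed and rank all other genes.
        for gene in genes:
            yield [gene], [g for g in genes if g != gene]
    else:
        raise ValueError(f"unknown method: {method}")


# Placeholder identifiers standing in for a 32-gene seed set.
gene_set = [f"gene{i}" for i in range(32)]
kfold = list(cv_splits(gene_set, "kfold"))
loo = list(cv_splits(gene_set, "loo"))
singletons = list(cv_splits(gene_set, "singletons"))
```

For each split, RWR would be run from the seed genes and the ranks of the left-out genes recorded; a gene set whose left-out members consistently rank highly is well interconnected.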

References:

David Kainer, Matthew Lane, Kyle A. Sullivan, J. Izaak Miller, Mikaela Cashman, Mallory Morgan, Ashley Cliff, Jonathon Romero, Angelica Walker, D. Dakota Blair, Hari Chhetri, Yongqin Wang, Mirko Pavicic, Anna Furches, Jaclyn Noshay, Meghan Drake, Natalie Landry, AJ Ireland, Ali Missaoui, Yun Kang, John Sedbrook, Paramvir Dehal, Shane Canon, Daniel Jacobson. RWRtoolkit: multi-omic network analysis using random walks on multiplex networks in any species.

Goodstein, D. M., Shu, S., Howson, R., Neupane, R., Hayes, R. D., Fazo, J., ... & Rokhsar, D. S. (2012). Phytozome: a comparative platform for green plant genomics. Nucleic acids research, 40(D1), D1178-D1186.

Valdeolivas, A., Tichit, L., Navarro, C., Perrin, S., Odelin, G., Levy, N., ... & Baudot, A. (2019). Random walk with restart on multiplex and heterogeneous biological networks. Bioinformatics, 35(3), 497-505. doi:10.1093/bioinformatics/bty637

Demo Narrative Overview

This narrative demonstrates the application of RWRtools methods to the important bioenergy feedstock Panicum virgatum (Switchgrass) by leveraging networks from the model plant Arabidopsis thaliana.

We used a genome-wide association study (GWAS) to identify single nucleotide polymorphisms (SNPs) associated with Switchgrass shoot biomass. We then assigned these SNPs to Switchgrass genes, and using Phytozome (Goodstein et al., 2012), assigned these 38 Switchgrass GWAS genes to 32 Arabidopsis orthologs. These 32 genes are the starting set of genes (i.e. the seed gene set) for this demo narrative.
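The drop from 38 Switchgrass genes to 32 Arabidopsis orthologs is what one would expect if some Switchgrass genes share an ortholog or have none; the narrative does not state which applies here. A minimal Python sketch of such a mapping step is shown below. The function name and all identifiers are hypothetical placeholders, not real Phytozome records.

```python
def collapse_to_orthologs(gwas_genes, ortholog_table):
    """Map GWAS genes to orthologs, keeping each ortholog once.

    ortholog_table: dict mapping a source gene to its ortholog (hypothetical data).
    Returns (unique orthologs in first-seen order, genes with no ortholog).
    """
    seeds = []
    unmapped = []
    for gene in gwas_genes:
        ortholog = ortholog_table.get(gene)
        if ortholog is None:
            unmapped.append(gene)
        elif ortholog not in seeds:
            seeds.append(ortholog)
    return seeds, unmapped


# Placeholder identifiers for illustration only.
gwas_genes = ["sw1", "sw2", "sw3", "sw4"]
ortholog_table = {"sw1": "atA", "sw2": "atA", "sw3": "atB"}
seed_set, unmapped = collapse_to_orthologs(gwas_genes, ortholog_table)
```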

The rest of the narrative is as follows:

Data Selection

The RWRtoolkit pipeline begins by importing or adding the genome of interest. For this demo we add an Arabidopsis genome that is already available within KBase. We then create a feature set, which is a list of genes of interest that serves as our seed gene set. Our feature set contains the 32 genes of interest associated with shoot biomass according to a reduced threshold on a GWAS. The resulting FeatureSet is the data object Switchgrass_ShootBiomass_AraOrthologs.

Create a new FeatureSet by selecting features from a Genome.
This app completed without errors in 55s.
Objects
Created Object Name | Type | Description
Switchgrass_ShootBiomass_AraOrthologs | FeatureSet | Feature Set
Summary
A new feature set containing 32 features was created.

RWRtools CV - Find Interconnectivity of Shoot Biomass FeatureSet

Now we are prepared to run RWRtools CV using the FeatureSet of Shoot Biomass created above as our Seed Gene Keys and we select the Comprehensive Network as our multiplex. Details on all available multiplexes can be found here.

Results

  • Results can be seen in the form of a Report object that can either be viewed within the narrative, or in a separate window for more space.
  • The purple nodes indicate seed genes. The teal nodes indicate highly ranked genes (within the maximum rank specified) where the darker shade indicates a higher rank. Genes can be clicked on to view their GO Terms and MAPMAN annotations.
  • By default the edges displayed are only from the networks in the multiplex specified. These can be toggled on or off on the right-hand side. Networks not used in the multiplex are indicated with an asterisk. Those can also be toggled to see what types of relationships might be useful to look at further.
  • The table at the bottom shows all displayed genes (seeds and ranked genes) with additional information. Columns in the table can be sorted.
  • There are also additional files available for download containing all rank information.

Analysis

  • Using this network output, we can interactively explore the relationships between genes. The teal genes are not in our seed set, meaning they were not identified by the GWAS. Instead, the RWR results highlight them as related to the seed genes in the context of the networks, where each edge represents a functional association between two genes.
  • We also see some seed genes have low connectivity between one another (see the isolated purple nodes). This may indicate that these genes might be noise and not actually related to our trait of interest. In our next section we explore what occurs if we remove such genes.

RWRtools CV (Cross Validation) performs cross validation on a single gene set, finding the RWR rank of the left-out genes.
This app completed without errors in 2m 4s.
Summary
Report for RWR_CV with rank <= 200
Links
Output from Find Gene Set Interconnectivity using Cross Validation with RWRtools CV
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/165213

RWRtools CV - Find Interconnectivity of Filtered Shoot Biomass FeatureSet

Next we build a second gene set, starting from our set of 32 shoot biomass genes and removing the isolated genes we suspect of being noise. This results in a set of 29 shoot biomass genes that we suspect have a stronger association with the trait of interest. We then re-apply RWRtools CV to assess the interconnectivity of the filtered gene set.

In the resulting network, we now see a high degree of connectivity within this updated list of genes of interest. We can further explore these genes and their surrounding connections.
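The narrative does not spell out the exact rule used to drop the suspected noise genes. One plausible rule, suggested by the report cutoff of rank <= 200, is to keep only the seed genes whose left-out rank in cross validation falls within that cutoff. The Python sketch below illustrates that assumption; the function name, gene names, and rank values are hypothetical.

```python
def filter_seeds_by_rank(cv_ranks, max_rank=200):
    """Split seed genes into kept and dropped lists by cross-validation rank.

    cv_ranks: dict mapping gene -> rank when left out (None if never ranked).
    The threshold rule is an assumption for illustration.
    """
    kept = [g for g, r in cv_ranks.items() if r is not None and r <= max_rank]
    kept_set = set(kept)
    dropped = [g for g in cv_ranks if g not in kept_set]
    return kept, dropped


# Made-up ranks for four placeholder seed genes.
cv_ranks = {"geneA": 12, "geneB": 150, "geneC": 873, "geneD": None}
kept, dropped = filter_seeds_by_rank(cv_ranks)
```

In practice the choice of which genes to remove also benefits from inspecting the network view, since a gene can rank poorly for reasons other than being noise.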

Create a new FeatureSet by selecting features from a Genome.
This app completed without errors in 41s.
Objects
Created Object Name | Type | Description
Switchgrass_ShootBiomass_AraOrthologs_FilteredTop200 | FeatureSet | Feature Set
Summary
A new feature set containing 29 features was created.

RWRtools CV (Cross Validation) performs cross validation on a single gene set, finding the RWR rank of the left-out genes.
This app completed without errors in 4m 42s.
Summary
Report for RWR_CV with rank <= 200
Links
Output from Find Gene Set Interconnectivity using Cross Validation with RWRtools CV
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/165213

RWRtools LOE - Find Functional Context of a Single Gene Set

Now that we have demonstrated with RWRtools CV that there is a high degree of connectivity among our genes, we next want to see what relationships exist between the 29 filtered shoot biomass genes and other genes in the genome. To answer this we run RWRtools LOE and look at the top 200 ranked genes.
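The ranking step behind an LOE-style result can be sketched as follows: non-seed genes are ordered by their RWR score, and the list is either cut at a maximum rank or restricted to an optional query gene set. This is a hedged Python illustration; the function name `rank_genes`, the tie-breaking by gene name, and the toy scores are assumptions and not the app's actual output format.

```python
def rank_genes(scores, seeds, query=None, max_rank=200):
    """Return (rank, gene, score) rows for non-seed genes, best first.

    scores: dict mapping gene -> RWR score (hypothetical values here).
    query: optional gene set; if given, only those genes are returned,
           each keeping its rank among all non-seed genes.
    """
    seeds = set(seeds)
    # Sort by descending score, breaking ties alphabetically for stability.
    ordered = sorted((g for g in scores if g not in seeds),
                     key=lambda g: (-scores[g], g))
    ranked = [(rank, gene, scores[gene])
              for rank, gene in enumerate(ordered, start=1)]
    if query is not None:
        wanted = set(query)
        return [row for row in ranked if row[1] in wanted]
    return ranked[:max_rank]


# Toy scores for one seed and four other placeholder genes.
scores = {"seed1": 0.5, "g1": 0.2, "g2": 0.1, "g3": 0.15, "g4": 0.05}
top_two = rank_genes(scores, seeds=["seed1"], max_rank=2)
queried = rank_genes(scores, seeds=["seed1"], query=["g2", "g4"])
```

Keeping the global rank for queried genes makes results comparable across runs with different query sets, since a rank of 3 always means third among all non-seed genes.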

Results

Sphingolipid/ceramide synthesis

  • The original GWAS gene set included SPHINGOID LCB DESATURASE 2 (SLD2, AT2G46210), which is related to our biomass data set. We can see in the results that, in the protein-protein interaction network, this sphingolipid desaturase has direct connections to SPHINGOID LCB DESATURASE 1 (SLD1, AT3G61580) and DES-1-LIKE (AT4G04930).
  • In addition to these delta-8 and delta-4 desaturases, we can also see a connection between SLD2 and AGL18 (AT3G57390), a MADS-domain transcription factor, through the coexpression layer.
  • Sphingolipid and ceramide biosynthesis are important to developmental and stress responses.

Shoot apical meristem development and homeodomain transcription factor connections

  • Another gene identified by shoot biomass GWAS was KANADI1 (KAN1, AT5G16560). This gene has been previously implicated in development of the shoot apical meristem through an auxin signaling pathway (Ram et al., 2020).
  • KAN1 has also been implicated in jasmonic acid signaling in mediating growth phenotypes (Zhang et al., 2020).
  • KAN1 is also connected to multiple transcription factors and developmental regulators: ASYMMETRIC LEAVES 2 (AS2, AT1G65620), PHAVOLUTA (PHV, AT1G30490), PHABULOSA (PHB, AT2G34710), PIN-FORMED 1 (PIN1, AT1G73590), and WUSCHEL-RELATED HOMEOBOX 9 (WOX9, AT2G33880).
  • KAN1 was connected to PHV and PHB by protein-protein interactions and transcriptional regulation (Regulation-ATRM) networks, and was connected to AS2 and PIN1 by the Regulation-ATRM network.
  • WOX9 was connected to KAN1, PHV, and PHB by protein-protein interactions, and WOX9 was also connected to KAN1 by the exascale/petascale Predictive CG Methylation network.
  • This demonstrates how we are leveraging the networks to identify mechanistic connections between GWAS genes related to shoot biomass.

These results highlight several examples of how these networks can be used to explore relationships between genes and discover interesting and meaningful biology. For the use case presented here, the insights derived from Arabidopsis thaliana networks led to the development of a conceptual model of Switchgrass genes contributing to well-watered shoot biomass, a trait that is important for enhanced bioenergy feedstocks.

References:

Ram H, Sahadevan S, Gale N, Caggiano MP, Yu X, Ohno C, et al. An integrated analysis of cell-type specific gene expression reveals genes regulated by REVOLUTA and KANADI1 in the Arabidopsis shoot apical meristem. PLoS Genet. 2020;16: e1008661. doi: 10.1371/journal.pgen.1008661

Zhang N, Zhao B, Fan Z, Yang D, Guo X, Wu Q, et al. Systematic identification of genes associated with plant growth-defense tradeoffs under JA signaling in Arabidopsis. Planta. 2020;251: 43. doi: 10.1007/s00425-019-03335-8

RWRtools LOE (Lines of Evidence) uses RWR to rank genes in the network starting from a Feature Set.
This app completed without errors in 2m 3s.
Summary
Report for RWR_CV with rank <= 200
Links
Output from Find Functional Context using Lines of Evidence with RWRtools LOE
The viewer for the output created by this App is available at the original Narrative here: https://narrative.kbase.us/narrative/165213

Released Apps

  1. Build FeatureSet from Genome
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163
  2. Find Functional Context using Lines of Evidence with RWRtools LOE
    no citations
  3. Find Gene Set Interconnectivity using Cross Validation with RWRtools CV
    no citations

Apps in Beta

  1. Build FeatureSet from Genome
    • Arkin AP, Cottingham RW, Henry CS, Harris NL, Stevens RL, Maslov S, et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology. 2018;36: 566. doi: 10.1038/nbt.4163