This tutorial will guide you through the steps needed to run the Propagate Genome-scale Model to Close Genome app in the KBase Narrative Interface.
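In this tutorial, we will compare two proteomes, propagate a source model to a new genome, gapfill the translated model, and compare the translated model with its source.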
The Propagate Genome-scale Model to Close Genome app builds a new metabolic model by translating the gene associations in an existing model to a new genome. This translation is conducted based on the gene correspondence tables contained in the input proteome comparison object. A gapfilling is then performed to permit the translated model to grow on a specific media condition.
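The translation idea can be illustrated with a minimal Python sketch. It is not the app's actual implementation: the data layout (a gene association stored as alternative gene sets), the rule that an alternative is dropped when any of its genes lacks a counterpart, and all reaction and gene IDs are assumptions made for illustration. Reactions without gene associations are not handled here.

```python
from typing import Dict, FrozenSet, List

# Assumed layout: a gene association is a list of alternatives (OR),
# each alternative being a set of genes that are all required (AND).
GeneRule = List[FrozenSet[str]]


def translate_rule(rule: GeneRule, gene_map: Dict[str, str]) -> GeneRule:
    """Rewrite a rule with target-genome gene IDs.

    An alternative is dropped when any of its genes has no counterpart
    in the target genome (an assumption of this sketch).
    """
    translated = []
    for required_genes in rule:
        if all(gene in gene_map for gene in required_genes):
            translated.append(frozenset(gene_map[gene] for gene in required_genes))
    return translated


def propagate_model(source: Dict[str, GeneRule],
                    gene_map: Dict[str, str]) -> Dict[str, GeneRule]:
    """Keep only reactions whose translated rule retains at least one alternative."""
    result = {}
    for rxn_id, rule in source.items():
        new_rule = translate_rule(rule, gene_map)
        if new_rule:
            result[rxn_id] = new_rule
    return result


# Hypothetical reaction and gene IDs, for illustration only.
source_model = {
    "rxn_a": [frozenset({"src_gene1"})],
    "rxn_b": [frozenset({"src_gene2", "src_gene3"})],
    "rxn_c": [frozenset({"src_gene4"}), frozenset({"src_gene1"})],
}
# Gene correspondence table: source gene -> target gene.
gene_map = {"src_gene1": "new_gene7", "src_gene2": "new_gene9"}

translated = propagate_model(source_model, gene_map)
print(sorted(translated))
```

In this toy case, rxn_b is lost because one of its two required genes has no counterpart, while rxn_c survives through its second alternative.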
It is important to note that imported models (or translations of imported models) can contain custom reactions and compounds that are not found in the KBase biochemistry database. These models may also contain biomass components that lack complete biosynthesis pathways in the KBase biochemistry database. When this occurs, gapfilling will fail to identify a feasible solution using only the reactions contained in the KBase biochemistry. For this reason, the gapfilling performed in the Propagate Genome-scale Model to Close Genome app uses the original source model as an additional source of potential gapfill reactions. This ensures that the gapfilling will be able to restore growth on any condition that the original source model could grow on.
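To see why adding the source model to the candidate pool matters, consider the toy sketch below. It is not the gapfilling algorithm KBase runs: it treats reactions as simple substrate-to-product rules, tests producibility by network expansion, and finds the smallest addition by brute force, which is only practical for a handful of candidates. All compound and reaction IDs are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

Reaction = Tuple[FrozenSet[str], FrozenSet[str]]  # (substrates, products)


def producible(reactions: Iterable[Reaction], media: Set[str]) -> Set[str]:
    """Expand the set of available compounds until no reaction adds more."""
    available = set(media)
    reaction_list = list(reactions)
    changed = True
    while changed:
        changed = False
        for substrates, products in reaction_list:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def gapfill(model: Dict[str, Reaction], candidates: Dict[str, Reaction],
            media: Set[str], targets: Set[str]) -> Optional[Set[str]]:
    """Return the smallest set of candidate IDs that makes every target
    producible, or None if no combination of candidates works."""
    ids = sorted(candidates)
    for size in range(len(ids) + 1):
        for chosen in combinations(ids, size):
            reactions = list(model.values()) + [candidates[c] for c in chosen]
            if targets <= producible(reactions, media):
                return set(chosen)
    return None


# Hypothetical draft model, candidate pools, media, and biomass component.
draft_model = {"rxn_uptake": (frozenset({"glucose"}), frozenset({"g6p"}))}
kbase_candidates = {"db_rxn": (frozenset({"g6p"}), frozenset({"f6p"}))}
source_candidates = {"custom_rxn": (frozenset({"f6p"}), frozenset({"custom_lipid"}))}
media = {"glucose"}
biomass = {"custom_lipid"}

db_only = gapfill(draft_model, kbase_candidates, media, biomass)
with_source = gapfill(draft_model, {**kbase_candidates, **source_candidates},
                      media, biomass)
print(db_only, with_source)
```

With the database candidates alone, no solution exists because the custom biomass component has no biosynthesis route. Once the source model's reactions are added to the pool, a two-reaction solution is found.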
The final step of this app conducts a detailed comparison of the original model with the model produced by the translation. This exposes all the functions and genes lost during the translation process, highlighting key metabolic differences between the genomes used.
It is extremely important to recognize that the model produced by this app will always be a subset of the source model used in the translation. This is true even if the new genome contains metabolic functions that are not in the genomes associated with the source model. In future versions of this tool, we will provide the option to combine this app with the model reconstruction pipeline built into KBase, enabling the addition of new reactions to a translated model to capture new metabolic features that do not exist in the genome associated with the source. For now, you could apply both apps to your genome, then use the model comparison app to identify the differences and produce a consolidated reconciled model.
For more on the model translation process specifically, please see the app details page. Also check out the Metabolic Modeling FAQ for answers to common questions about KBase’s metabolic modeling tools and datasets.
Step 1. Compare Two Proteomes
Construct a table of corresponding genes between the original source model genome and the genome to which we are translating our source model
Step 2. Propagate Model to New Genome
Translate the original source model to a new genome based on the input set of gene correspondence tables
Step 3. Gapfill Metabolic Model
Identify the minimal set of biochemical reactions to add to a draft metabolic model to enable it to produce biomass in a specified media
Step 4. Compare Two Metabolic Models
Compare the original source model to the model produced by the translation, permitting the rapid identification of reactions lost during the translation
The Propagate Genome-scale Model to Close Genome app takes four inputs: a source metabolic model, the genome associated with the source model, a new genome to target for the translation, and an optional media formulation. In KBase, an “FBAModel” or “Metabolic Model typed object” contains the reactions, compounds, compartments, biomass reactions, and gene associations that comprise a metabolic model. A “Genome” or “Genome typed object” contains the feature calls and annotation data for a genome. A media formulation is a special object type that lists the chemical compounds found in a particular growth environment. KBase includes over 500 media formulations that may be used as potential growth conditions.
If you do not select a defined media formulation, the app will build a model based on Complete media. Complete media is a special type of media that does not include an exact list of compounds. Instead, modeling growth in Complete media means that the model is allowed to consume any nutrient for which a transport reaction is available to the model. For this reason, the content of Complete media can change depending on which transport reactions are present in the model.
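The dependence of Complete media on the model's transport reactions can be sketched in a few lines of Python. The data layout, the compartment IDs "c0" and "e0", the rule used to recognize a transport reaction, and all compound names are assumptions made for this illustration, not KBase's internal representation.

```python
from typing import Dict, Set, Tuple

Metabolite = Tuple[str, str]  # (compound ID, compartment ID)


def complete_media_compounds(reactions: Dict[str, Set[Metabolite]],
                             extracellular: str = "e0") -> Set[str]:
    """Collect extracellular compounds that some transport reaction can move.

    A reaction counts as a transport reaction here if it involves the
    extracellular compartment and at least one other compartment.
    """
    nutrients: Set[str] = set()
    for metabolites in reactions.values():
        compartments = {compartment for _, compartment in metabolites}
        if extracellular in compartments and len(compartments) > 1:
            nutrients |= {cpd for cpd, compartment in metabolites
                          if compartment == extracellular}
    return nutrients


# Hypothetical model with one transporter and one cytosolic reaction.
model = {
    "glucose_transport": {("glucose", "e0"), ("glucose", "c0")},
    "hexokinase": {("glucose", "c0"), ("atp", "c0"), ("g6p", "c0"), ("adp", "c0")},
}
before = complete_media_compounds(model)

# Adding a transporter changes what Complete media effectively contains.
model["acetate_transport"] = {("acetate", "e0"), ("acetate", "c0")}
after = complete_media_compounds(model)
print(sorted(before), sorted(after))
```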
There are several ways to load a source metabolic model and genomes into your Narrative so that you can use them as input to this and other apps.
In this tutorial, we will use example data available in KBase. Once you are ready to analyze your own data, see the Data Upload and Download Guide for instructions on importing metabolic models and genomes into KBase.
Note that model translation constructs a new metabolic model based on gene correspondence tables computed by the Compare Two Proteomes app, which determines gene correspondence based on bidirectional best hits. As a result, the translation process uses the functional annotations of neither the source genome nor the target genome. It is therefore quite possible for a gene in the target genome to be assigned to a reaction in the translated model that conflicts with its annotated function. This is especially true if the source model for the translation was not originally constructed in KBase or ModelSEED. Such inconsistencies represent a conflict between the annotation predicted for the input genome and the annotation used to construct the source model. For now, these inconsistencies must be identified and reconciled manually. One easy way to identify them in KBase is to run the Reconstruct Genome-scale Metabolic Model app on the same genome, then compare the translated model to the freshly constructed model using the Compare Two Metabolic Models app.
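The bidirectional best hit criterion itself is simple to state in code. The sketch below is a generic illustration, not the Compare Two Proteomes implementation: it assumes pairwise similarity scores are already available in both search directions, and all gene IDs and scores are invented.

```python
from typing import Dict, Tuple

Scores = Dict[Tuple[str, str], float]  # (query gene, subject gene) -> score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query gene to its highest-scoring subject gene.

    On ties, the first pair encountered is kept.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(forward: Scores, reverse: Scores) -> Dict[str, str]:
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return {a: b for a, b in fwd.items() if rev.get(b) == a}


# Hypothetical scores: source genome searched against target, and back.
forward = {("s1", "t1"): 200.0, ("s1", "t2"): 150.0,
           ("s2", "t2"): 90.0, ("s3", "t3"): 80.0}
reverse = {("t1", "s1"): 200.0, ("t2", "s1"): 150.0, ("t2", "s2"): 90.0,
           ("t3", "s4"): 120.0, ("t3", "s3"): 80.0}

pairs = bidirectional_best_hits(forward, reverse)
print(pairs)
```

Only s1 and t1 pick each other in both directions, so only that pair enters the correspondence table. Note that nothing in this procedure looks at what the genes are annotated to do, which is why conflicts with annotated function can arise.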
The output of this app is two FBAModel objects, one for the initial model that was created by the translation and one for the gapfilled version of the translated model. An FBAModel object contains the reactions, compounds, compartments, biomass reactions, and gene associations that comprise a metabolic model.
Note: This tutorial assumes that you have already created a new Narrative. For instructions on how to accomplish this and other tasks such as finding or uploading data to your Narrative, please refer to the Narrative Interface User Guide.
Step 1. Add data that you want to analyze
Before we run the app, we need to copy or upload the needed input data. For this tutorial, our source model will be the iRsp1140 model of Rhodobacter sphaeroides 2.4.1 published by Tim Donahue’s lab at the University of Wisconsin–Madison as part of research within the DOE Great Lakes Bioenergy Research Center. This model and its corresponding genome are listed among the example datasets available in KBase.
To retrieve the model and genome, access the Data Browser by clicking on the Add Data button, which can be found under the Analyze tab in the left panel of your screen (note that after files have been added to a Narrative, the Add Data button becomes a red circular button with a plus sign).
The Data Browser has various tabs for importing options. Choose the Example tab to see a list of example data objects listed by type. Scroll down to the section labeled Example Genomes.
Add two genomes to your Narrative, Rhodobacter_sphaeroides_2.4.1 and Rhodobacter_CACIA_14H1, by clicking on the Add buttons located to the left of these objects.
Next, select your source model by scrolling down to the section labeled Example FBAModels. Select the iRsp1140 model from this section by clicking on the Add button to the left of this model.
Finally, scroll down to the Example Media section and add the media labeled Rsp-minimal media. This is the media specified by the Donahue group as a confirmed minimal media formulation for Rhodobacter sphaeroides 2.4.1.
Exit the Data Browser by clicking the Close button at the bottom right of the browser window or the arrow at the top of the Data Panel. (Note that you also can close the Data Browser by clicking anywhere in the main Narrative panel in the center.)
Notice that your Data Panel now shows the two genomes, one model, and one media you added to your Narrative.
You can find out more about a data object by mousing over its record in the Data Panel and clicking on the “…” that appears. An expanded view of the data object will open.
The icons in this view let you see a data summary, download the object, see its provenance and more. (Please see the Explore Data section of the Narrative Interface User Guide for more information.)
For now, you can examine these data objects by dragging them from the Data Panel and dropping them onto the main Narrative panel, which creates a viewer cell for the object.
Step 2. Add and run the app
Now that you have the needed input data, you can add the Propagate Genome-scale Model to Close Genome app to your Narrative. Look closer at the Apps Panel directly below your data.
You can search for apps using the search box at the top of the Apps Panel, or just scroll down until you find the one you want. Locate the Propagate Genome-scale Model to Close Genome app in the list and click on its name or icon to add it as a new cell in your main Narrative panel.
To run the app on the sample data, you must first fill out the fields in each step in the app cell. The parameters for each step are described in depth on the individual app detail pages. For this app, the four steps come from these apps: Compare Two Proteomes, Propagate Model to New Genome, Gapfill Metabolic Model, and Compare Two Metabolic Models.
For the Genome1 ID and Genome2 ID fields in Step 1, select the Rhodobacter_sphaeroides_2.4.1 and Rhodobacter_CACIA_14H1 genomes from the respective dropdown menus. Next, enter a name for the proteome comparison object that will be generated by this step of the app. Here, we will use RhodobacterComparison. The proteome comparison object will automatically fill in the second field for Step 2, which is the model propagation step.
In Step 2 of the app, the model propagation will use the gene correspondence tables generated by the proteome comparison in Step 1 to translate a model of Rhodobacter_sphaeroides_2.4.1 into a new model of Rhodobacter_CACIA_14H1. To fill in this step, you must select the model you wish to translate in the FBA Model field. For this tutorial, select the iRsp1140 model from the dropdown menu. You must also supply a name for the model that will be generated by this step. Here, we will use “Translated_CACIA_14H1”.
In Step 3 of the app, the model generated in Step 2 is gapfilled to permit growth in a specified media condition. The gapfill algorithm looks for metabolic pathways that are incomplete and tries to make an informed guess as to which protein-encoding genes may be responsible for the missing step in the pathway. This process works best when informed by the a priori knowledge that a particular organism can grow on a defined media type. Thus, it is beneficial to declare a growth media for this step. If you do not declare a media, the algorithm will assume that you wish to build a model on Complete media. For this tutorial, we will select the Rsp-minimal media, which is a known minimal media for Rhodobacter_sphaeroides_2.4.1. You also need to name the gapfilled model that will be produced by this step. Here we will use “Translated_CACIA_14H1.gf”.
In Step 4 of the app, we will compare our source model, iRsp1140, with our newly created and gapfilled model of Rhodobacter_CACIA_14H1, Translated_CACIA_14H1.gf. You do not need to fill in any input fields for this step because the inputs from earlier steps are carried forward automatically. (This is a key advantage of multi-step apps, which chain a series of analyses so that the outputs of each step are automatically used in the logical follow-on steps.)
The screenshot below displays what the app should look like once completely filled in.
Click the Run button at the bottom of the cell to launch the analysis job. A blue box will appear around Step 1 (Compare Two Proteomes), and a message at the bottom of the app cell will indicate that the job was submitted.
Depending on the queue size (how many other calculations have been requested by users recently), this job should take approximately 10 minutes. You can check your job status by selecting the Jobs tab near the top left of the Narrative Interface.
The Analyze Data Using Apps section of the Narrative Interface User Guide provides more details about running apps.
Step 3. Look at the output
Once complete, the app generates one proteome comparison and two FBAModel objects that should appear in your Data Panel. Note that although the objects are called “FBA models,” we have not actually performed a flux balance analysis. The object type reflects the type of metabolic modeling that we are doing, which is called FBA modeling.
Output cells for all steps appear in the main Narrative panel. (For detailed instructions on how to navigate the synteny plot produced in Step 1, see the tutorial for the Compare Two Proteomes app.)
Here, we will discuss in particular the output for the initial and gapfilled models and for the model comparison. Each model viewer has seven tabs for browsing the data: Overview, Reactions, Compounds, Genes, Compartments, Biomass, and Gapfilling. The Gapfilling tab will be empty for the initial model and populated for the gapfilled model.
In general, a metabolic model is the list of metabolic reactions that the cell is predicted to perform. In the output cell, you can browse the linkage between reactions and protein-encoding genes, search for compounds used in the model, and identify gaps in the model that were not found by the annotation tools.
Finally, the model comparison shows how much of the content from the original source model failed to translate to the new genome due to a lack of corresponding genes for certain metabolic functions. This highlights metabolic functions that the new genome appears to have lost.
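At its core, this kind of comparison is a set difference over reaction IDs. The sketch below is only a simplified illustration of that idea, not the Compare Two Metabolic Models implementation, and the reaction IDs are hypothetical.

```python
from typing import Dict, List, Set


def compare_reactions(source: Set[str], translated: Set[str]) -> Dict[str, List[str]]:
    """Partition reaction IDs into shared, lost, and newly added groups."""
    return {
        "shared": sorted(source & translated),
        "lost_in_translation": sorted(source - translated),
        "only_in_translated": sorted(translated - source),
    }


# Hypothetical reaction IDs for a source model and its gapfilled translation.
source_reactions = {"rxn_a", "rxn_b", "rxn_c", "rxn_d"}
translated_reactions = {"rxn_a", "rxn_c", "rxn_gapfill"}

report = compare_reactions(source_reactions, translated_reactions)
for category, reaction_ids in report.items():
    print(category, reaction_ids)
```

Reactions in the "lost_in_translation" group are the candidates for metabolic functions that the new genome appears to lack.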
If you need additional help, please contact us.
Step 4. Download the results
You can download an FBA model in several formats: SBML (systems biology markup language), TSV (tab-separated values), Excel, or JSON. Open the expanded view of the object in the Data Panel, then click on the Export/Download data icon to see the available download options.
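Once a model has been downloaded as SBML, it can be inspected with standard tools. The sketch below uses only the Python standard library and relies on the SBML convention that reactions are stored as reaction elements; it ignores XML namespaces so that it does not depend on a particular SBML level or version. The inline document and its IDs are invented for illustration and do not reproduce an actual KBase export.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def reaction_ids(sbml_text: str) -> List[str]:
    """Return the id attribute of every reaction element in an SBML string."""
    root = ET.fromstring(sbml_text)
    return [element.get("id") for element in root.iter()
            if local_name(element.tag) == "reaction"]


# Minimal hand-written SBML document used as a stand-in for a download.
EXAMPLE_SBML = """
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy_model">
    <listOfReactions>
      <reaction id="R_rxn_a" reversible="false"/>
      <reaction id="R_rxn_b" reversible="true"/>
    </listOfReactions>
  </model>
</sbml>
"""

ids = reaction_ids(EXAMPLE_SBML)
print(len(ids), ids)
```

For a file on disk, ET.parse(path).getroot() can be used in place of ET.fromstring to obtain the root element.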
Note to PC users: If downloading to Excel, the data will be placed into a zipped folder whose name (or path) can be long, depending on the data object’s name and type. If the folder path becomes too long, Windows may not be able to open it. Try copying or moving the file to a folder or directory that has a shorter path if you encounter problems.
The proteome comparison object currently can be downloaded in JSON format.
The workflow above highlights a relevant biological use case. We translated a heavily curated model of Rhodobacter sphaeroides 2.4.1 to a new Rhodobacter genome. This allows us to capture some of the curation and other details from the iRsp1140 model (like the biomass composition reaction) without having to repeat the curation with our new genome, which gives the model reconstruction process for the new genome a substantial head start. However, the model comparison from this app also highlights differences that appear to exist in the metabolism of Rhodobacter CACIA 14H1 compared with Rhodobacter sphaeroides 2.4.1.
Further analysis / next steps
Now that you understand the basics of creating a model and gapfilling it, the next logical step in metabolic modeling would be to proceed to a Flux Balance Analysis, which attempts to predict growth and the synthesis of compounds. If you are interested in FBA, we recommend reading the tutorial on FBA next.
Find more app tutorials here!
If you have questions, you can contact us.