The Reconstruct Genome-scale Metabolic Model app discussed in this tutorial has been replaced by the new and improved Build Metabolic Model app. Please see the Metabolic Modeling page for information about KBase’s modeling-related apps.
This tutorial will guide you through the steps needed to run the Reconstruct Genome-scale Metabolic Model app in the KBase Narrative Interface.
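In this tutorial, we will add data to analyze, add and run the app, look at the output, and download the results. We will then work through a use case that compares models gapfilled on complete and minimal media.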
The Reconstruct Genome-scale Metabolic Model app uses the annotation data in an annotated Genome object to reconstruct the metabolic reactions that a cell is capable of performing, and builds a metabolic model from them. The app then performs gapfilling: it searches for metabolic reactions that are missing from the initial reconstruction because the annotation did not reveal them, and adds reactions that bridge those gaps.
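To make the idea of gapfilling concrete, here is a minimal Python sketch of a much simplified version. It assumes a hypothetical representation in which each reaction is a pair of substrate and product lists, treats a reaction as usable once all of its substrates are available (simple network expansion), and searches exhaustively for the smallest set of database reactions that makes a target such as biomass reachable. This is not KBase's gapfilling algorithm, which is optimization-based and accounts for stoichiometry and flux; the reaction IDs and compounds are invented for illustration.

```python
from itertools import combinations


def reachable(media, reactions):
    """Network expansion: every compound producible from the media compounds."""
    pool = set(media)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            # A reaction fires once all of its substrates are available.
            if set(substrates) <= pool and not set(products) <= pool:
                pool |= set(products)
                changed = True
    return pool


def gapfill(model, database, media, target):
    """Smallest set of database reactions whose addition makes `target` reachable.

    Exhaustive search over subsets, smallest first: fine for a toy example,
    exponential for real reaction databases.
    """
    candidates = [rxn for rxn in database if rxn not in model]
    for size in range(len(candidates) + 1):
        for combo in combinations(candidates, size):
            trial = dict(model)
            trial.update({rxn: database[rxn] for rxn in combo})
            if target in reachable(media, trial):
                return set(combo)
    return None  # no combination of database reactions reaches the target


# Hypothetical draft model from annotation: the g6p -> f6p step is missing.
model = {"R1": (["glucose"], ["g6p"]), "R3": (["f6p"], ["biomass"])}
database = {"R2": (["g6p"], ["f6p"]), "R4": (["glucose"], ["lactate"])}
print(gapfill(model, database, ["glucose"], "biomass"))  # {'R2'}
```

The unrelated reaction R4 is never added, because the search prefers the smallest set of additions that restores growth.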
The Reconstruct Genome-scale Metabolic Model app takes a Genome object and an optional media condition as input. In KBase, a “Genome” or “Genome typed object” is a data object that contains the feature calls and annotation data for a genome. The media condition is a special object type that contains the chemical compounds found in a particular growth medium. If no defined media is selected, the app will build a model based on complete media. Complete media is a special type of media that does not include an exact list of compounds. Instead, modeling growth in complete media means that the model is allowed to consume any nutrient for which a transport reaction is available to the model. For this reason, the content of complete media can change depending on which transport reactions are present in the model. There are over 500 different media conditions to choose from in KBase.
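The rule that complete media contains whatever the model can transport can be expressed in a few lines. The Python sketch below assumes a hypothetical ID convention in which a compound's compartment is written in brackets, "[e]" for extracellular and "[c]" for cytosol (KBase's own identifiers differ), and treats any reaction that moves the same compound from [e] to [c] as a transport reaction. Adding or removing transport reactions changes the result, which is why the content of complete media depends on the model.

```python
def split_id(species):
    """Split a hypothetical ID such as 'glc[e]' into ('glc', 'e')."""
    name, _, compartment = species.partition("[")
    return name, compartment.rstrip("]")


def complete_media(reactions):
    """Compounds a model may take up on complete media: those it can transport in."""
    media = set()
    for substrates, products in reactions.values():
        cytosolic = {split_id(p)[0] for p in products if split_id(p)[1] == "c"}
        for s in substrates:
            name, compartment = split_id(s)
            # Transport: the same compound goes from outside [e] to inside [c].
            if compartment == "e" and name in cytosolic:
                media.add(name)
    return media


reactions = {
    "T1": (["glc[e]"], ["glc[c]"]),  # hypothetical glucose transporter
    "T2": (["nh4[e]"], ["nh4[c]"]),  # hypothetical ammonium transporter
    "R1": (["glc[c]"], ["g6p[c]"]),  # intracellular reaction, not transport
}
print(sorted(complete_media(reactions)))  # ['glc', 'nh4']
```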
You can load annotated genomes into KBase for analysis with this and other apps in several different ways:
This tutorial will take you through the steps for running the Reconstruct Genome-scale Metabolic Model app using example data from KBase’s reference data collection. Once you’re ready to upload your own data, see the Data Upload and Download Guide for instructions on uploading contigs from a GenBank formatted file, importing a GenBank genome from FTP, and uploading a media file in TSV (tab-separated values) format.
Note: The metabolic model generated by this app is built from the annotations of the protein-encoding genes in the input Genome. If you have imported a genome that does not contain KBase or RAST-based annotations, run the Annotate Microbial Genome app before running this app, to map KBase annotations onto your genes so that the model can be built.
The outputs of this app are two FBAModel objects: one for the initial model and one for the gapfilled model. An FBAModel object contains the reactions found by the metabolic model reconstruction, as well as any gapfilled reactions.
Note: This tutorial assumes that you have already created a new Narrative. For instructions on how to accomplish this and other tasks such as finding or uploading data to your Narrative, please refer to the Narrative Interface User Guide.
Step 1. Add data that you want to analyze
Before running the app, we need to copy or upload the input data. For these point-and-click instructions, we will skip the annotation step and start by loading an annotated Escherichia coli K-12 genome from KBase’s reference data into our Narrative.
First, click the Add Data button found under the Analyze tab in the left panel of your screen. (Note that after data objects have been added to a Narrative, the Add Data button becomes a red circle with a plus sign.) The Data Browser will slide out with various options for finding and loading data.
Choose the Public tab to display a list of publicly available data objects in KBase’s reference collection. Genomes are displayed by default, but the data types dropdown menu allows you to search for other types of data as well.
Use the search box to look for Genome objects that have “Escherichia coli K-12” in their names. Choose a strain with only one contig, mouse over it, and click the Add button that appears at its left to add this genome to your Narrative. (For this example, we chose substrain MG1655.)
Exit the Data Browser by clicking the Close button at the bottom right or the arrow at the top of the Data Panel. (Note that you also can close the Data Browser by clicking anywhere in the main Narrative panel in the center.)
Notice that your Data Panel now shows the genome that you added:
You can find out more about this data object by mousing over the record in the Data Panel and clicking the “…” that appears. An expanded view of the data will open:
The icons in this view let you see a data summary, download the object, see its provenance, and more. (Please see the Explore Data section of the Narrative Interface User Guide for more information.)
For now, you can examine this genome by dragging it from the Data Panel and dropping it into the main Narrative panel to open a genome viewer:
In the Contigs and Genes tabs of the genome viewer, you can sort the table by clicking a column header (e.g., Length). Clicking the same header again reverses the sort order. You can also sort by more than one column at a time by clicking one column header and then Shift-clicking others.
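The multi-column sort can be pictured with a short Python sketch. It assumes (we have not checked the viewer's code) that the first column clicked is the primary sort key and each Shift-clicked column breaks ties in the order clicked. Because Python's sort is stable, sorting by the least significant key first and the primary key last produces that ordering; the gene rows are hypothetical.

```python
def multi_sort(rows, keys):
    """Sort rows by several columns; `keys` lists (column, descending) pairs, primary first."""
    result = list(rows)
    # Stable sorts applied from the least to the most significant key.
    for column, descending in reversed(keys):
        result.sort(key=lambda row: row[column], reverse=descending)
    return result


genes = [
    {"id": "g1", "contig": "c2", "length": 300},
    {"id": "g2", "contig": "c1", "length": 150},
    {"id": "g3", "contig": "c1", "length": 900},
    {"id": "g4", "contig": "c2", "length": 300},
]
# Primary key: contig (ascending); tie-breaker: length (descending).
ordered = multi_sort(genes, [("contig", False), ("length", True)])
print([g["id"] for g in ordered])  # ['g3', 'g2', 'g1', 'g4']
```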
Step 2. Add and run the app
Now that you have your input data, you can add the Reconstruct Genome-scale Metabolic Model app to your Narrative. Take a closer look at the Apps Panel directly below your data.
You can search for apps using the search box at the top of the Apps Panel or just scroll down until you find the one you want. Locate the Reconstruct Genome-scale Metabolic Model app in the list. Click its name or icon to add the app as a new cell in the main Narrative panel.
To run the app on the sample genome, you must first fill out the fields in each step of the app cell. In-depth descriptions of the parameters for each step are provided on the individual app details pages. For this app, the two steps are Step 1, Build Metabolic Model, and Step 2, Gapfill Metabolic Model.
For the first input field (Genome), select your E. coli genome from the dropdown menu. Next, enter a name for your initial model in the Draft Model field. For this example, we will call our model “K-12_Minimal_Initial.” Notice that the name of the draft model is automatically filled in for the first field in Step 2, the gapfilling step.
In Step 2 of the app, the gapfilling algorithm looks for incomplete metabolic pathways and tries to make an informed guess about which protein-encoding genes may be responsible for the missing steps. This process works best when it is informed by a priori knowledge that the organism can grow on a defined media type, so specifying a growth media for this step is beneficial. If you do not specify a media type, the algorithm will assume that you wish to build a model on complete media. (As noted in the “Description of Input” section above, complete media is a special type of media that does not include an exact list of compounds; instead, the model is allowed to consume any nutrient for which a transport reaction is available to the model.)
As with contigs and genomes, a growth medium is an object type in KBase. To copy a media object into your Narrative, open the Data Browser again by clicking the red “+” button in your Data Panel. Under the Public tab, select Media from the data type dropdown menu to display the more than 500 media types that have been defined. (Note: If you don’t see any media objects, make sure the search field is empty.)
Now search for “Carbon-D-Glucose” media and add it to your Narrative.
Exit the Data Browser and return to Step 2 in the app cell. You can now select Carbon-D-Glucose media from the dropdown menu in the Media field.
Finally, provide a name for your output gapfilled model. Here, we will use “K-12_Minimal_Gapfilled.”
Notice that as you fill in the required parameter fields, the red arrows next to those fields change to green checkmarks. Once all required fields have a green checkmark, the app is ready to run.
Click the Run button at the bottom of the cell to launch the analysis job. A blue box will appear around Step 1 (Build Metabolic Model) indicating that the step is running, and a message at the bottom of the app cell will say that the job was submitted.
Depending on the queue size (how many other calculations users have requested recently), this job should take about 3 minutes. You can check the job's status by selecting the Jobs tab near the top left of the Narrative Interface.
The Analyze Data Using Apps section of the Narrative Interface User Guide provides more details about running apps.
Step 3. Look at the output
Once complete, the app generates two FBAModel objects in your Narrative. Note that although the objects are called “FBA models,” we have not actually performed a flux balance analysis; the object type is named after the kind of metabolic modeling we are doing, which is called FBA modeling. Output cells for the initial and gapfilled models also appear in the main Narrative panel. Each has eight tabs for browsing the data: Overview, Reactions, Compounds, Genes, Compartments, Biomass, Gapfilling, and Pathways. The Gapfilling tab will be empty for the initial model and populated for the gapfilled model.
In general, a metabolic model is the list of metabolic reactions that the cell is predicted to perform. In the Gapfill Metabolic Model output cell, you can browse the links between reactions and protein-encoding genes, search for compounds used in the model, and identify gaps in the model that the annotation tools missed.
Step 4. Download the results
KBase provides several options for downloading your metabolic model in various formats. First, locate the FBAModel object you want to download in your Data Panel. (If you don’t see this panel, make sure the Analyze tab is selected near the top left of the Narrative Interface.)
Mouse over the object and click the “…” to open an expanded view of the data. Next, click the Export/Download data icon to reveal options for downloading the data in SBML (Systems Biology Markup Language), TSV (tab-separated values), Excel, or JSON format.
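If you download the TSV export, you can load a reaction table with Python's standard csv module. This sketch assumes the reactions table has a header row with a column named "id"; the actual file layout and column names in KBase's export may differ, so check the header and adjust the hypothetical id_column parameter. The sample rows are invented.

```python
import csv
import io


def load_reactions(tsv_text, id_column="id"):
    """Map each reaction ID to its row (a dict keyed by column header)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[id_column]: row for row in reader}


# Hypothetical sample; in practice pass the text of the exported file, e.g.
# open("reactions.tsv", newline="").read()
sample = (
    "id\tname\n"
    "rxnA_c0\texample reaction A\n"
    "rxnB_c0\texample reaction B\n"
)
reactions = load_reactions(sample)
print(sorted(reactions))  # ['rxnA_c0', 'rxnB_c0']
```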
Note to PC users: If downloading to Excel, the data will be placed into a zipped folder whose name (or path) can be long, depending on the data object’s name and type. If the folder path becomes too long, Windows may not be able to open it. Try copying or moving the file to a folder or directory that has a shorter path if you encounter problems.
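To check in advance whether a destination folder is short enough, you can compute the full paths the extracted files would have. The Python sketch below uses the classic Windows MAX_PATH limit of 260 characters, which counts a terminating null character (newer Windows versions can lift this limit, but many tools still enforce it); the folder and file names are hypothetical.

```python
from pathlib import PureWindowsPath

MAX_PATH = 260  # classic Windows limit, counting the terminating null character


def paths_too_long(folder, relative_names, limit=MAX_PATH):
    """Return full paths under `folder` that would reach the classic length limit."""
    base = PureWindowsPath(folder)
    full_paths = (str(base / name) for name in relative_names)
    # A path string needs room for the null terminator, so length >= limit fails.
    return [p for p in full_paths if len(p) >= limit]


# Hypothetical extracted file names: one very long, one short.
names = ["Strep_minimal-FBAModel\\" + "x" * 250 + ".xlsx", "short.xlsx"]
print(len(paths_too_long(r"C:\Users\me\Downloads", names)))  # 1
```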
The choice of media can significantly affect the decisions made during gapfilling. For instance, the constraints on an organism growing on minimal media differ from those on an organism growing on complete media. For growth on minimal media, the gapfilling algorithm may seek a solution that requires synthesizing compounds, whereas on complete media it may find a simpler solution that enables the transport or breakdown of certain compounds. In the following use case, we compare two gapfilled models for the same genome: one generated on complete media and one generated on minimal media.
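The contrast between transport-based and synthesis-based solutions can be reproduced with a toy model. The Python sketch below uses a simplified reachability-based gapfiller (a stand-in, not KBase's optimization-based gapfilling) on invented reactions: biomass needs an amino acid "aa", which the gapfiller can supply either by importing it (T_aa, only possible if "aa_ext" is in the media) or by synthesizing it in two steps (S1, S2). Here complete media is approximated as a fixed list that includes aa_ext.

```python
from itertools import combinations


def reachable(media, reactions):
    """Compounds producible from the media (simple network expansion)."""
    pool, changed = set(media), True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if set(substrates) <= pool and not set(products) <= pool:
                pool |= set(products)
                changed = True
    return pool


def gapfill(model, database, media, target="biomass"):
    """Smallest set of database reactions that makes `target` reachable."""
    candidates = [r for r in database if r not in model]
    for size in range(len(candidates) + 1):
        for combo in combinations(candidates, size):
            trial = {**model, **{r: database[r] for r in combo}}
            if target in reachable(media, trial):
                return set(combo)
    return None


# Hypothetical draft model: biomass needs g6p and an amino acid "aa".
model = {"R_glc": (["glucose"], ["g6p"]), "R_bio": (["g6p", "aa"], ["biomass"])}
database = {
    "T_aa": (["aa_ext"], ["aa"]),  # transport of aa from the medium
    "S1": (["g6p"], ["pre"]),      # synthesis, step 1
    "S2": (["pre"], ["aa"]),       # synthesis, step 2
}
print(gapfill(model, database, ["glucose", "aa_ext"]))  # {'T_aa'}: import is enough
print(sorted(gapfill(model, database, ["glucose"])))    # ['S1', 'S2']: must synthesize
```

On the richer medium a single transport reaction suffices, while on the minimal medium the gapfiller must add the two-step synthesis route, mirroring the larger minimal-media model in this use case.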
First, add the Streptococcus pneumoniae OXC141 genome to your Narrative by searching for it under the Public tab of the Data Browser. Still using the Public tab, select Media from the data types dropdown menu and search for “Carbon-D-Glucose” media and “Complete” media. (Note: You will not need to add the Carbon-D-Glucose media unless you have started a new Narrative for this use case. If you are using the same Narrative, this media will still be in your Data Panel from the previous analysis.) Once you have located both media objects, add them to your Narrative.
Next, add the Reconstruct Genome-scale Metabolic Model app to your Narrative. First, we will build a metabolic model for the Strep genome on complete media. We will give the initial and gapfilled models the same name, “Strep_complete,” so that only one model object is added to our Narrative.
Once you’ve filled in all parameters, click the Run button. This job should take only a few minutes to run. When complete, the Gapfill Metabolic Model output cell will look like this:
After our “Strep_complete” model has finished, choose the Gapfill Metabolic Model app from the list. This will add the app cell below our “Strep_complete” model output in the main Narrative panel. (You might notice that this app is identical to Step 2 of the Reconstruct Genome-scale Metabolic Model app). We will now attempt to gapfill the model that we built on complete media to see what additional reactions are needed to support growth on minimal media. Fill out the Gapfill Metabolic Model app by selecting “Strep_complete” for the Draft Model field and Carbon-D-Glucose as the media type. We will call the gapfilled model “Strep_minimal.”
This app should finish in less than 3 minutes, and the output will look like this:
After gapfilling on complete media, the model needs 827 reactions to support growth; after gapfilling the same model on Carbon-D-Glucose minimal media, it needs 871 reactions, 44 more.
A metabolic model is just a list of reactions found (or gapfilled and inserted into the model) based on data from the annotated genome. As such, it is fairly straightforward to identify the differences between these two models by comparing their lists of reactions and thus understand the underlying assumptions that have been made by the gapfilling algorithm.
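Because each model is essentially a list of reaction IDs, the comparison reduces to set operations. The Python sketch below shows the idea on hypothetical IDs; the Compare Two Metabolic Models app reports more than this and may match reactions differently, so treat it as an illustration of the logic rather than the app's implementation.

```python
def compare_models(model1, model2):
    """Split two models' reaction IDs into shared and model-specific sets."""
    r1, r2 = set(model1), set(model2)
    return {"shared": r1 & r2, "model1_only": r1 - r2, "model2_only": r2 - r1}


# Hypothetical reaction IDs for two small models.
minimal = ["rxn1", "rxn2", "rxn3", "rxn4"]
complete = ["rxn1", "rxn2", "rxn3"]
result = compare_models(minimal, complete)
print({k: sorted(v) for k, v in result.items()})
# {'shared': ['rxn1', 'rxn2', 'rxn3'], 'model1_only': ['rxn4'], 'model2_only': []}
```

In this use case, if every reaction of the 827-reaction complete-media model also appears in the 871-reaction minimal-media model, the minimal-only set has 871 - 827 = 44 reactions.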
To compare the two models, we will use the Compare Two Metabolic Models app. Add it to your Narrative by finding and clicking on it in the list of apps.
We will choose the “Strep_minimal” model for the FBA Model 1 field and the “Strep_complete” model for FBA Model 2. No Proteome Comparison object is required for the third field because there was only one genome, so we will leave this field blank. Click Run to launch the app, which should finish almost immediately.
The output shows 827 shared reactions and 44 additional reactions needed for growth on minimal media.
If you browse the reactions under the Strep_minimal only tab, you will see the 44 reactions that were added when using minimal media, which are likely to be involved in anabolic processes. You can search among these results and note, for instance, that there are 3 synthases, 4 hydrolases, and 8 lyases.
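Counting reactions by enzyme class, as in the search described above, amounts to a case-insensitive keyword match on reaction names. The Python sketch below illustrates it on invented names; a plain substring match is a simplification that would miss names using other wording for the same enzyme class, so the counts depend on the keywords you choose.

```python
def count_by_keyword(reaction_names, keywords):
    """Count how many reaction names contain each keyword (case-insensitive)."""
    counts = {keyword: 0 for keyword in keywords}
    for name in reaction_names:
        lowered = name.lower()
        for keyword in keywords:
            if keyword in lowered:
                counts[keyword] += 1
    return counts


# Hypothetical reaction names.
names = ["Example synthase", "Example hydrolase", "Example hydro-lyase", "Example lyase"]
print(count_by_keyword(names, ["synthase", "hydrolase", "lyase"]))
# {'synthase': 1, 'hydrolase': 1, 'lyase': 2}
```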
Now that you understand the basics of creating and gapfilling a model, the next step in metabolic modeling would be flux balance analysis (FBA), which attempts to predict growth and the synthesis of compounds. If you are interested in FBA, we recommend reading the Run Flux Balance Analysis tutorial.