The new version of the Narrative Interface also has search capability. Please see the “Explore Data” section of the Narrative Interface User Guide for more information.
Currently searchable data includes metagenomes, genome features, genomes, and metabolic models. This list will be expanding rapidly as we add more data types to our search infrastructure. Please visit the Data Summary page for a complete listing of data types incorporated into KBase.
This guide will show you how to search, sort, filter, and transfer data objects to a Narrative for subsequent analysis with KBase apps and methods.
Note that Data Search does not yet work on user-uploaded data. Please check back frequently for updates.
There are two ways to access the reference data contained in KBase. You can search KBase data without logging in to a KBase account; however, unless you are logged in, you will not be able to save the results of your searches or transfer them to the Narrative Interface for analysis. To search without an account, go to the kbase.us home page, locate the “Data & Tools” dropdown menu, and select “Search Reference Data.”
Registered KBase users can also access the Data Search interface from the Narrative Interface via the “Search Data” menu option in the dropdown menu located in the top left corner.
In this guide, we will assume that you have signed in to the Narrative Interface (narrative.kbase.us) with your KBase username and password. If you are not familiar with the Narrative Interface, you may wish to consult the Narrative Guide for an explanation of the major components, or the Narrative Quick Start for a quick overview.
When you access Data Search, it will conduct an initial query of all of the public data in KBase. After a few seconds you should see a short list of major data categories, each with a count showing how many data objects were found in that category. This represents a real-time query of all available reference data in KBase.
Please note: This tool accesses public reference data that has been loaded into KBase. It does not yet include data uploaded by KBase users for use in Narratives, even if that data has been made publicly available.
Key features of this page include:
To execute a keyword search, enter some text related to the data you are looking for. For example, try entering the keyword “arabidopsis” in the search box and click the search icon (the magnifying glass) or press Enter. This searches for your keyword across all categories of searchable data and returns the number of results found in each category (if any).
In this example, the search string “arabidopsis” matches some objects in three data categories, which are listed with a count of objects found.
Search results can be refined by the category of data and filtered according to taxonomic information, genome features, publication information, or type of biological model, depending on the data object type. Clicking on a category name will display a list of search results for that category. For example, notice the Genomes category contains four data objects for the “arabidopsis” query. Click on this category to view these genomes on the results page (see image below).
Key components of the search results page include:
The Views buttons display results in either a compact view (default) or an expanded view with additional information in light blue boxes below each result (see image). To see the expanded view of the Arabidopsis genome results, click the second “Views” button. For the Genomes category, the additional information is the Taxonomy lineage for each organism.
You can change your search query by replacing or adding to the text in the search box. Keep in mind that if you are looking at search results within one data category, any new search that you perform from this view will search only the category you are currently in. To start a new search across all categories, click the “Return to All Categories” link to access the main search page.
Let’s say we decide to search for all the plant genomes available in KBase. We click “Return to All Categories”, select the Genomes category, and then replace the “arabidopsis” search string with “Viridiplantae” (which will look for all plants plus green algae). Press Enter or the search icon to see the results (see image). Because this search was done from the Genomes category view, only genomes (and no other categories) are searched.
When you have many results, sorting them can help you find what you are looking for more efficiently. Depending on the type of information in a column of results, columns can be sorted by ascending or descending order or by alphabetical or reverse alphabetical order.
Suppose you want to sort the list of plant genomes by number of contigs in order to locate those genomes with the smallest number of contigs, which tend to be more completely sequenced. Click the “Contigs” column header and select “Sort ascending” from the drop-down options. Notice that when the Contigs header was selected, the color of the column header text turned purple, and the icon below it changed to indicate the type and direction of the sort.
You can sort results by more than one column at a time. For example, in addition to sorting your list of plant genomes by the number of contigs, you can do a secondary sort by the DNA size of each organism. Click the “DNA Size bp” column and select ascending order. Now both column headers have purple titles and icons, and they are numbered to indicate the order in which the sorts were applied to your search results. This allows you to keep track of complicated sorts.
You can remove any sort by clicking on the column header and selecting the “Clear this sort” option from the dropdown menu.
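The multi-column sort described above can be pictured with a short Python sketch. This is only an illustration of the idea, not how the Data Search interface is implemented; the `multi_sort` helper, the field names, and the sample rows are all hypothetical placeholders.

```python
from operator import itemgetter

# Hypothetical rows standing in for genome search results; the names and
# numbers are placeholders, not real KBase data.
results = [
    {"name": "Genome A", "contigs": 7, "dna_size_bp": 135_000_000},
    {"name": "Genome B", "contigs": 5, "dna_size_bp": 120_000_000},
    {"name": "Genome C", "contigs": 5, "dna_size_bp": 119_000_000},
    {"name": "Genome D", "contigs": 12, "dna_size_bp": 98_000_000},
]


def multi_sort(rows, sorts):
    """Apply sorts given as (column, descending) pairs in priority order."""
    ordered = list(rows)
    # Python's sort is stable, so applying the lowest-priority sort first
    # and the highest-priority sort last yields the combined ordering.
    for column, descending in reversed(sorts):
        ordered.sort(key=itemgetter(column), reverse=descending)
    return ordered


# Primary sort: contigs ascending. Secondary sort: DNA size ascending.
sorted_rows = multi_sort(results, [("contigs", False), ("dna_size_bp", False)])
for row in sorted_rows:
    print(row["name"], row["contigs"], row["dna_size_bp"])
```

In this sketch, the two genomes that tie on contig count are ordered by DNA size, which mirrors how the numbered sort icons indicate the order in which sorts were applied.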
Another way to locate the search results that you’re interested in is by using filters. The Filters section at the bottom left of the search page lists one or more types of filters that can be expanded to show fields that you can select to restrict your results. Each filter has a number to the right which indicates how many results match that filter. Clicking the filter checkbox will apply the filter to your results.
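As a rough mental model, the number next to each filter is a count of the results that share that value, and ticking a checkbox keeps only those results. The Python sketch below illustrates this under stated assumptions: the `domain` field, its values, and both helper functions are hypothetical and are not taken from the KBase search service.

```python
from collections import Counter

# Hypothetical search results; the "domain" field and its values are
# placeholders chosen for illustration, not actual KBase filter names.
results = [
    {"name": "Genome A", "domain": "Bacteria"},
    {"name": "Genome B", "domain": "Bacteria"},
    {"name": "Genome C", "domain": "Archaea"},
    {"name": "Genome D", "domain": "Eukaryota"},
]


def filter_counts(rows, field):
    """Count how many results match each value of a filter field."""
    return Counter(row[field] for row in rows)


def apply_filter(rows, field, selected):
    """Keep only the rows whose field value is among the ticked checkboxes."""
    return [row for row in rows if row[field] in selected]


counts = filter_counts(results, "domain")
bacteria_only = apply_filter(results, "domain", {"Bacteria"})
print(dict(counts))
print([row["name"] for row in bacteria_only])
```

The count shown for a filter value equals the number of rows that remain after that filter alone is applied.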
Note: Eukaryotes and plants are currently absent from the taxonomy filter on the initial data search page. To activate these filtering options, do one of the following:
The Sequence Homology Search allows you to search for KBase reference genomes and genome features using a DNA or protein sequence, find matching genomes, genes or proteins, select them, and copy them to a Narrative.
The key components of the homology search page include:
1. Sequence box – You can enter a nucleotide or protein sequence, either as a plain sequence or in FASTA format. Multiple query sequences are currently not supported.
2. Database selection – You can search your sequence against one of the following databases built from all KBase reference genomes:
The non-redundant gene and protein sequence databases are constructed by matching all identical gene or protein sequences using MD5 checksums. Only one representative sequence is included in the BLAST database. The FASTA definition line for the representative sequence summarizes the total number of identical sequences present in the database. As more and more closely related genomes are sequenced and added to the system, using non-redundant sequences makes the searches more scalable and efficient. Without non-redundant sequences, the top results of a search might all be identical genes or proteins from closely related genomes, preventing users from seeing sequence variation or more distant hits.
Based on the input nucleotide or protein query sequence entered in the box, the non-redundant gene or protein sequence database is selected automatically. You can also select a different database using the drop-down menu to enhance your search.
3. Advanced options – Allows you to select one or more reference genomes and search only against those genomes using the specified program.
The advanced options include:
The results from the sequence homology search are presented as a compact tabular view that summarizes the key alignment statistics and as an expanded detail view showing pairwise sequence alignments.
The compact tabular view lists the top hits matching the query sequence. It shows the function of the gene/protein hit, the corresponding genome, subject length, percent identity, percent query coverage, percent subject coverage, BLAST score, and E value. These summary statistics allow you to quickly assess the quality of each BLAST hit. For each gene/protein hit, the function is hyperlinked to the corresponding Feature Landing Page, which provides detailed information about the feature. Similarly, the genome name is hyperlinked to the corresponding Genome Landing Page, which provides further information about the genome.
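To make the percentage columns more concrete, the sketch below computes percent identity and coverage using the definitions commonly applied to BLAST alignments. This guide does not state exactly how the Data Search interface derives these values, so the formulas, the `alignment_summary` function, and the example numbers are assumptions for illustration only.

```python
def alignment_summary(identical, alignment_length,
                      query_start, query_end, query_length,
                      subject_start, subject_end, subject_length):
    """Summarize one alignment; coordinates are 1-based and inclusive.

    Assumed definitions (not confirmed by the guide):
      percent identity = identical positions / alignment length
      coverage         = aligned span / full sequence length
    """
    query_span = abs(query_end - query_start) + 1
    subject_span = abs(subject_end - subject_start) + 1
    return {
        "percent_identity": 100.0 * identical / alignment_length,
        "query_coverage": 100.0 * query_span / query_length,
        "subject_coverage": 100.0 * subject_span / subject_length,
    }


# Made-up example: 90 identical positions in a 100-column alignment that
# spans half of a 200-residue query and a quarter of a 400-residue subject.
summary = alignment_summary(
    identical=90, alignment_length=100,
    query_start=1, query_end=100, query_length=200,
    subject_start=51, subject_end=150, subject_length=400,
)
print(summary)
```

A hit with high identity but low query coverage matches only part of your sequence, which is why the table reports these values side by side.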
The checkboxes at the beginning of every row can be used to select search results and copy them to a Narrative. Please note that if the search is against the gene or protein database, the objects copied to the Narrative are genes or proteins as Features. If the search is against the genomic sequence database, the objects copied to the Narrative are genomes.
You can view the detailed pairwise alignments by:
The key features of the pairwise alignment view include:
When the results come from a search against a non-redundant database, all identical hits, which share exactly the same score and alignment, are merged and only one representative hit is shown instead of a separate hit for every identical feature.
An expand/collapse button is available next to “Number of matches.” When clicked, it shows the list of identical genes or proteins. The protein function and genome names are hyperlinked to the feature and genome landing pages, respectively, for detailed information.
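The idea behind the non-redundant databases, grouping identical sequences by MD5 checksum and keeping one representative, can be sketched in a few lines of Python. This is a minimal illustration rather than the actual KBase build process: the function names, the sample identifiers and sequences, and the layout of the definition line (`md5=` and `identical_count=`) are all hypothetical.

```python
import hashlib


def build_nonredundant(records):
    """Group (identifier, sequence) pairs by the MD5 checksum of the sequence."""
    groups = {}
    for identifier, sequence in records:
        # Normalize case and whitespace so identical sequences hash identically.
        normalized = sequence.strip().upper()
        digest = hashlib.md5(normalized.encode("ascii")).hexdigest()
        group = groups.setdefault(digest, {"sequence": normalized, "ids": []})
        group["ids"].append(identifier)
    return groups


def to_fasta(groups):
    """Write one representative per group; the header format is an assumption."""
    lines = []
    for digest, group in groups.items():
        representative = group["ids"][0]
        lines.append(
            f">{representative} md5={digest} identical_count={len(group['ids'])}"
        )
        lines.append(group["sequence"])
    return "\n".join(lines)


# Placeholder records: the first two sequences are identical apart from case.
records = [
    ("feature_1", "MKTAYIAK"),
    ("feature_2", "mktayiak"),
    ("feature_3", "MSTNPKPQ"),
]
groups = build_nonredundant(records)
fasta = to_fasta(groups)
print(fasta)
```

Here three input features collapse to two database entries, and the list of identifiers stored with each entry plays the role of the identical genes or proteins revealed by the expand/collapse button.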
When you are ready to transfer your search results to your Narrative to analyze them, select the desired Narrative using the “Select a Narrative” button in the top left of the search results page. Clicking this will display a list of Narratives that you own or have access to. Find the desired Narrative in the list and click the name to select it.
If you have not already done so, select the data you wish to transfer by ticking the checkbox to the left of each data object you want. (As you do this, the Selections count above the shopping cart icon will go up.)
Once you have clicked the checkboxes for all of the data objects you wish to transfer, locate the blue button under the image of the shopping cart and click it to transfer all of the selected data into your Narrative. (If you want to deselect all of the data you selected, click the red trash can button.)
After you have transferred search results into your Narrative, you can analyze this data in the Narrative Interface using various Apps and Methods. In your browser, locate the menu in the top left corner and click “Narrative” to access the Narrative Interface from the Search page.
There are various analyses that you can run from the Narrative Interface, with more being added frequently. For a full description of the Narrative Interface, see the Narrative Interface Guide.
There are many other ways to use search results as inputs to KBase analysis tools. Check back soon for more examples, or try experimenting! Keep in mind that the search interface is still in an early phase of development. Please see our Report an Issue page for information on submitting bug reports or questions.
After performing a search, you may want to see more information about a data object. Notice that in your search results, some of the columns contain information marked with blue text. This text links to the “Data Landing Page” for a selected data object. A Data Landing page provides a detailed summary of an object, with links that enable further data exploration. The number of KBase data types that have Data Landing pages is increasing rapidly.
Since Data Landing pages open in another tab in your web browser, you can move between search results and Data Landing pages simply by clicking your browser tabs.
Within the Genomes category, the Scientific Name links to the Genome Data Landing Page for each organism. In the table listing genomes from your “arabidopsis” query, click on Arabidopsis thaliana to access the Data Landing page for this organism. Note that it can take a while for all the data in the Data Landing page panels to load.
This screenshot captures only a few of the panels available on the Data Landing page for this genome. Note that each panel is labeled with “kb|g.3899,” the KBase ID for the Arabidopsis thaliana genome. All panels can be moved around the page, removed completely or collapsed by hovering your cursor over the upper right corner to reveal the close and collapse buttons. These options allow you to customize the layout of the Data Landing page you are viewing.
Some panels on this page, such as the Genome Overview, are populated using information from the object. Others contain additional organism information that KBase pulls from external sources like Wikipedia. Several panels in this view may be empty because they are meant to hold information created and owned by a user. For example, scroll down and locate the Taxonomy panel.
If signed in, you can launch a new Narrative from this panel to build a species tree for the Arabidopsis thaliana genome and run other KBase apps and methods on this data.
Additional panels on the Genome Data Landing page let you browse and explore information about the organism’s contigs and gene list (see image below of the Contig Browser). You can also view a Publications list that provides journal, author, date, and title information. Titles are linked to the corresponding PubMed abstract, which can be read on the Data Landing page itself by hovering over the title.
In addition to organism information, a Genome Data Landing page also has panels that track how you and others are using that genome in KBase. For example, notice the panels on the Arabidopsis thaliana page that show which Narratives the genome appears in, who in KBase has used the genome, and a list of data objects referencing the genome. These panels are included on every type of Data Landing page, not just Genomes.
Also, an Object Reference and Provenance Graph at the bottom of the page gives a visual representation of the history and activity of a data object in KBase. Planned for every Data Landing page, these visualizations are interactive, allowing you to hover over portions of the graph to see object details and provenance information and to adjust the view of the graph to center it around a selected object. The Arabidopsis thaliana graph is relatively simple, but the screenshot below shows the provenance of a data object (a Rhodobacter genome) that has been used in numerous KBase analysis steps.
Although the Genomes category contains links to only Genome Data Landing pages, other data categories may contain more than one Data Landing page link. For example, results in the Genome Features category have links to Data Landing pages for Features as well as Genomes.
To explore the Data Landing pages for Features, return to the web browser tab that has your search results. In the category navigation options on the left, choose “Return to All Categories” and then select “Genome Features.” Notice that, in addition to the Scientific Name column, the Feature ID column links to Data Landing pages too.
Click the first entry in the Feature ID column to view the Data Landing page for a Locus Feature. Notice that the set of panels displayed on the Locus Feature Data Landing page differs somewhat from the panels on the Genome Data Landing page. For example, the Feature Data Landing page has a Biochemistry panel in place of the Taxonomy panel that is found on Genome Data Landing pages.