In KBase, a Genome is a sequence file that includes feature calls, also known as annotations. Genomes can be used as input for several KBase analyses.

Currently, the Genome importer supports only GenBank format. The GenBank input file should include the sequence contig(s) and also the feature calls (annotations), as well as the taxonomy information for the organism. KBase parses the GenBank file into two data objects: an assembly and a genome object containing the original feature calls and annotations.
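
Before importing, it can help to confirm that a GenBank file actually contains the pieces described above. The following is a minimal sketch in Python, not part of KBase: the function name `summarize_genbank` and the sample record are made up for illustration, and the check only looks for the standard GenBank section keywords (LOCUS, ORGANISM, FEATURES, ORIGIN). It does not validate the file the way the KBase importer would.

```python
def summarize_genbank(text):
    """Count records and note which expected GenBank sections are present.

    Hypothetical helper for illustration; it only scans for section keywords.
    """
    summary = {
        "records": 0,           # one LOCUS line per sequence record (contig)
        "has_organism": False,  # ORGANISM line carries the taxonomy information
        "has_features": False,  # FEATURES table holds the feature calls
        "has_sequence": False,  # ORIGIN introduces the sequence itself
    }
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            summary["records"] += 1
        elif line.startswith("FEATURES"):
            summary["has_features"] = True
        elif line.startswith("ORIGIN"):
            summary["has_sequence"] = True
        elif line.strip().startswith("ORGANISM"):
            # ORGANISM is indented under SOURCE, so strip leading spaces first
            summary["has_organism"] = True
    return summary


# A tiny made-up record, used only to exercise the function.
SAMPLE_RECORD = """LOCUS       TEST0001                  12 bp    DNA     linear   BCT 01-JAN-2000
DEFINITION  Hypothetical example record.
SOURCE      Escherichia coli
  ORGANISM  Escherichia coli
            Bacteria.
FEATURES             Location/Qualifiers
     source          1..12
     gene            1..12
ORIGIN
        1 atgaaacgct aa
//
"""

report = summarize_genbank(SAMPLE_RECORD)
```

A file in which `has_features` or `has_organism` comes back false is probably missing the annotations or taxonomy information that the importer expects.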

There are two ways a GenBank file can be loaded into KBase:

  1. From a GenBank (.gb or .gbk) file on your local computer: When uploading a genome from a GenBank file on your computer, make sure that the filename ends with either the .gb or .gbk extension, as KBase may not support non-standard file extensions.
  2. Import directly from FTP/HTTP: This imports a GenBank file from an FTP or HTTP URL that you specify, instead of uploading it from your own computer.
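
Both requirements above are easy to check locally before you start an import. This is a rough sketch in Python under stated assumptions: the helper names are hypothetical, the case-insensitive extension match is a guess about how lenient the uploader is, and the scheme list contains only the two protocols named above.

```python
from urllib.parse import urlparse

# Extensions and URL schemes named in the text above.
SUPPORTED_EXTENSIONS = (".gb", ".gbk")
SUPPORTED_SCHEMES = ("ftp", "http")


def has_genbank_extension(filename):
    """Return True if the filename ends with .gb or .gbk.

    Assumption: upper-case variants such as .GBK are treated the same way.
    """
    return filename.lower().endswith(SUPPORTED_EXTENSIONS)


def is_importable_url(url):
    """Return True if the URL uses one of the schemes named above."""
    return urlparse(url).scheme in SUPPORTED_SCHEMES
```

For example, `has_genbank_extension("NC_000913.gbk")` is true, while a file named `genome.genbank` would need to be renamed first.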

For this example, we will use the E. coli K-12 MG1655 genome GenBank file from NCBI. You can download it to your computer from the following link:

ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_refseq/Bacteria/Escherichia_coli_K_12_substr__MG1655_uid57779/NC_000913.gbk
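
If you would rather fetch the file from a script than from a browser, something like the following Python sketch should work, assuming the URL above is still reachable from your network. The helper names `local_name` and `download` are illustrative only; the download itself relies on `urllib.request.urlretrieve` from the standard library.

```python
import posixpath
import urllib.request
from urllib.parse import urlparse

# The example URL given above.
GENBANK_URL = (
    "ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_refseq/Bacteria/"
    "Escherichia_coli_K_12_substr__MG1655_uid57779/NC_000913.gbk"
)


def local_name(url):
    """Derive a local filename from the last path component of the URL."""
    return posixpath.basename(urlparse(url).path)


def download(url, dest=None):
    """Save the file at `url` to `dest` (default: its own filename) and return the path."""
    dest = dest or local_name(url)
    urllib.request.urlretrieve(url, dest)
    return dest
```

Calling `download(GENBANK_URL)` would save the file as `NC_000913.gbk` in the current directory, which already has an extension the uploader accepts.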

Upload genome from a GenBank formatted file

  • Choose Genome from the data type dropdown menu
  • Click the Next button
  • Make sure that the UPLOAD GENBANK FILE tab is selected
  • Select your GenBank (.gb or .gbk) file from a directory on your computer
  • Provide a name for the Genome in the Genome Object ID field
  • Select the source of the GenBank file: RefSeq, Ensembl, or Other
  • Click the Import button
  • After the import process has completed, the resulting Genome and Assembly data objects will appear in your Data Panel at left

Import a GenBank file from FTP

You can also import a Genome into KBase by copying a GenBank FTP or HTTP link directly into the FTP importer, for example:

ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_refseq/Bacteria/Escherichia_coli_K_12_substr__MG1655_uid57779/NC_000913.gbk

  • Choose Genome from the data type dropdown menu
  • Click the Next button
  • Select the IMPORT FROM FTP tab
  • Copy the FTP link into the FTP File field
  • Provide a name for the Genome data object
  • Select the source of the GenBank file: RefSeq, Ensembl, or Other
  • Click the Import button

Close the window and notice that both a Genome and an Assembly object have been created and appear in your Data Panel at left: