“Assembly” is the KBase data type for assembled, unannotated DNA sequence contigs. If you want to upload annotated sequences in GenBank or GFF format, please see the Genome page.

An assembly file is a single file containing one or more contiguous DNA sequences in FASTA format. It can be uploaded to KBase from your local computer (with file extension .fasta, .fna, .fa, or .fas) or directly from an FTP or HTTP URL.
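Before uploading, it can be useful to confirm locally that a file has one of the accepted extensions and to see how many sequences it contains. The following is a minimal Python sketch, not part of KBase: the function names are hypothetical, and tolerating a trailing .gz is an assumption made here for convenience.

```python
from pathlib import Path

# Extensions listed above for local uploads.
FASTA_EXTENSIONS = {".fasta", ".fna", ".fa", ".fas"}


def has_fasta_extension(filename: str) -> bool:
    """Return True if the name ends in a FASTA extension, optionally followed by .gz."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    # Assumption: ignore a trailing .gz so gzipped FASTA names are also recognized.
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]
    return bool(suffixes) and suffixes[-1] in FASTA_EXTENSIONS


def count_sequences(fasta_text: str) -> int:
    """Count FASTA records by counting header lines, which start with '>'."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))
```

For example, has_fasta_extension("GCF_000005845.2_ASM584v2_genomic.fna.gz") returns True, while a name ending in .zip returns False.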

Importing a FASTA formatted assembly file from your computer

For this example, we will use an Escherichia coli K12 MG1655 assembly file from NCBI as the source: GCF_000005845.2_ASM584v2_genomic.fna.gz

Download that file to your computer. Then open the Import tab in the Data Slideout and drag the assembly file into your Staging Area.

Open the pulldown menu to the right of the filename in your Staging Area and select “Assembly”.

Now click the import icon to the right of “Assembly”. The Data Slideout will close and an app called “Import FASTA File as Assembly from Staging Area” will be added to your Narrative.

Notice that the name of the gzipped Assembly file is already filled in, as is a suggested name for the Assembly data object that will be created by the import (you can change that if you like). Adjust the minimum contig length if needed, then click the green Run button to start the import. When the import is finished, your Data Panel will update to show the new Assembly object, and a report will appear in the import app cell.
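To get a feel for what a minimum contig length setting does, here is a small Python sketch that parses FASTA text and keeps only the contigs at or above a threshold. It is an illustration only: the function names are hypothetical, and we are assuming that the setting excludes contigs shorter than the given length, which may not match the import app's exact behavior.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_by_min_length(records, min_length):
    """Keep contigs whose sequence has at least min_length bases (assumed semantics)."""
    return [(h, s) for h, s in records if len(s) >= min_length]
```

Running filter_by_min_length on your own file locally, with the threshold you plan to use, shows how many contigs would remain under this assumption.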

Compressed/zipped files

The Assembly import app can accept a gzipped (.gz) FASTA input file. However, .zip and .Z files are not yet supported by the importers (we are working on adding that). You can upload a zip file to your Staging Area, but then you should use the “uncompress” button to its left (the one with the diagonal arrows) to unzip it before trying to import it.
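If you would rather convert a .zip archive on your own computer before uploading, one option is to repack the FASTA file inside it as a .gz file. The sketch below uses only the Python standard library. It is an alternative to the Staging Area workflow, not a KBase tool; the function name and the file names in the usage note are hypothetical.

```python
import gzip
import shutil
import zipfile
from pathlib import Path


def zip_member_to_gzip(zip_path, member, out_path):
    """Copy one file out of a .zip archive into a new gzipped file."""
    with zipfile.ZipFile(zip_path) as archive:
        # Stream the member so large assemblies are not loaded into memory at once.
        with archive.open(member) as src, gzip.open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return Path(out_path)
```

For instance, zip_member_to_gzip("assembly.zip", "contigs.fa", "contigs.fa.gz") would write contigs.fa.gz next to the archive, ready to drag into your Staging Area.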