Generated August 14, 2020

What is Delftia?

[Figure: Delftia del cluster and protein]

Delftia is a genus of bacteria with a bunch of cool features! The best-studied species, Delftia acidovorans, can produce gold nanoparticles from gold ions in solution. This bacterium has been found living in biofilms with Cupriavidus metallidurans on gold nuggets (1). It is also found in soil, in sinks, and in the rhizospheres of different plants, where it promotes their growth (3).

Delftia acidovorans precipitates gold by producing a short nonribosomal peptide called delftibactin (1). The 16 genes responsible for producing delftibactin make up the del cluster (delA-delP) (1). These genes were originally discovered in Delftia acidovorans SPH-1, but our research suggests that the del cluster is present across the genus (1).

http://2013.igem.org/Team:Heidelberg/Project/Delftibactin

The Phylogenetic Classification of Delftia

Proteobacteria

Betaproteobacteria

Burkholderiales

Comamonadaceae

Delftia

How will we find Delftia?

Our Workflow:

[Figure: Narrative Methods Figure]

Before each step, you'll find an outline describing its apps, parameters, and results.

We'll start with several sets of raw reads, clean them up, and do a little taxonomy to figure out what we might be able to assemble. Next, we'll generate 4 different assemblies of the reads and pick the best one. Then we'll sort the assembled contigs into bins by genome and assess the quality of those genomes. Finally, we'll annotate the high-quality genomes and place them in phylogenetic trees to find their relatives.

Learning Objectives:

After completing this narrative, students will be able to:

  • Describe the steps of assembling metagenomic data into genomes and the importance of each step.
  • Define the following terms and processes: read trimming, contig(s), bin(s), binning, MAG, L50, and N50.
  • Interpret the results of FastQC and CheckM reports.
  • Explain why multiple assemblers are used.
  • Compare and contrast assembly statistics to determine the best assembly to generate MAGs.
  • Identify high quality bins to extract and annotate.
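N50 and L50 are the main statistics for comparing assemblies. As a quick reference, here is a minimal Python sketch (not KBase code) of how both are usually computed from a list of contig lengths: sort the contigs from longest to shortest and add them up until at least half of the total assembly length is covered. N50 is the length of the contig that crosses the halfway mark, and L50 is how many contigs it took to get there. The function name n50_l50 is hypothetical.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in bases."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig to the shortest, accumulating length.
    for count, length in enumerate(sorted(contig_lengths, reverse=True), start=1):
        running += length
        # Stop at the first contig that brings us to at least half the assembly.
        if 2 * running >= total:
            return length, count
    # Only reachable with negative lengths, kept as a guard.
    raise ValueError("contig lengths must be non-negative")


if __name__ == "__main__":
    # Toy lengths, not from any assembly in this narrative.
    print(n50_l50([100, 80, 60, 40, 20]))  # prints (80, 2)
```

A higher N50 together with a lower L50 generally means a more contiguous assembly, although neither number says whether the contigs are correct.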

Which samples am I searching for Delftia?

Sample 1: Metagenome generated from fracture fluid collected July 12-27, 2012 from a borehole located on the 26 level of Beatrix Gold Mine (Welkom, South Africa) 1,339 m below land surface.

BioSample: SAMN04419121; Sample name: Be326_2012_DNA_MF; SRA: SRS1792548. Link to more information: https://www.ncbi.nlm.nih.gov/biosample/SAMN04419121

Why did I pick this sample?

Delftia acidovorans and Cupriavidus metallidurans make up 90% of the bacteria found in biofilms on gold nuggets (1). This sample is from a gold mine, so it would be interesting to see if Delftia is also present here. Additionally, the Krona plot indicated that there were some (unassembled) sequences that resembled Delftia.

Sample 2: Hydraulically fractured gas well metagenome from fluid sampled at timepoint_82

BioSample: SAMN04417545; Sample name: Timepoint_82; SRA: SRS1256393. Link to more information: https://www.ncbi.nlm.nih.gov/biosample/SAMN04417545

Why did I pick this sample?

Delftia is often found in soil and water, and it can use a diverse array of carbon sources (that's why so many studies focus on its use for bioremediation). In this case, the Krona plot showed a small proportion of potential Delftia sequences.

Sample 3: Subsurface sediment microbial communities from gas well in Oklahoma, United States - OK STACK MC-FT3-sol metagenome

BioSample: SAMN09199659; SRA: SRS3667068; DOE Joint Genome Institute: Gp0290895. Link to more information: https://www.ncbi.nlm.nih.gov/biosample/SAMN09199659

Why did I pick this sample?

I stumbled upon this sample while I was searching through the database. The Krona plot showed a much greater proportion of sequences that could potentially be from Delftia, and I couldn't skip it. Delftia is often found in soil and sediment.

Sample 4: Metagenome of iron plaque on rice root from arsenic (As)-contaminated paddy soil, sample from Yanhong

BioSample: SAMN07211852; Sample name: YanhongMeta01; SRA: SRS2392319. Link to more information: https://www.ncbi.nlm.nih.gov/biosample/SAMN07211852

Why did I pick this sample?

I chose this sample because it combines the soil and aquatic environments where Delftia can be found, and because it is associated with heavy metals. The Krona plot indicated that Delftia-associated sequences from several Delftia species are present.

Sample 5: Peat soil microbial communities from Stordalen Mire, Sweden - IR.F.S.T-25

BioSample: SAMN09201211; SRA: SRS3568559; DOE Joint Genome Institute: Gp0256443. Link to more information: https://www.ncbi.nlm.nih.gov/sra/SRX4415252[accn]

Why did I pick this sample?

I picked this sample because soil is one of the common sources of Delftia. The Krona plot also looked promising for both Delftia acidovorans and Delftia tsuruhatensis.

Step 1. Import Metagenomic Data

The data I'm using comes from publicly available datasets in NCBI's Sequence Read Archive (SRA).

The sample you upload should be a metagenomic sample listed as WGS with paired reads. Your sample cannot be larger than 20 Gbases (20 billion bases), or you'll need to import it through Globus (see this link for more information). You can also check out this really informative narrative for an in-depth view of how to upload data from other sources: https://narrative.kbase.us/narrative/48493
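If you are not sure whether a run is under the 20 Gbase limit, you can read its base count off the SRA page and compare. The sketch below assumes the size is written as a number followed by an optional K, M, G, or T suffix (for example "12.3G"); parse_bases and needs_globus are hypothetical helper names, not part of KBase.

```python
import re

LIMIT_BASES = 20 * 10**9  # 20 Gbases, the direct-import limit described above
_UNITS = {"": 1, "K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_bases(size):
    """Convert a size string such as '12.3G' or '850M' into a number of bases."""
    match = re.fullmatch(r"\s*(\d*\.?\d+)\s*([KMGT]?)\s*", size.upper())
    if match is None:
        raise ValueError(f"unrecognized size: {size!r}")
    number, unit = match.groups()
    # round() avoids float artifacts such as 5199999999.9999 turning into ...999.
    return round(float(number) * _UNITS[unit])


def needs_globus(size):
    """True if the run is larger than the direct-import limit."""
    return parse_bases(size) > LIMIT_BASES


if __name__ == "__main__":
    for size in ("850M", "12G", "25.1G"):
        print(size, parse_bases(size), needs_globus(size))
```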

App: Import SRA File as Reads from Web

Timing: 1-5 hours depending on the size of the file and the queue time. (It's often helpful to run this in the morning so it finishes uploading and you can then set up the assembly to run overnight or over the weekend.)

View Configure:

SRA URL: Use the first link from the "Reads Access" page and paste it into this block.

Reads Object Name: This is what your sample will be called once it's been uploaded into KBase. Make sure it's something you'll be able to remember and follow through the workflow.

Sequencing Technology: Select the sequencing technology that was used to generate the reads.

Single Genome: These reads are all from metagenomes, so leave this box unchecked.

Results: Your reads should appear in the Data panel on the left. Before proceeding, double-check that they are a PairedEndLibrary. If they are a SingleEndLibrary, you'll need to select a different sample, because the assembly won't work with a SingleEndLibrary. The results panel itself shows some statistics for the reads, including the number of reads, the mean quality score, and the mean read length.

Import an SRA file from a web URL into your Narrative as a Reads data object.
This app completed without errors in 45m 20s.
Objects
Created Object Name Type Description
Subsurface_gold_mine_reads PairedEndLibrary Imported Reads
Import an SRA file from a web URL into your Narrative as a Reads data object.
This app completed without errors in 53m 19s.
Objects
Created Object Name Type Description
Hydraulic_fracture_well_fluid_raw_reads PairedEndLibrary Imported Reads
Import an SRA file from a web URL into your Narrative as a Reads data object.
This app completed without errors in 39m 34s.
Objects
Created Object Name Type Description
Subsurface_gas_well_reads PairedEndLibrary Imported Reads
Import an SRA file from a web URL into your Narrative as a Reads data object.
This app completed without errors in 46m 56s.
Objects
Created Object Name Type Description
Rice_root_iron_plaque_reads PairedEndLibrary Imported Reads
Import an SRA file from a web URL into your Narrative as a Reads data object.
This app completed without errors in 1h 22m 38s.
Objects
Created Object Name Type Description
Peat_soil_raw_reads PairedEndLibrary Imported Reads

Step 2. Assess the Quality of your Samples

You've uploaded your data, great! Now you have to check the quality of the reads. First assess quality with FastQC. Then, if you need to, trim the reads and assess quality again to make sure the low quality reads have been removed from the sample.

Apps:

  1. Assess Read Quality with FastQC
  2. Trim Reads with Trimmomatic

Timing: 5-20 minutes depending on queue and the number of reads

Assess Read Quality with FastQC

View Configure:

  • All you need to input to assess quality is the name of your reads library.

Results: This app will give you a full report on the quality of your reads. The report has two pages, one each for the forward and reverse reads in the PairedEndLibrary. Here we'll focus on the Per Base Sequence Quality, but the rest of the report offers a lot of useful information about your library.

  1. Per Base Sequence Quality: An overview of the quality scores for each position in the read. The red line is the median, blue the mean, the yellow bars the interquartile range and the black lines the 10% and 90% points.
  2. Per Sequence Quality Scores: This shows the distribution of quality scores across sequences. Ideally you will only see a single peak on the far right. If you see a second peak to the center or left, you may have a subset of low quality reads that need to be trimmed out.
  3. Per Base Sequence Content: This shows the proportion of each base present at each position in your reads. In a perfectly random run, you would expect to see parallel lines across this chart, but the relative amount of each base would depend on the genome(s) sequenced.
  4. Per Sequence GC Content: In a random library from a single genome, you would expect this to follow a normal distribution with a peak at that genome's GC content. An unusually shaped distribution can indicate contamination or systematic error. Keep in mind that in a metagenome, a broad or multi-peaked distribution is common because the organisms in the sample differ in GC content.
  5. Per Base N Content: Whenever the sequencer cannot call a base, it labels it as N (any nucleotide). This chart shows the percentage of bases at each position that have been called as "N". It's not unusual to see some of these at the end of sequences when the quality drops, but a consistently high proportion, or a peak at the start or middle of the sequence, may indicate a problem with the sequencing run.
  6. Sequence Length Distribution: This shows the distribution of read sizes.
  7. Sequence Duplication Levels: This plot shows the relative number of sequences at each level of duplication. It's based only on a subset of the data, but it should give you a good idea of how many sequences have duplicates. If many sequences fall at duplication levels of 2 or more, you may need to filter the library to remove them.
  8. Overrepresented Sequences: Any sequence that makes up more than 0.1% of the total sequences will be listed here. An abundance of any one sequence may indicate contamination, a lack of diversity in the library, or a biologically significant sequence.
  9. Adapter Content: This graph shows where any adapters are found in your library.
  10. Kmer Content: If any Kmers are overrepresented in your library, they will be listed here.
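The Per Base Sequence Quality plot is built from the quality string stored with every read. Here is a minimal sketch of the idea, assuming standard four-line FASTQ records with Phred+33 encoded qualities (the encoding used by current Illumina pipelines): decode each character to a score, group the scores by position, and summarize each position. The function names are hypothetical, and FastQC's own plot also draws the interquartile range and the 10% and 90% points, which this sketch leaves out.

```python
from statistics import mean, median


def quality_lines(fastq_lines):
    """Yield the quality string (4th line) of each 4-line FASTQ record."""
    for index, line in enumerate(fastq_lines):
        if index % 4 == 3:
            yield line.rstrip("\n")


def phred_scores(quality, offset=33):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(char) - offset for char in quality]


def per_base_quality(qualities, offset=33):
    """Return {position: (mean, median)} using 1-based read positions."""
    by_position = {}
    for quality in qualities:
        for position, score in enumerate(phred_scores(quality, offset), start=1):
            by_position.setdefault(position, []).append(score)
    return {pos: (mean(s), median(s)) for pos, s in sorted(by_position.items())}


if __name__ == "__main__":
    # Two toy records: 'I' decodes to Q40, '5' to Q20 and '#' to Q2.
    toy_fastq = [
        "@read1", "ACGT", "+", "IIII",
        "@read2", "ACGA", "+", "II5#",
    ]
    for position, (avg, mid) in per_base_quality(quality_lines(toy_fastq)).items():
        print(position, avg, mid)
```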

For a more in-depth look at understanding your FastQC report, check out the manual here: https://dnacore.missouri.edu/PDF/FastQC_Manual.pdf

Trim Reads with Trimmomatic

View Configure:

  • Read library or set: Select the library you want to trim.

  • Parameters: This section covers trimming parameters for removing adapters, cropping the sequences, and the quality thresholds used to trim a read. In this workflow, I'm going to leave them all at their defaults, but you can learn more about them in the App Info page and in the KBase App catalog.

  • Output library name: Make sure to specify that this library has been trimmed so you can tell the two libraries apart later.
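One of the steps Trimmomatic can apply is sliding-window quality trimming: scan from the 5' end and cut the read once the average quality in a window drops below a threshold, then drop reads left shorter than a minimum length. The sketch below is a simplified Python version of that idea (Trimmomatic's exact cut-point logic differs in detail). It also shows why the app produces three objects for a paired library: if both mates survive they stay paired, and if only one survives it goes to an unpaired forward or reverse library. The window size, threshold, and minimum length are illustrative values, not the KBase app defaults, and the function names are hypothetical.

```python
def sliding_window_keep(scores, window=4, threshold=20):
    """Number of leading bases to keep before the first low-quality window."""
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < threshold:
            return start
    return len(scores)


def trim_read(seq, qual, window=4, threshold=20, min_len=36, offset=33):
    """Trim one read; return (seq, qual), or None if it ends up too short."""
    scores = [ord(char) - offset for char in qual]  # Phred+33 decoding
    keep = sliding_window_keep(scores, window, threshold)
    if keep < min_len:
        return None
    return seq[:keep], qual[:keep]


def trim_pair(fwd, rev, **settings):
    """Trim both mates and report which output library the result belongs in."""
    kept_fwd = trim_read(*fwd, **settings)
    kept_rev = trim_read(*rev, **settings)
    if kept_fwd and kept_rev:
        return "paired", (kept_fwd, kept_rev)
    if kept_fwd:
        return "unpaired_fwd", kept_fwd
    if kept_rev:
        return "unpaired_rev", kept_rev
    return "dropped", None


if __name__ == "__main__":
    # '?' decodes to Q30 and '+' to Q10 in Phred+33.
    fwd = ("ACGTACGT", "????++++")
    rev = ("TTTTGGGG", "????????")
    print(trim_pair(fwd, rev, min_len=3))  # both mates survive
    print(trim_pair(fwd, rev, min_len=5))  # only the reverse mate survives
```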

Once you've trimmed the read library, reassess the quality of the PairedEndLibrary using FastQC.

Results: A FastQC report detailing the quality of the reads you submitted and, if you trimmed them, of the trimmed libraries.

A quality control application for high throughput sequence data.
This app completed without errors in 22m 53s.
Files
These are only available in the live Narrative: https://narrative.kbase.us/narrative/67335
  • Subsurface_gold_mine_reads_67335_2_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • Subsurface_gold_mine_reads_67335_2_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
A quality control application for high throughput sequence data.
This app completed without errors in 31m 59s.
Files
  • Hydraulic_fracture_well_fluid_raw_reads_67335_4_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • Hydraulic_fracture_well_fluid_raw_reads_67335_4_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
A quality control application for high throughput sequence data.
This app completed without errors in 19m 56s.
Files
  • Subsurface_gas_well_reads_67335_6_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • Subsurface_gas_well_reads_67335_6_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
A quality control application for high throughput sequence data.
This app completed without errors in 33m 12s.
Files
  • Rice_root_iron_plaque_reads_67335_8_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • Rice_root_iron_plaque_reads_67335_8_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
A quality control application for high throughput sequence data.
This app completed without errors in 20m 11s.
Files
  • Peat_soil_raw_reads_67335_20_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • Peat_soil_raw_reads_67335_20_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report

Q1 Looking at the results of the FastQC app, which read set(s) would you choose to trim and reassess? Why did you pick that/those set(s)?

Trim paired- or single-end Illumina reads with Trimmomatic.
This app completed without errors in 2h 11m 54s.
Objects
Created Object Name Type Description
subsurface_gold_mine_reads_trimmed_paired PairedEndLibrary Trimmed Reads
subsurface_gold_mine_reads_trimmed_unpaired_fwd SingleEndLibrary Trimmed Unpaired Forward Reads
subsurface_gold_mine_reads_trimmed_unpaired_rev SingleEndLibrary Trimmed Unpaired Reverse Reads
A quality control application for high throughput sequence data.
This app completed without errors in 21m 48s.
Files
  • subsurface_gold_mine_reads_trimmed_paired_67335_14_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • subsurface_gold_mine_reads_trimmed_paired_67335_14_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
Trim paired- or single-end Illumina reads with Trimmomatic.
This app completed without errors in 38m 12s.
Objects
Created Object Name Type Description
Peat_soil_reads_trimmed_paired PairedEndLibrary Trimmed Reads
Peat_soil_reads_trimmed_unpaired_fwd SingleEndLibrary Trimmed Unpaired Forward Reads
Peat_soil_reads_trimmed_unpaired_rev SingleEndLibrary Trimmed Unpaired Reverse Reads
A quality control application for high throughput sequence data.
This app completed without errors in 20m 20s.
Files
  • Peat_soil_reads_trimmed_paired_67335_33_1.fwd_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report
  • Peat_soil_reads_trimmed_paired_67335_33_1.rev_fastqc.zip - Zip file generated by fastqc that contains original images seen in the report

Step 3. Taxonomy Before Assembly

Before we assemble these libraries, it will help to get an idea of what's present in our samples and at what abundance. KBase has two apps for this: Kaiju and GOTTCHA2.

Apps:

  1. Classify Taxonomy of Metagenomic Reads with Kaiju
  2. Classify Taxonomy of Metagenomic Reads with GOTTCHA2

Timing: 20 mins-2 hours depending on queue and number of reads you're running

1. Classify Taxonomy of Metagenomic Reads with Kaiju

This app translates reads into protein sequences and compares them against a protein reference database to identify what's present, or possibly present, in the sample.
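Because Kaiju compares protein sequences, each DNA read first has to be translated. A read could come from either strand and start at any of three codon positions, so it is translated in all six reading frames; as I understand Kaiju's approach, the translations are then broken at stop codons into fragments that are searched against the protein database. The sketch below covers only the translation and splitting steps, using the standard genetic code; the function names are hypothetical and the database search is not shown.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons containing N or other symbols become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(read):
    """Translate a read in frames +1, +2, +3 and -1, -2, -3."""
    read = read.upper()
    rc = reverse_complement(read)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(read[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def fragments(protein, min_len=1):
    """Split a translation at stop codons ('*'), keeping pieces >= min_len."""
    return [piece for piece in protein.split("*") if len(piece) >= min_len]


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGAAATAGCCC").items():
        print(frame, protein, fragments(protein))
```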

View Configure:

  • Read Library or Set: You can either run the app for each library or run it once with all read libraries.
  • Taxonomic Level: By default the app will show all levels from phylum to species.
  • Reference DB: Here you'll select the database to compare your reads against. Either RefSeq or BLAST is fine for our purposes, and you don't need to include eukaryotes, since we're only looking for a bacterium.
  • Low Abundance Filter: This value filters out taxa that occur infrequently in your sample and helps simplify your results. After all, if something is present in low abundance, you're unlikely to be able to assemble a genome from its small number of reads. The default is fine, but you can lower it if you want to see what is present in low abundance.
  • Subsample Percent, Subsample Replicates, and Subsample Seed: To save time, Kaiju only looks at a subsample of your reads. These settings let you adjust the percentage of your sample that Kaiju uses, how many replicate subsamples it draws, and the random seed used to draw them. Because the subsamples are random, you should expect only minor variation between replicates, and that is generally what the individual runs below show.
  • Filter Low Complexity: Set this to filter so that low-complexity regions (such as simple repeats) are masked and don't produce spurious matches.
  • Min Match Length, Allow Imperfect Matches?, Greedy Max Mismatches, Greedy Min Bit Score, Greedy Max E-Value: These settings control how strict the matching between protein sequences is and how many mismatches it tolerates. How specific you want it to be is up to you; in this case we don't need it to be extremely specific, since we only want a quick overview of what's present.
  • Sort Plots By: How you sort the plots is up to you, mine are sorted by total abundance, but you can also sort them alphabetically.
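The subsampling settings can be pictured like this: each replicate draws a random subset of the reads (here with a fixed seed so the draw can be repeated), and taxon proportions are computed from that subset. Because the draws are random, replicates differ by a small amount of sampling noise rather than being identical. This is a toy Python illustration with made-up read labels, not Kaiju's implementation; subsample_proportions is a hypothetical name.

```python
import random
from collections import Counter


def subsample_proportions(read_labels, percent, seed):
    """Proportion of each label in a seeded random subsample of the reads."""
    rng = random.Random(seed)  # same seed -> same subsample
    size = max(1, round(len(read_labels) * percent / 100))
    subsample = rng.sample(read_labels, size)
    counts = Counter(subsample)
    return {label: n / size for label, n in counts.items()}


if __name__ == "__main__":
    # Made-up labels: 10% of 10,000 toy reads are tagged "Delftia".
    reads = ["Delftia"] * 1_000 + ["other"] * 9_000
    for replicate_seed in (1, 2):
        props = subsample_proportions(reads, percent=10, seed=replicate_seed)
        print(replicate_seed, round(props.get("Delftia", 0.0), 3))
```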

Results: Your results will be a series of tables showing the breakdown of your sample beginning with the phyla and ending at species. The tail includes everything that is present below the low abundance filter.
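The low abundance filter and the tail fit together simply: each taxon's share of the classified reads is computed, taxa at or above the cutoff are reported, and everything below it is summed into the tail. Here is a minimal sketch with made-up counts; the 1% cutoff and the function name are hypothetical, not the app's defaults.

```python
def apply_low_abundance_filter(counts, min_percent=1.0):
    """Split taxa into those at/above min_percent of reads and a summed tail."""
    total = sum(counts.values())
    if total == 0:
        return {}, 0.0
    kept, tail = {}, 0.0
    for taxon, n in counts.items():
        percent = 100 * n / total
        if percent >= min_percent:
            kept[taxon] = percent
        else:
            tail += percent  # everything below the cutoff goes into the tail
    # Order reported taxa by abundance, like sorting plots by total abundance.
    kept = dict(sorted(kept.items(), key=lambda item: item[1], reverse=True))
    return kept, tail


if __name__ == "__main__":
    toy_counts = {"Delftia": 95, "Pseudomonas": 900, "Cupriavidus": 5}
    print(apply_low_abundance_filter(toy_counts))
```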

2. Classify Taxonomy of Metagenomic Reads with GOTTCHA2

Unlike Kaiju, which works with translated protein sequences, this app estimates relative abundance from unique nucleotide signatures in RefSeq genomes.

View Configure:

  • Read Library/Set: Add your reads library here.
  • Reference DB: You can either select the bacterial/viral/archaeal database or fungal database. In this case, we'll be using the first option.
  • Minimum Coverage: The minimum fraction of a taxon's unique genome signatures that must be covered by reads for that taxon to be included in the abundance calculation. Decreasing it will include lower-abundance species, while increasing it will remove them.
  • Minimum Reads: The minimum number of reads a taxon needs to be included in the abundance calculation.
  • Minimum Length: The minimum length of reads for them to be included in the abundance calculation. If your library contains many short reads, you may want to reduce this value, but be aware that reducing it also increases the chance that you will get an incorrect result.
  • Maximum Zscore: The highest estimated z-score allowed for the read depth across a taxon's mapped signature regions. The default is fine for our Delftia search.
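GOTTCHA2's core idea is to score each taxon only on sequence that is unique to it, so reads from regions shared between taxa cannot inflate its abundance. The sketch below is a simplified stand-in that uses exact k-mer matching: it keeps the k-mers found in only one reference, records which of them the reads hit, and reports a taxon only if enough of its signature is covered (the analogue of Minimum Coverage) by enough reads (the analogue of Minimum Reads). GOTTCHA2 itself works from precomputed signature sequences and read mapping, so treat this as an illustration of the filtering logic only; all names and toy sequences are hypothetical.

```python
def kmers(seq, k):
    """All k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_signatures(references, k):
    """Map each reference name to the k-mers that occur in no other reference."""
    kmer_sets = {name: kmers(seq, k) for name, seq in references.items()}
    signatures = {}
    for name, own in kmer_sets.items():
        others = set().union(*(s for other, s in kmer_sets.items() if other != name))
        signatures[name] = own - others
    return signatures


def report_taxa(reads, signatures, k, min_coverage=0.5, min_reads=1):
    """Return {taxon: signature coverage} for taxa passing both filters."""
    covered = {name: set() for name in signatures}
    read_hits = {name: 0 for name in signatures}
    for read in reads:
        read_kmers = kmers(read, k)
        for name, signature in signatures.items():
            shared = read_kmers & signature
            if shared:
                covered[name] |= shared
                read_hits[name] += 1
    report = {}
    for name, signature in signatures.items():
        if not signature:
            continue  # nothing unique to score this taxon on
        coverage = len(covered[name]) / len(signature)
        if coverage >= min_coverage and read_hits[name] >= min_reads:
            report[name] = coverage
    return report


if __name__ == "__main__":
    refs = {"taxonA": "AAAACCCC", "taxonB": "GGGGTTTT"}
    sigs = unique_signatures(refs, k=4)
    print(report_taxa(["AAAACC", "ACCCC"], sigs, k=4))
```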

Results: There are three ways you can view the results from GOTTCHA2. The first is as a table showing the classification of your reads, some statistics regarding their abundance and their relative abundance. The second is as a phylogenetic tree showing the relationships of the different taxa identified in your sample. The third layout is as a Krona plot, an interactive plot that displays relative abundance and phylogenetic relationships. Clicking on a phylum will zoom in to show the classes within it. How far you can zoom down depends on the sample and the unique sequences in it.

The first two Kaiju runs below combine the first 4 libraries into a single run, first with NCBI BLAST as the database and then with RefSeq. The runs after those show the libraries individually, with different settings to increase the proportion of each library represented.

Allows users to perform taxonomic classification of shotgun metagenomic read data with Kaiju.
This app completed without errors in 2h 56m 47s.
Files
  • kaiju_classifications.zip
  • kaiju_summaries.zip
  • krona_data.zip
  • stacked_bar_abundance_plots_PNG+PDF.zip
Allows users to perform taxonomic classification of shotgun metagenomic read data with Kaiju.
This app completed without errors in 2h 46m 12s.
Files
  • kaiju_classifications.zip
  • kaiju_summaries.zip
  • krona_data.zip
  • stacked_bar_abundance_plots_PNG+PDF.zip

These Kaiju runs are broken down by sample and use the NCBI BLAST database. Each one was run on two subsamples.

Allows users to perform taxonomic classification of shotgun metagenomic read data with Kaiju.
This app completed without errors in 54m 12s.
Files
  • kaiju_classifications.zip
  • kaiju_summaries.zip
  • krona_data.zip
  • stacked_bar_abundance_plots_PNG+PDF.zip

Q2 Do the two subsamples vary much? Is this expected or unexpected and why?

Allows users to perform taxonomic classification of shotgun metagenomic read data with Kaiju.
This app completed without errors in 57m 10s.
Files
  • kaiju_classifications.zip
  • kaiju_summaries.zip
  • krona_data.zip
  • stacked_bar_abundance_plots_PNG+PDF.zip
Allows users to perform taxonomic classification of shotgun metagenomic read data with Kaiju.
This app completed without errors in 49m 30s.
Files
  • kaiju_classifications.zip
  • kaiju_summaries.zip
  • krona_data.zip
  • stacked_bar_abundance_plots_PNG+PDF.zip
Allows users to perform taxonomic classification of shotgun metagenomic read data with Kaiju.
This app completed without errors in 54m 46s.
Files
  • kaiju_classifications.zip
  • kaiju_summaries.zip
  • krona_data.zip
  • stacked_bar_abundance_plots_PNG+PDF.zip
Allows users to perform taxonomic classification of shotgun metagenomic read data with Kaiju.
This app completed without errors in 1h 21m 11s.
Files
  • kaiju_classifications.zip
  • kaiju_summaries.zip
  • krona_data.zip
  • stacked_bar_abundance_plots_PNG+PDF.zip

This first run of GOTTCHA2 includes ALL read libraries as one sample. The runs after it show the samples individually.

Uses GOTTCHA2 to provide taxonomic classifications of shotgun metagenomic reads data.
This app completed without errors in 2h 3m 1s.
Summary
GOTTCHA2 run finished on d0e40104-e7db-43b8-8463-8109aa0443da.inter.fastq.gz,617c1e51-fae4-4a3f-9081-45cd1b3f3189.inter.fastq.gz,3e51f290-7b31-4053-bf46-2f6ca081f0a3.inter.fastq.gz,561134de-8627-4cd0-ac5c-151284f468ec.inter.fastq.gz against RefSeq-r90.cg.BacteriaArchaeaViruses.species.fna.
Files
  • gottcha2.out.list
  • gottcha2.gottcha_species.sam
  • gottcha2.tsv
  • html_report
  • gottcha2.out.tab_tree
  • gottcha2.gottcha_species.log
  • gottcha2.summary.tsv
  • gottcha2.krona.html
  • gottcha2.full.tsv
  • gottcha2.lineage.tsv
Uses GOTTCHA2 to provide taxonomic classifications of shotgun metagenomic reads data.
This app completed without errors in 58m 45s.
Summary
GOTTCHA2 run finished on 617c1e51-fae4-4a3f-9081-45cd1b3f3189.inter.fastq.gz against RefSeq-r90.cg.BacteriaArchaeaViruses.species.fna.
Files
  • gottcha2.out.tab_tree
  • gottcha2.gottcha_species.sam
  • gottcha2.gottcha_species.log
  • gottcha2.full.tsv
  • gottcha2.tsv
  • gottcha2.summary.tsv
  • gottcha2.lineage.tsv
  • gottcha2.krona.html
  • html_report
  • gottcha2.out.list

Q3: Open the Krona plot from this sample. Viruses make up what percent of the sample?

Uses GOTTCHA2 to provide taxonomic classifications of shotgun metagenomic reads data.
This app completed without errors in 41m 1s.
Summary
GOTTCHA2 run finished on 3e51f290-7b31-4053-bf46-2f6ca081f0a3.inter.fastq.gz against RefSeq-r90.cg.BacteriaArchaeaViruses.species.fna.
Files
  • gottcha2.lineage.tsv
  • html_report
  • gottcha2.out.tab_tree
  • gottcha2.full.tsv
  • gottcha2.gottcha_species.log
  • gottcha2.out.list
  • gottcha2.summary.tsv
  • gottcha2.krona.html
  • gottcha2.tsv
  • gottcha2.gottcha_species.sam
Uses GOTTCHA2 to provide taxonomic classifications of shotgun metagenomic reads data.
This app completed without errors in 26m 35s.
Summary
GOTTCHA2 run finished on d0e40104-e7db-43b8-8463-8109aa0443da.inter.fastq.gz against RefSeq-r90.cg.BacteriaArchaeaViruses.species.fna.
Files
  • gottcha2.tsv
  • gottcha2.gottcha_species.log
  • gottcha2.gottcha_species.sam
  • gottcha2.full.tsv
  • gottcha2.lineage.tsv
  • gottcha2.summary.tsv
  • gottcha2.out.list
  • gottcha2.out.tab_tree
  • gottcha2.krona.html
  • html_report

Q4: What is the relative abundance of Betaproteobacteria in this sample?

Uses GOTTCHA2 to provide taxonomic classifications of shotgun metagenomic reads data.
This app completed without errors in 41m 20s.
Summary
GOTTCHA2 run finished on 561134de-8627-4cd0-ac5c-151284f468ec.inter.fastq.gz against RefSeq-r90.cg.BacteriaArchaeaViruses.species.fna.
Files
  • gottcha2.gottcha_species.sam
  • gottcha2.gottcha_species.log
  • gottcha2.out.list
  • html_report
  • gottcha2.full.tsv
  • gottcha2.summary.tsv
  • gottcha2.tsv
  • gottcha2.out.tab_tree
  • gottcha2.lineage.tsv
  • gottcha2.krona.html
Uses GOTTCHA2 to provide taxonomic classifications of shotgun metagenomic reads data.
This app completed without errors in 21m 28s.
Summary
GOTTCHA2 run finished on f00edf09-9f80-4430-9bb1-917ac99aa5ca.inter.fastq.gz against RefSeq-r90.cg.BacteriaArchaeaViruses.species.fna.
Files
  • gottcha2.gottcha_species.sam
  • html_report
  • gottcha2.tsv
  • gottcha2.out.tab_tree
  • gottcha2.out.list
  • gottcha2.summary.tsv
  • gottcha2.krona.html
  • gottcha2.full.tsv
  • gottcha2.gottcha_species.log
  • gottcha2.lineage.tsv

Q5: Which sample(s) look the most promising based on the taxonomy results from GOTTCHA2?

Step 4. Assembly of Metagenomic Data

Alright, you've made it this far. This step takes the longest, so you may want to set it up to run overnight or over the weekend. In this step, the assemblers take our read libraries and piece overlapping reads together into longer sequences called contigs. Later we'll sort these contigs into bins based on the genomes they came from. We'll be using three different apps to generate 4 different sets of contigs.
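To picture how overlapping reads become contigs, here is a toy greedy overlap assembler in Python: it repeatedly merges the two sequences with the longest suffix-prefix overlap until no overlap of at least min_len remains. This is only an illustration of the concept. MetaSPAdes, MEGAHIT, and IDBA-UD all use de Bruijn graphs built from k-mers instead, which scale to millions of reads and cope better with sequencing errors and repeats. The function names are hypothetical, and this sketch ignores reverse complements and reads contained entirely within other reads.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that is a prefix of b, or 0 if < min_len."""
    if len(b) < min_len:
        return 0
    start = 0
    while True:
        # Look for b's first min_len bases somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # The leftmost position where a's suffix is a prefix of b is the longest overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair until no overlap >= min_len is left."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    length = overlap(a, b, min_len)
                    if length > best_len:
                        best_len, best_i, best_j = length, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACG", "TACGGAT"]))  # ['ATGGCGTACGGAT']
```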

Apps:

  1. Assemble Reads with MetaSPAdes
  2. Assemble Reads with MEGAHIT (we'll run this app twice)
  3. Assemble Reads with IDBA-UD

Timing: hours to days (one of my assemblies below ran for almost 4 days).

1. Assemble Reads with MetaSPAdes

View