Delftia is a genus of bacteria with a bunch of cool features! The best-studied species, Delftia acidovorans, can produce gold nanoparticles from gold ions in solution. This bacterium has been found living in biofilms with Cupriavidus metallidurans on gold nuggets (1). It is also found in soil, in sinks, and in the rhizospheres of different plants, where it promotes their growth (3).
Delftia acidovorans forms gold nuggets by producing a short nonribosomal peptide called delftibactin (1). The 16 genes responsible for the production of delftibactin are called the del cluster (delA-delP) (1). These genes were originally discovered in Delftia acidovorans SPH-1, but our research shows that the del cluster appears to be present across the genus (1).
http://2013.igem.org/Team:Heidelberg/Project/Delftibactin
Proteobacteria
Betaproteobacteria
Burkholderiales
Comamonadaceae
Delftia
Our Workflow:
Each step will be outlined before you reach it with a description of the apps, parameters and results.
We'll start with a number of raw reads, clean them up, and do a little taxonomy to figure out what we might be able to assemble. Next, we'll assemble the reads using 4 different methods and pick out the best assembly. Then we'll take the assembled contigs, sort them by genome, and assess the quality of those genomes. We'll select the high-quality genomes to annotate and insert into phylogenetic trees to find relatives.
After completing this narrative, students will be able to:
Sample 1: Metagenome generated from fracture fluid collected July 12-27, 2012 from a borehole located on the 26 level of Beatrix Gold Mine (Welkom, South Africa) 1,339 m below land surface.
Bio Sample: SAMN04419121; Sample name: Be326_2012_DNA_MF; SRA: SRS1792548 Link to more information: https://www.ncbi.nlm.nih.gov/biosample/SAMN04419121
Why did I pick this sample?
Delftia acidovorans and Cupriavidus metallidurans make up 90% of the bacteria found in biofilms on gold nuggets (1). This sample is from a gold mine, so it would be interesting to see if Delftia is also present here. Additionally, the Krona plot indicated that there were some (unassembled) sequences that resembled Delftia.
Sample 2: Hydraulically fractured gas well metagenomes fluid from sample at timepoint_82
Bio Sample: SAMN04417545; Sample name: Timepoint_82; SRA: SRS1256393 Link to more information: https://www.ncbi.nlm.nih.gov/biosample/SAMN04417545
Why did I pick this sample?
Delftia is often found in soil and water. It is also capable of using a diverse array of carbon sources. (That's why so many studies focus on its use for bioremediation.) In this case, the Krona plot indicated a small portion of potential Delftia sequences.
Sample 3: Subsurface sediment microbial communities from gas well in Oklahoma, United States - OK STACK MC-FT3-sol metagenome
Bio Sample: SAMN09199659; SRA: SRS3667068; DOE Joint Genome Institute: Gp0290895 Link to more information: https://www.ncbi.nlm.nih.gov/biosample/SAMN09199659
Why did I pick this sample?
I stumbled upon this sample while I was searching through the database. The Krona plot showed a much greater proportion of genes that could potentially be from Delftia, and I couldn't skip it. Delftia is often found in soil and/or sediment.
Sample 4: Metagenome of iron plaque on rice root from As contaminated paddy soil, sample from Yanhong
Bio Sample: SAMN07211852; Sample name: YanhongMeta01; SRA: SRS2392319 Link to more information: https://www.ncbi.nlm.nih.gov/biosample/SAMN07211852
Why did I pick this sample?
I chose this sample because it brings together the soil and aquatic environments that Delftia can be found in, and because it is associated with heavy metals. The Krona plot indicated that Delftia-associated sequences from several species of Delftia are present.
Sample 5: Peat soil microbial communities from Stordalen Mire, Sweden - IR.F.S.T-25
Bio Sample: SAMN09201211; SRA: SRS3568559; DOE Joint Genome Institute: Gp0256443 Link to more information: https://www.ncbi.nlm.nih.gov/sra/SRX4415252[accn]
Why did I pick this sample?
I picked this sample because soil is one of the common sources of Delftia. The Krona plot looked promising as well for both Delftia acidovorans and Delftia tsuruhatensis.
The data I'm using comes from publicly available datasets in NCBI's Sequence Read Archive (SRA).
The sample you upload should be a metagenomic sample listed as WGS with paired reads. Your sample cannot be more than 20 Gbases, or you'll need to import it through Globus (see this link for more information). You can also check out this really informative narrative for an in-depth view of how to upload data from other sources: https://narrative.kbase.us/narrative/48493
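If you only have a spot count and a read length on hand, a quick back-of-the-envelope check can tell you whether a run is likely to fit under the size limit. This is a minimal sketch, not part of KBase or the SRA tools: the function names are hypothetical, the numbers are made up, and I'm assuming "20 Gbases" means 20 billion bases.

```python
# Hypothetical helpers for illustration; not a KBase or SRA API.
LIMIT_BASES = 20_000_000_000  # assumption: "20 Gbases" = 20 billion bases


def total_bases(spots: int, read_length: int, paired: bool = True) -> int:
    """Approximate base count; each paired-end spot contributes two reads."""
    reads_per_spot = 2 if paired else 1
    return spots * read_length * reads_per_spot


def fits_web_import(spots: int, read_length: int, paired: bool = True,
                    limit: int = LIMIT_BASES) -> bool:
    """True if the estimated base count is at or under the limit."""
    return total_bases(spots, read_length, paired) <= limit


# Made-up run: 40 million paired spots of 150 bp reads.
print(total_bases(40_000_000, 150))      # estimated bases
print(fits_web_import(40_000_000, 150))  # under the limit?
```

If the run page already reports a base count, use that number directly; the estimate is only a fallback.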
App: Import SRA File as Reads from Web
Timing: 1-5 hours depending on the size of the file and queue time. (It's often helpful to run this in the morning so it uploads, and then you can set up the assembly to run overnight or over the weekend.)
View Configure:
SRA URL: Use the first link from the "Reads Access" page and paste it into this block.
Reads Object Name: This will be what your sample is called once it's been uploaded into KBase. Make sure it's something you'll be able to remember and follow through the workflow.
Sequencing Technology: Select the sequencing technology that was used to generate the reads.
Single Genome: These reads are all metagenomes, so you don't need to select this box.
Results: Your reads should appear in the Data panel to the left. Before proceeding, double check that they are a PairedEndLibrary. If they are a SingleEndLibrary, you'll need to select a different sample, because your assembly won't work with a SingleEndLibrary. In the results panel itself you'll see some stats from the reads, including the number of reads, mean quality score, and mean read length.
Created Object Name | Type | Description |
---|---|---|
Subsurface_gold_mine_reads | PairedEndLibrary | Imported Reads |
Created Object Name | Type | Description |
---|---|---|
Hydraulic_fracture_well_fluid_raw_reads | PairedEndLibrary | Imported Reads |
Created Object Name | Type | Description |
---|---|---|
Subsurface_gas_well_reads | PairedEndLibrary | Imported Reads |
Created Object Name | Type | Description |
---|---|---|
Rice_root_iron_plaque_reads | PairedEndLibrary | Imported Reads |
Created Object Name | Type | Description |
---|---|---|
Peat_soil_raw_reads | PairedEndLibrary | Imported Reads |
You've uploaded your data, great! Now you have to check the quality of the reads. First, assess quality with FastQC. Then, if you need to, trim the reads and assess quality again to make sure the low-quality reads have been removed from the sample.
Apps:
Timing: 5-20 minutes depending on queue and the number of reads
Assess Read Quality with FastQC View Configure:
Results: This app will give you a full report of the quality of your reads. The report will have two pages, for the forward and reverse reads in the PairedEndLibrary. Here we'll be focusing on the Per Base Sequence Quality, but the rest of the report offers a bunch of useful information about our library.
For a more in-depth look at understanding your FastQC report, check out the manual here: https://dnacore.missouri.edu/PDF/FastQC_Manual.pdf
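To get a feel for what the Per Base Sequence Quality panel summarizes, here is a minimal sketch that computes the mean Phred score at each read position from FASTQ text. It is not how FastQC itself is implemented (FastQC reports a box plot per position, not just a mean). The function names and the toy reads are my own, and I'm assuming the common Phred+33 quality encoding.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (sequence, quality_string) pairs from 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            break
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        qual = handle.readline().strip()
        yield seq, qual


def mean_quality_per_position(records, offset=33):
    """Mean Phred score at each position (assumes Phred+33 encoding)."""
    totals, counts = [], []
    for _seq, qual in records:
        for i, ch in enumerate(qual):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# Two made-up reads: 'I' encodes Q40, '5' encodes Q20, '!' encodes Q0.
toy = "@r1\nACGT\n+\nIIII\n@r2\nACG\n+\nI5!\n"
means = mean_quality_per_position(read_fastq(StringIO(toy)))
print(means)
```

A drop in these means toward the end of the reads is the kind of pattern that suggests a library is worth trimming.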
Trim Reads with Trimmomatic View Configure
Read library or set: Select the library you want to trim.
Parameters: This section covers different trimming parameters specific to removing adapters, cropping the sequences, and the quality thresholds required to trim a read. In this workflow, I'm going to leave them all as defaults, but you can learn more about them in the App Info page and in the KBase App catalog.
Output library name: Make sure to specify that this library has been trimmed so you can tell the two libraries apart later.
Once you've trimmed the read library, reassess the quality of the PairedEndLibrary using FastQC.
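One of the quality-based steps a trimmer can apply is a sliding window: scan along the read and cut it once the average quality in a window drops below a threshold. The sketch below is a simplified illustration of that idea, not Trimmomatic's actual implementation. The function name is hypothetical, and the window size of 4 and threshold of 15 are illustrative values, so check the app's own defaults before relying on them.

```python
def sliding_window_trim(seq, quals, window=4, min_mean=15):
    """Cut the read at the start of the first window whose mean quality
    falls below min_mean. Reads shorter than the window are returned as-is.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must be the same length")
    for start in range(0, len(quals) - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < min_mean:
            return seq[:start], quals[:start]
    return seq, quals


# Made-up read whose quality decays toward the 3' end.
seq = "ACGTACGTAC"
quals = [38, 38, 37, 36, 30, 20, 10, 5, 2, 2]
trimmed_seq, trimmed_quals = sliding_window_trim(seq, quals)
print(trimmed_seq, trimmed_quals)
```

Averaging over a window instead of looking at single bases keeps one isolated low-quality call from truncating an otherwise good read.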
Results: A FastQC Report that details the quality of the reads you've submitted and possibly trimmed libraries.
Q1: Looking at the results of the FastQC app, which read set(s) would you choose to trim and re-assess the quality of? Why did you pick that/those set(s)?
Created Object Name | Type | Description |
---|---|---|
subsurface_gold_mine_reads_trimmed_paired | PairedEndLibrary | Trimmed Reads |
subsurface_gold_mine_reads_trimmed_unpaired_fwd | SingleEndLibrary | Trimmed Unpaired Forward Reads |
subsurface_gold_mine_reads_trimmed_unpaired_rev | SingleEndLibrary | Trimmed Unpaired Reverse Reads |
Created Object Name | Type | Description |
---|---|---|
Peat_soil_reads_trimmed_paired | PairedEndLibrary | Trimmed Reads |
Peat_soil_reads_trimmed_unpaired_fwd | SingleEndLibrary | Trimmed Unpaired Forward Reads |
Peat_soil_reads_trimmed_unpaired_rev | SingleEndLibrary | Trimmed Unpaired Reverse Reads |
Before we assemble these libraries, it will be helpful to get an idea of what's present in our samples and at what abundance. KBase has two apps to do this, Kaiju and GOTTCHA2.
Apps:
Timing: 20 mins-2 hours depending on queue and number of reads you're running
1. Classify Taxonomy of Metagenomic Reads with Kaiju
This app translates reads into proteins and uses those sequences to identify what's present or possibly present in the sample.
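To make "translates reads into proteins" concrete, here is a minimal sketch of six-frame translation: three reading frames on the forward strand and three on the reverse complement. It uses the standard genetic code and is only meant to show the idea; Kaiju's own implementation differs in its details, and the function names here are my own.

```python
# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame(seq):
    """Return the six translations: frames 0-2 forward, then 0-2 reverse."""
    seq = seq.upper()
    strands = (seq, reverse_complement(seq))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


# Made-up 9 bp "read".
for frame in six_frame("ATGGCCTAA"):
    print(frame)
```

Searching at the protein level tolerates synonymous DNA differences, which helps when the organisms in a sample are only distantly related to anything in the reference database.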
View Configure:
Results: Your results will be a series of tables showing the breakdown of your sample beginning with the phyla and ending at species. The tail includes everything that is present below the low abundance filter.
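The "tail" can be pictured as a simple roll-up: any taxon whose share of the reads is below the low abundance filter gets lumped into one bucket. Here is a minimal sketch of that idea, assuming a 1% cutoff and made-up read counts; the function name is hypothetical and this is not Kaiju's code.

```python
def summarize_with_tail(counts, min_percent=1.0):
    """Return percent abundance per taxon, lumping taxa below
    min_percent into a single 'tail' entry."""
    total = sum(counts.values())
    if total == 0:
        return {}
    summary, tail = {}, 0
    for taxon, n in counts.items():
        if 100 * n / total >= min_percent:
            summary[taxon] = 100 * n / total
        else:
            tail += n
    if tail:
        summary["tail"] = 100 * tail / total
    return summary


# Made-up phylum-level read counts, for illustration only.
phylum_counts = {
    "Proteobacteria": 700,
    "Firmicutes": 250,
    "Actinobacteria": 45,
    "Chloroflexi": 5,
}
print(summarize_with_tail(phylum_counts))
```

Lowering the cutoff moves taxa out of the tail and into their own rows, which is worth remembering when a genus you care about is rare in the sample.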
2. Classify Taxonomy of Metagenomic Reads with GOTTCHA2
Unlike Kaiju, this app shows relative abundance based on unique nucleotide sequences from RefSeq.
View Configure:
Results: There are three ways you can view the results from GOTTCHA2. The first is as a table showing the classification of your reads, some statistics regarding their abundance and their relative abundance. The second is as a phylogenetic tree showing the relationships of the different taxa identified in your sample. The third layout is as a Krona plot, an interactive plot that displays relative abundance and phylogenetic relationships. Clicking on a phylum will zoom in to show the classes within it. How far you can zoom down depends on the sample and the unique sequences in it.
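Zooming in a Krona plot amounts to re-normalizing abundance within one branch of the taxonomy. The sketch below rolls made-up read counts up each lineage and then reports every child's share of its parent. It is an illustration only: the counts are invented, the function names are hypothetical, and GOTTCHA2 computes its relative abundance in its own way, not simply from raw read counts like this.

```python
from collections import defaultdict

# Made-up read counts per lineage (phylum to genus), for illustration only.
ASSIGNED_READS = {
    ("Proteobacteria", "Betaproteobacteria", "Burkholderiales",
     "Comamonadaceae", "Delftia"): 30,
    ("Proteobacteria", "Betaproteobacteria", "Burkholderiales",
     "Burkholderiaceae", "Cupriavidus"): 10,
    ("Firmicutes", "Bacilli", "Bacillales", "Bacillaceae", "Bacillus"): 60,
}


def roll_up(assigned):
    """Sum read counts at every rank of every lineage."""
    totals = defaultdict(int)
    for lineage, reads in assigned.items():
        for depth in range(1, len(lineage) + 1):
            totals[lineage[:depth]] += reads
    return dict(totals)


def zoom(totals, parent=()):
    """Percent share of each direct child within the chosen parent taxon."""
    depth = len(parent) + 1
    children = {k: v for k, v in totals.items()
                if len(k) == depth and k[:len(parent)] == parent}
    parent_total = sum(children.values())
    return {k[-1]: 100 * v / parent_total for k, v in children.items()}


totals = roll_up(ASSIGNED_READS)
print(zoom(totals))  # top level: phyla
print(zoom(totals, ("Proteobacteria", "Betaproteobacteria", "Burkholderiales")))
```

The same taxon can look rare at the top level and dominant once you zoom into its order, so note which level a percentage refers to when you answer the questions.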
These first two runs of Kaiju include the first 4 libraries run as one: first with NCBI BLAST as the database, then with RefSeq as the database. The following 4 examples show the libraries individually, with different settings to increase the proportion of the library represented.
subsurface_gold_mine_reads_trimmed_paired |
Rice_root_iron_plaque_reads |
Subsurface_gas_well_reads |
Hydraulic_fracture_well_fluid_raw_reads |
ALL |
subsurface_gold_mine_reads_trimmed_paired |
Rice_root_iron_plaque_reads |
Subsurface_gas_well_reads |
Hydraulic_fracture_well_fluid_raw_reads |
ALL |
These runs of Kaiju are broken down by sample and use the NCBI BLAST database. They are run on two subsamples.
subsurface_gold_mine_reads_trimmed_paired |
ALL |
Q2: Do the two subsamples vary much? Is this expected or unexpected and why?
Hydraulic_fracture_well_fluid_raw_reads |
ALL |
Subsurface_gas_well_reads |
ALL |
Rice_root_iron_plaque_reads |
ALL |
Peat_soil_reads_trimmed_paired |
ALL |
This first run of GOTTCHA2 includes ALL read libraries as one sample. The 4 runs afterwards show the samples individually.
subsurface_gold_mine_reads_trimmed_paired |
Rice_root_iron_plaque_reads |
Subsurface_gas_well_reads |
Hydraulic_fracture_well_fluid_raw_reads |
subsurface_gold_mine_reads_trimmed_paired |
Q3: Open the Krona plot from this sample. Viruses make up what percent of the sample?
Hydraulic_fracture_well_fluid_raw_reads |
Subsurface_gas_well_reads |
Q4: What is the relative abundance of Betaproteobacteria in this sample?
Rice_root_iron_plaque_reads |
Peat_soil_reads_trimmed_paired |
Q5: Which sample(s) look the most promising based on the taxonomy results from GOTTCHA2?
Alright, you've made it this far. This step takes the longest, so you may want to set it up to run overnight or over the weekend. This step will take our read libraries and line them up into longer sequences called contigs. Later we'll sort these contigs based on what genomes they came from. We'll be using three different apps to generate 4 different sets of contigs.
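To picture what "lining up reads into contigs" means, here is a toy sketch that merges reads whose ends overlap. This is a deliberately simple greedy overlap approach swapped in for illustration; it is not what the assemblers in this workflow do (MetaSPAdes, for example, is built on de Bruijn graphs), and the function names and reads are made up.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that matches a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge(left, right, min_overlap=3):
    """Join two reads on their overlap, or return None if they don't overlap."""
    size = overlap_length(left, right, min_overlap)
    return left + right[size:] if size else None


def chain(reads, min_overlap=3):
    """Extend a contig read by read, in the order given; stop at the first gap."""
    contig = reads[0]
    for read in reads[1:]:
        merged = merge(contig, read, min_overlap)
        if merged is None:
            break
        contig = merged
    return contig


# Three made-up reads that tile a 16 bp stretch.
reads = ["ACGTTGCA", "TGCAGGTC", "GGTCAATG"]
print(chain(reads))
```

Real metagenomes contain millions of reads from many genomes with errors and repeats, which is why the actual assembly runs for hours to days.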
Apps:
Timing: hours to days (One of my assemblies below ran for almost 4 days.)
1. Assemble Reads with MetaSPAdes
View