Authors: Elisha Wood-Charlson, Jon Benskin, Carlos Goller, Ellen Dow
This Narrative is designed to import data for Metagenome analysis modules and perform initial quality checks on data.
Where are these data from? Scientists dive deep to explore mysterious 'blue hole' on the Florida seabed. Link to article
Data Source: 106 m depth, collected May 2019. Patin, Nastassia; Stewart, Frank; Hall, Emily; Dietrich, Zoe; Beckler, Jordon (2020): Blue Hole Shotgun Metagenome: May 2019, 106 M. figshare. Dataset. https://doi.org/10.6084/m9.figshare.12644048.v1
Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114-2120. doi:10.1093/bioinformatics/btu170
Click on the reads file in the Data Panel (top left) to add the data object to your Narrative (this panel). Inspect the data object by reviewing the Overview and Stats tabs.
Q1) How many reads are in your metagenome?
Q2) What is your mean read length?
Q3) What is the total sequencing size in gigabase pairs?
Q4) What is the GC percentage of the imported data?
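The values asked for in Q1 to Q4 are shown on the Stats tab, but it can help to see how they are derived. The following is a minimal sketch, not the code KBase runs. It assumes an uncompressed FASTQ file with exactly four lines per read, and the function name `read_stats` and the two example reads are made up for illustration.

```python
from io import StringIO


def read_stats(fastq_handle):
    """Return read count, mean read length, total gigabases, and GC percent.

    Assumes plain four-line FASTQ records (header, sequence, '+', qualities).
    """
    n_reads = 0
    total_bases = 0
    gc_bases = 0
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # the sequence is the second line of each record
            seq = line.strip().upper()
            n_reads += 1
            total_bases += len(seq)
            gc_bases += seq.count("G") + seq.count("C")
    return {
        "reads": n_reads,
        "mean_length": total_bases / n_reads if n_reads else 0.0,
        "gigabases": total_bases / 1e9,
        "gc_percent": 100.0 * gc_bases / total_bases if total_bases else 0.0,
    }


# Two made-up reads (not the Blue Hole data), just to exercise the function.
example = StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCCAT\n+\nIIIIII\n")
stats = read_stats(example)
print(stats)
```

For a real file, the `StringIO` object would be replaced with an opened FASTQ file handle. For paired-end data, the forward and reverse files would each be counted.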
If paired-end reads are imported as R1 and R2 (individual data sets), run FastQ-Join to combine into a single paired-end read data set before continuing.
Run a quality check on the sequence reads to identify low-quality reads. In the App Panel (bottom left), search for FastQC. Click on the App name to add the App to the Narrative below the current cell. Select the reads file from the drop-down menu, and click Run to start the analysis.
Starting an analysis will save your Narrative workspace and send the commands to KBase's compute resources to run the job. If you need to close the Narrative at this point, or after any analysis begins, the job continues to run. Results will update in the App cell below automatically, once the job is complete.
Note: If you edit the Narrative and do NOT run an analysis, you must manually save the Narrative workspace by clicking the save icon in the menu at the top.
Note: Each read pair has a separate report page. Use the Page 1 and Page 2 buttons to review both reports.
Q5) At what base pair does your average quality seem to start dropping (scroll through the FastQC panel on the right) for forward reads? Reverse?
Q6) Based on these results, how should you trim your data? Which data would you expect to be changed after trimming?
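The per-base quality plot used for Q5 summarizes quality scores at each position across all reads. As a rough sketch of that idea (not FastQC's own implementation), the mean quality per position can be computed from the FASTQ quality strings. The Phred+33 encoding, the threshold of 20, and the function names here are assumptions chosen for illustration.

```python
def mean_quality_by_position(quality_strings, offset=33):
    """Mean Phred score at each read position (assumes Phred+33 encoding)."""
    totals, counts = [], []
    for q in quality_strings:
        for pos, ch in enumerate(q):
            if pos == len(totals):  # first time this position is seen
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


def first_drop(means, threshold=20):
    """1-based position of the first base whose mean quality is below threshold."""
    for pos, m in enumerate(means, start=1):
        if m < threshold:
            return pos
    return None


# Two made-up quality strings: 'I' = Q40, '5' = Q20, '+' = Q10.
quals = ["IIII5", "III5+"]
means = mean_quality_by_position(quals)
print(means, first_drop(means))
```

Running the same calculation separately on forward and reverse reads gives the two positions asked about in Q5.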
Trimmomatic removes low-quality reads as well as adapter sequences. Find the App in the App Panel and add it to the Narrative as in Step 2. Most of the advanced parameters can be left at their default values, but how do you know which adapter to select?
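One way Trimmomatic removes low-quality bases is sliding-window trimming (its SLIDINGWINDOW step), followed by discarding reads that end up too short (its MINLEN step). The sketch below is a simplified illustration of that idea, not Trimmomatic's actual code. The window size, quality threshold, minimum length, and function names are assumptions, and the values set in the KBase App may differ.

```python
def sliding_window_trim(seq, quals, window=4, min_mean_q=15):
    """Cut the read at the start of the first window whose mean quality
    falls below min_mean_q. Reads shorter than the window are returned as-is."""
    for start in range(0, len(quals) - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < min_mean_q:
            return seq[:start], quals[:start]
    return seq, quals


def survives(seq, min_len=36):
    """A trimmed read is kept only if it is at least min_len bases long."""
    return len(seq) >= min_len


# Made-up read whose quality decays toward the 3' end.
seq = "ACGTACGTAC"
quals = [38, 38, 37, 36, 30, 20, 10, 8, 5, 2]
trimmed_seq, trimmed_quals = sliding_window_trim(seq, quals)
print(trimmed_seq, trimmed_quals, survives(trimmed_seq))
```

In this example the read is cut to 5 bases and would be dropped at a minimum length of 36. Reads whose quality falls off early can be lost this way, and in paired-end mode a pair counts as surviving only if both mates are kept.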
Q7) What percentage of your total paired reads (forward and reverse) survived Trimmomatic?
Q8) What is the mean read length after trimming? How has it changed from before trimming?
Hint: Data Panel objects have additional details that are displayed when you hover over the object and click on "...".
Q9) What has changed in your FastQC output since running Trimmomatic? (scroll through the FastQC panel on the right)
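The arithmetic behind Q7 and Q8 can be sketched as follows. The read counts and mean lengths below are hypothetical placeholders, not results from the Blue Hole data. Substitute the values reported in your own Trimmomatic output and Data Panel objects.

```python
def percent_surviving(input_pairs, surviving_pairs):
    """Percentage of input read pairs where both mates survived trimming."""
    if input_pairs <= 0:
        raise ValueError("input_pairs must be positive")
    return 100.0 * surviving_pairs / input_pairs


def length_change(mean_before, mean_after):
    """Change in mean read length (negative means reads got shorter)."""
    return mean_after - mean_before


# Hypothetical numbers for illustration only.
pct = percent_surviving(input_pairs=1_000_000, surviving_pairs=850_000)
delta = length_change(mean_before=150.0, mean_after=138.5)
print(f"{pct:.1f}% of pairs survived; mean length changed by {delta:+.1f} bp")
```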
Why did many of the reverse reads not survive Trimmomatic? Review the original FastQC results. Try running Trimmomatic without selecting an adapter for trimming and compare the results. Remember that each of these samples went through numerous preparation steps in the laboratory prior to sequencing. Samples and protocols do not always produce great data. It is important to do proper QC before running any data analyses to ensure that quality data go into your analysis pipeline.