Welcome to Metagenome Analysis 101

Student

Authors: Elisha Wood-Charlson, Jon Benskin, Carlos Goller, Ellen Dow

Audience

High School Students
Undergraduate Students
Graduate Students
Biology, Bioinformatics, Genetics, Genomics, Proteomics, CSS, etc.

Learning goals

Evaluate read quality based on FastQC reports.
Perform read trimming with Trimmomatic.
Explain in your own words how adapter contamination can
- a) affect read quality
- b) be addressed with Trimmomatic

Biological Topics and Concepts

Gene sequencing
Data File types
Quality Control of raw files

Activity Description

This Narrative imports the data used in the Metagenome Analysis modules and performs initial quality checks on the raw reads.

Data Source

Where are these data from? Scientists dive deep to explore mysterious 'blue hole' on the Florida seabed. Link to article

Data Source: 106 m depth, collected May 2019. Patin, Nastassia; Stewart, Frank; Hall, Emily; Dietrich, Zoe; Beckler, Jordon (2020): Blue Hole Shotgun Metagenome: May 2019, 106 M. figshare. Dataset. https://doi.org/10.6084/m9.figshare.12644048.v1

Metagenome Modules

Background

Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114-2120. doi:10.1093/bioinformatics/btu170

Videos:

https://youtu.be/pIlAMDg00AQ (10 min, auto captions)
https://youtu.be/Q4UU6k13090 (17 min, auto captions)
EMSL Summer School clip of Colorado State University Professor Kelly Wrighton giving an introduction to metagenomics data and analysis (46:00-1:29:00, auto captions)

Step 1) Upload FASTQ file of raw read(s) and inspect the data

Click on the reads file in the Data Panel (top left) to add the data object to your Narrative (this panel). Inspect the data object by reviewing the Overview and Stats tabs.

Questions to answer:

Q1) How many reads are in your metagenome?

Q2) What is your mean read length?

Q3) What is the total sequence size in gigabase pairs (Gbp)?

Q4) What is the GC percentage of the imported data?
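KBase reports these values for you in the Stats tab, but it can help to see the arithmetic behind Q1-Q4. The sketch below assumes a single FASTQ file with standard 4-line records (header, sequence, "+", quality) and simply counts every record; the function names are illustrative, and KBase may count paired reads differently.

```python
from io import StringIO
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from standard 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip()
        yield header, seq, qual


def summarize(handle: TextIO) -> dict:
    """Read count, mean read length, total size in Gbp, and GC percentage."""
    n_reads = 0
    total_bases = 0
    gc_bases = 0
    for _, seq, _ in read_fastq(handle):
        n_reads += 1
        total_bases += len(seq)
        gc_bases += sum(1 for base in seq.upper() if base in "GC")
    return {
        "reads": n_reads,
        "mean_length": total_bases / n_reads if n_reads else 0.0,
        "total_gbp": total_bases / 1e9,  # 1 Gbp = 10^9 base pairs
        "gc_percent": 100.0 * gc_bases / total_bases if total_bases else 0.0,
    }


# Two tiny made-up reads, only to show the calculation.
example = StringIO("@read1\nACGTGC\n+\nIIIIII\n@read2\nAATT\n+\nIIII\n")
stats = summarize(example)
print(stats["reads"], stats["mean_length"], stats["gc_percent"])  # 2 5.0 40.0
```

Real metagenome files hold millions of reads, so the functions stream through the file one record at a time instead of loading it all into memory.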

Optional step if reads are not joined

If paired-end reads were imported as separate R1 and R2 data sets, run FastQ-Join to combine them into a single paired-end read data set before continuing.

Step 2) Assess Read Quality with FastQC

FastQC checks the quality of the sequence reads so you can identify low-quality reads. In the App Panel (bottom left), search for FastQC. Click on the App name to add the App to the Narrative below this cell. Select the reads file from the drop-down menu, and click Run to start the analysis.
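The "per base sequence quality" plot in a FastQC report averages the quality score at each position along the reads. The sketch below shows only that core calculation, not FastQC's own code. It assumes Phred+33 quality encoding (the encoding used by current Illumina pipelines), and the cutoff of 28 used to flag where quality "starts dropping" is an illustrative assumption; the function names are hypothetical.

```python
from typing import List, Optional


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert an ASCII quality string (Phred+33 assumed) to integer scores."""
    return [ord(ch) - offset for ch in quality]


def mean_quality_per_position(qualities: List[str]) -> List[float]:
    """Mean Phred score at each read position (1st base, 2nd base, ...).

    Reads can differ in length, so each position is averaged only over
    the reads that are long enough to have a base there.
    """
    max_len = max(len(q) for q in qualities)
    sums = [0] * max_len
    counts = [0] * max_len
    for q in qualities:
        for i, score in enumerate(phred_scores(q)):
            sums[i] += score
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


def first_position_below(means: List[float], threshold: float = 28.0) -> Optional[int]:
    """1-based position where the mean quality first falls below threshold."""
    for i, m in enumerate(means):
        if m < threshold:
            return i + 1
    return None  # quality never dropped below the threshold


# 'I' = Q40, '?' = Q30, '5' = Q20, '+' = Q10 in Phred+33
quals = ["IIIII?5+", "IIIII?5+", "IIII?5"]
means = mean_quality_per_position(quals)
print(first_position_below(means))  # 6
```

Running the same calculation separately on forward and reverse reads mirrors why FastQC gives each direction its own report.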

KBase Tips

Starting an analysis saves your Narrative workspace and sends the commands to KBase's compute resources to run the job. If you need to close the Narrative at this point, or after any analysis begins, the job continues to run. Results appear automatically in the App cell below once the job is complete.

Note: If you edit the Narrative and do NOT run an analysis, you must manually save the Narrative workspace by clicking the save icon in the menu at the top.

Questions to answer:

Note: The forward and reverse reads each have a separate report page. Use the Page 1 and Page 2 buttons to review both reports.

Q5) At what base position does the average quality start to drop in the forward reads? In the reverse reads? (Scroll through the FastQC panel on the right.)

Q6) Based on these results, how should you trim your data? Which data would you expect to be changed after trimming?

Step 3) Trim reads with Trimmomatic

Trimmomatic removes low-quality bases, low-quality reads, and adapter sequences. Find the App in the App Panel and add it to the Narrative as in Step 2. Most of the advanced parameters can be left at their default values, but how do you know which adapter to select?
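Quality trimming is often done with a sliding window. The sketch below is loosely modeled on the idea behind Trimmomatic's SLIDINGWINDOW step (scan from the 5' end and cut once the average quality inside a window falls below a threshold) followed by a minimum-length filter in the spirit of its MINLEN step. It is a simplification, not Trimmomatic's actual code: the window size of 4, threshold of 15, minimum length of 36, Phred+33 encoding, and the function names are all illustrative assumptions.

```python
from typing import List, Optional, Tuple


def sliding_window_keep(scores: List[int], window: int = 4, min_avg: float = 15.0) -> int:
    """Number of bases to keep from the 5' end (simplified sliding window)."""
    if not scores:
        return 0
    size = min(window, len(scores))  # very short reads use one whole-read window
    for start in range(len(scores) - size + 1):
        if sum(scores[start:start + size]) / size < min_avg:
            return start  # cut at the beginning of the first failing window
    return len(scores)


def trim_read(seq: str, qual: str, window: int = 4, min_avg: float = 15.0,
              min_len: int = 36) -> Optional[Tuple[str, str]]:
    """Trim one read; return None if it ends up shorter than min_len."""
    scores = [ord(ch) - 33 for ch in qual]  # Phred+33 assumed
    keep = sliding_window_keep(scores, window, min_avg)
    if keep < min_len:
        return None  # read is dropped entirely
    return seq[:keep], qual[:keep]


# Seven Q40 bases ('I') followed by three Q2 bases ('#')
print(trim_read("ACGTACGTAC", "IIIIIII###", min_len=5))  # ('ACGTAC', 'IIIIII')
```

The window average makes the cut less sensitive to a single bad base, while the length filter explains why some reads disappear completely after trimming rather than just getting shorter.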

Advanced extension - Using the article here,

What is the theory behind trimming? Why is it important? Why is it difficult?
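One way to think about why trimming matters is through what a quality score means. A Phred score Q encodes the estimated probability P that a base call is wrong, Q = -10 * log10(P), so P = 10^(-Q/10). Summing P over a read gives its expected number of wrong bases. The sketch below uses made-up score lists (not from the Blue Hole data) to show how a short low-quality tail can dominate a read's expected errors; the function names are illustrative.

```python
from typing import List


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def expected_errors(scores: List[int]) -> float:
    """Expected number of incorrect bases in a read."""
    return sum(error_probability(q) for q in scores)


good_part = [30] * 90  # Q30: about 1 wrong call in 1,000
bad_tail = [10] * 10   # Q10: about 1 wrong call in 10

print(round(expected_errors(good_part), 2))             # 0.09
print(round(expected_errors(good_part + bad_tail), 2))  # 1.09
```

Here the last 10 bases contribute more than ten times as many expected errors as the first 90. That is the case for trimming; the difficulty is that every trimmed base is also lost information, so cutting too aggressively shortens reads and can discard useful data.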

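Step 4) Rerun FastQC

Run FastQC again, this time on the trimmed reads produced by Trimmomatic, so you can compare the results with the original report.

Questions to answer: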

Q7) What percentage of your total paired reads (forward and reverse) survived Trimmomatic?

Q8) What is the mean read length after trimming? How has it changed from before trimming?

Hint: Data Panel objects have additional details, which are displayed when you hover over the object and click on "...".

Q9) What has changed in your FastQC output since running Trimmomatic? (Scroll through the FastQC panel on the right.)
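Q7 and Q8 come down to simple arithmetic on the counts and mean lengths shown for your read objects before and after Trimmomatic. In the sketch below, every number is a hypothetical placeholder (not a result from the Blue Hole data set), and the function names are illustrative; substitute your own values.

```python
def percent_surviving(pairs_before: int, pairs_after: int) -> float:
    """Percentage of input read pairs still present after trimming."""
    return 100.0 * pairs_after / pairs_before


def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent (negative = decrease)."""
    return 100.0 * (after - before) / before


# Hypothetical values only; replace with the numbers from your Narrative.
pairs_before, pairs_after = 1_000_000, 850_000
mean_len_before, mean_len_after = 150.0, 142.5

print(f"{percent_surviving(pairs_before, pairs_after):.1f}% of pairs survived")
print(f"mean length changed by {percent_change(mean_len_before, mean_len_after):.1f}%")
```

Count read pairs consistently: if Trimmomatic also outputs unpaired reads (where only one mate survived), decide whether Q7 should include them and state your choice in your answer.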

Optional: Additional understanding of read quality - Adapters

Why did many of the reverse reads not survive Trimmomatic? Review the original FastQC results. Then try running Trimmomatic without selecting an adapter for trimming and compare the results. Remember that each of these samples went through numerous preparation steps in the laboratory before sequencing, and samples and protocols do not always produce great data. It is important to do proper QC before running any data analyses to ensure that quality data go into your analysis pipeline.
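When a DNA fragment is shorter than the read length, the sequencer reads past the fragment into the adapter, so the end of the read is not sample DNA at all. The sketch below removes adapter sequence by exact matching only: it cuts at the first full copy of the adapter, or at a partial adapter prefix hanging off the 3' end of the read. Trimmomatic's ILLUMINACLIP step is more sophisticated (it tolerates mismatches and can use the read pair to detect adapter read-through), so treat this as an illustration, not its algorithm. The adapter string is the common 13-base prefix of Illumina TruSeq adapters, used here as an example; the minimum overlap of 3 and the function name are assumptions.

```python
ADAPTER = "AGATCGGAAGAGC"  # common prefix of Illumina TruSeq adapters (example)


def clip_adapter(seq: str, adapter: str = ADAPTER, min_overlap: int = 3) -> str:
    """Remove adapter sequence from the 3' end of a read by exact matching."""
    # Case 1: the full adapter appears inside the read.
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    # Case 2: the read ends partway into the adapter; try the longest overlap first.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq  # no adapter found


print(clip_adapter("ACGTACGTAGATCGGAAGAGCTTTT"))  # ACGTACGT
print(clip_adapter("ACGTACGTAGATC"))              # ACGTACGT
print(clip_adapter("ACGTACGTAG"))                 # ACGTACGTAG
```

The minimum overlap is a tradeoff: very short matches such as "AG" occur by chance in genuine sequence, so clipping them would remove real bases, while a large minimum leaves short adapter remnants behind.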

Next up: Module 2 - Assembly