MultiQC output shows that data processing steps are ineffective
2.1 years ago

I ran MultiQC to check the quality of reads before and after data processing, but the reports do not show significant improvement. May I know why?
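For context, the before/after comparison was produced along these lines (directory names here are assumptions, not the exact paths used); shown as a dry run that only prints the FastQC/MultiQC commands:

```shell
# Dry-run sketch of the QC comparison; drop the `echo`s to actually run the tools.
# raw_data/ and corrected_data/ are placeholder directory names.
echo "fastqc -t 8 -o qc_raw raw_data/*.fastq"
echo "fastqc -t 8 -o qc_clean corrected_data/*.fq.gz"
report="multiqc -n before_vs_after qc_raw qc_clean"
echo "$report"
```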

Pre-processing status check

First I trimmed the adapters using BBDuk.

for f in `ls -1 *_1.fastq | sed 's/_1.fastq//'`;
do bbduk.sh -Xmx20g in1=$f\_1.fastq in2=$f\_2.fastq out1=../clean_data/$f\_1.fq out2=../clean_data/$f\_2.fq ref=../adapters.fa ktrim=r k=25 mink=10 ftm=5 tbo tpe;
done

Second, I performed quality trimming:

for f in `ls -1 *_1.fq.gz | sed 's/_1.fq.gz//'`;
do bbduk.sh -Xmx20g in1=$f\_1.fq.gz in2=$f\_2.fq.gz out1=../trimmed_data/$f\_1.fq out2=../trimmed_data/$f\_2.fq qtrim=r trimq=10 maq=10;
done

Third, I performed error correction using Musket:

for f in `ls -1 *.fq.gz | sed 's/.fq.gz//'`;
do ./../../musket-1.1/musket -k 21 536879812 -p 20 -zlib 9 -o ../corrected_data/$f\.fq.gz $f\.fq.gz;
done
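As an aside, the `ls | sed` loops above can be written more robustly with a glob plus bash parameter expansion. A minimal sketch with hypothetical sample names, printing the command instead of invoking bbduk.sh:

```shell
# Sketch: derive paired-end prefixes with parameter expansion instead of `ls | sed`.
# Sample names are hypothetical; in practice use:  for f1 in *_1.fastq
# Swap `echo` for the real bbduk.sh call to execute.
for f1 in SRR001_1.fastq SRR002_1.fastq; do
  sample=${f1%_1.fastq}   # strip the _1.fastq suffix
  echo "bbduk.sh in1=${sample}_1.fastq in2=${sample}_2.fastq out1=../clean_data/${sample}_1.fq out2=../clean_data/${sample}_2.fq"
done
```

This avoids parsing `ls` output, which breaks on unusual file names.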

Post-processing status check

ATpoint • 2.1 years ago

The adapters are gone, and that is the only really relevant metric in FastQC, unless you were also seeing bad per-base quality indicating a sequencing failure.

Mensur Dlakic • 2.1 years ago

There is clear improvement: more green and gold, less red. There are no adapters because of your first step, and per-base sequence quality is better because of your second step. Error correction may or may not change the sequence of reads, but it will not change the base qualities, so I would not expect FastQC to detect anything from the third step in qualitative terms.
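The point about error correction can be illustrated with a toy comparison: FastQC's quality plots are derived from the quality lines, which Musket leaves untouched, so only a sequence-level diff would reveal the third step. A sketch with made-up stand-in sequences (file names are throwaway):

```shell
# Toy illustration: count "reads" whose sequence differs between original
# and corrected sets. These 4-base sequences are invented for the demo.
printf 'ACGT\nACGA\nGGGG\n' > orig_seqs.txt   # original read sequences
printf 'ACGT\nACGT\nGGGG\n' > corr_seqs.txt   # read 2 had one base corrected
changed=$(paste orig_seqs.txt corr_seqs.txt | awk '$1 != $2 {n++} END {print n+0}')
echo "reads changed by correction: $changed"
```

A nonzero count shows the corrector did work, even though the FastQC report looks identical.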

2.1 years ago

Since you are already using BBDuk from the BBTools suite, you might also want to run BBNorm. Like Musket, it can error-correct reads, but it can also filter reads based on k-mer content. Since you have a lot of duplication and over-represented sequences, you may want to discard those reads (of course, only if it is not a quantitative experiment like RNA-seq). A default clumpify.sh deduplication step should, however, also give quite a good result.
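A BBNorm invocation along these lines could be a starting point; the flags follow my reading of the BBNorm guide (`target`/`min` for normalization depth, `ecc=t` for error correction) and the file names are placeholders. Shown as a dry run that only prints the command:

```shell
# Dry-run sketch of BBNorm (placeholders throughout; remove `echo "$cmd"` / run
# "$cmd" directly once paths are real). target=100 caps coverage at 100x and
# min=5 discards reads with k-mer depth below 5; ecc=t also error-corrects.
cmd="bbnorm.sh in=reads.fq.gz out=normalized.fq.gz target=100 min=5 ecc=t"
echo "$cmd"
```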

Apart from this, a typical preprocessing with BBTools may look like this:

#Sequence-based deduplication (optical is only possible if read headers are intact which is often not the case with SRA)
clumpify.sh in=reads.fq.gz out=clumped.fq.gz dedupe optical

#Remove low-quality regions
#This step requires standard Illumina read headers and will not work with renamed reads, such as most SRA data.
filterbytile.sh in=clumped.fq.gz out=filtered_by_tile.fq.gz

#Trim adapters
bbduk.sh in=filtered_by_tile.fq.gz out=trimmed.fq.gz ktrim=r k=23 mink=11 hdist=1 tbo tpe minlen=100 ref=bbmap/resources/adapters.fa ftm=5 ordered

#Remove synthetic artifacts and spike-ins.  Add "qtrim=r trimq=8" to also perform quality-trimming at this point, but not if quality recalibration will be done later.
bbduk.sh in=trimmed.fq.gz out=filtered.fq.gz k=27 ref=bbmap/resources/sequencing_artifacts.fa.gz,bbmap/resources/phix174_ill.ref.fa.gz ordered 
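The four steps above could be chained per sample in a small wrapper. A dry-run sketch with a hypothetical sample name (the `run` stub prints each command; replace it with plain execution once paths and reference files are in place):

```shell
# Dry-run wrapper for the BBTools chain above. SRR001 is a hypothetical sample;
# swap the `run` stub for direct execution to actually process reads.
run() { echo "+ $*"; n=$((n+1)); }
n=0
for s in SRR001; do
  run clumpify.sh in=${s}.fq.gz out=${s}_clumped.fq.gz dedupe optical
  run filterbytile.sh in=${s}_clumped.fq.gz out=${s}_fbt.fq.gz
  run bbduk.sh in=${s}_fbt.fq.gz out=${s}_trimmed.fq.gz ktrim=r k=23 mink=11 hdist=1 tbo tpe minlen=100 ref=bbmap/resources/adapters.fa ftm=5 ordered
  run bbduk.sh in=${s}_trimmed.fq.gz out=${s}_filtered.fq.gz k=27 ref=bbmap/resources/sequencing_artifacts.fa.gz,bbmap/resources/phix174_ill.ref.fa.gz ordered
done
echo "commands printed: $n"
```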