FastQC result: whether to trim off low-quality reads or not
3.8 years ago
dpc ▴ 240

Hi friends and colleagues

I have run 78 WGS metagenomic samples (sequences submitted to SRA by other studies) through FastQC and aggregated the results with MultiQC. I got results like the one below. I am unsure whether to trim bases with a Phred quality score below 20, because that would make many reads much shorter, which in turn would negatively affect mapping with the Bowtie aligner. Is my understanding correct? What should I do in this case?

Thanks and regards, DC7

Screenshot-from-2020-07-10-19-43-47

fastqc multiqc quality_check • 1.8k views

Filter the low-quality reads, i.e. if the average quality of the whole read is lower than a certain threshold, discard it. I would only trim bases with Phred < 3, i.e. everything the sequencer labels as garbage, and then do a length filtering. The more bases you trim off, the more likely it becomes that a read maps ambiguously, especially with repetitive or closely related reference sequences.
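The three steps described above (trim only the garbage bases, apply a length filter, discard reads with a low average quality) could be sketched roughly as follows. This is only an illustration, not a replacement for an established trimming tool: the function names, the Phred+33 encoding, and the thresholds `min_mean_q=20` and `min_len=70` are assumptions chosen for the example, not values recommended in this thread.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_line]


def trim_and_filter(seq, qual, trim_below=3, min_mean_q=20, min_len=70, offset=33):
    """Trim bases with Phred < trim_below from both ends, then filter the read.

    Returns (sequence, quality) for reads that are kept, or None for reads
    that are discarded. All thresholds are illustrative assumptions.
    """
    scores = phred_scores(qual, offset)

    # Trim garbage bases from the 3' end.
    end = len(scores)
    while end > 0 and scores[end - 1] < trim_below:
        end -= 1

    # Trim garbage bases from the 5' end.
    start = 0
    while start < end and scores[start] < trim_below:
        start += 1

    seq, qual, scores = seq[start:end], qual[start:end], scores[start:end]

    # Length filter: drop reads that are empty or too short after trimming.
    if not scores or len(seq) < min_len:
        return None

    # Average-quality filter on what is left of the read.
    if sum(scores) / len(scores) < min_mean_q:
        return None

    return seq, qual
```

For example, `trim_and_filter("ACGTACGT", "IIIIII##", min_len=5)` removes the two trailing `#` bases (Phred 2) and keeps the remaining six bases, while the same read is discarded with the 70 bp length cutoff.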


DC7: You should state the intended downstream use for this data in the original question. Based on your past threads, it appears that you are doing MetaPhlAn analysis with these data?

As indicated by the answers here, many aligners will soft-clip the parts of reads that don't align, but if you are doing any assembly work then you would need to take care of trimming bad-quality data yourself.


Thanks genomax for your response. Yes, I will be analysing the sequences with MetaPhlAn to profile them, followed by alpha- and beta-diversity analysis, etc. There is no assembly work as such. MetaPhlAn uses the bowtie2 aligner, which discards reads shorter than 70 bp (default) and also discards reads that map with MAPQ < 30. Given this information, do you think I should add a separate clipping step?

3.8 years ago

I'm going to preface this by saying I'm not a bioinformatician, but for my own data analysis, I usually don't touch the FASTQ files unless there is serious adapter contamination or I get very low alignment rates with Bowtie2. Mainly because A) it feels like a lot of extra work, since Bowtie2 has specific parameters to deal with low quality scores and you can later filter out everything with a MAPQ score < 20 or 30, and B) sometimes, as you said, too much trimming can negatively affect alignment. Maybe someone with more experience than me can confirm whether that makes sense, but I did find this blog post to be pretty informative:

http://biofinysics.blogspot.com/2014/05/how-does-bowtie2-assign-mapq-scores.html
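As a rough illustration of the "filter by MAPQ afterwards" idea, here is a minimal sketch that works on plain-text SAM lines. The function name `passes_mapq` and the default cutoff of 30 are assumptions made for this example; it relies only on the SAM layout, where column 2 is the FLAG (bit `0x4` marks an unmapped read) and column 5 is the MAPQ. In practice, `samtools view -q 30` applies the same kind of threshold.

```python
def passes_mapq(sam_line, min_mapq=30):
    """Return True if a SAM line should be kept under a MAPQ cutoff.

    Header lines (starting with '@') are always kept. Unmapped reads
    (FLAG bit 0x4) and reads with MAPQ below min_mapq are dropped.
    The cutoff of 30 is an illustrative assumption.
    """
    if sam_line.startswith("@"):
        return True

    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: mapping quality

    if flag & 0x4:          # read is unmapped
        return False
    return mapq >= min_mapq


def filter_sam(lines, min_mapq=30):
    """Yield only the SAM lines that pass the MAPQ filter."""
    for line in lines:
        if passes_mapq(line, min_mapq):
            yield line
```

Used on an open SAM file, `filter_sam(handle, min_mapq=20)` would yield the header plus every aligned read with MAPQ of at least 20.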

3.8 years ago

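It looks like bowtie2 will soft-clip reads to make an alignment happen (this applies to its `--local` mode; the default end-to-end mode aligns the whole read without clipping).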

A Phred score of 10 still means that the base is likely to be correct (a 90% chance). The odds of wrong bases causing a read to align to the wrong place are pretty low. And if a read aligns nowhere, even after soft-clipping by an aligner, then it's not doing any harm, and hard-clipping it yourself wasn't going to fix it.
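The relationship behind that statement is the standard Phred definition, P(error) = 10^(-Q/10), so Q10 corresponds to a 10% error probability, Q20 to 1%, and Q30 to 0.1%. A small sketch (the function names are just illustrative):

```python
def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def expected_errors(scores):
    """Expected number of wrong bases in a read, given its Phred scores."""
    return sum(error_probability(q) for q in scores)


for q in (3, 10, 20, 30):
    print(f"Q{q}: P(error) = {error_probability(q):.4f}")
```

By the same formula, a 100 bp read made up entirely of Q10 bases would be expected to contain about 10 wrong bases, whereas the same read at Q20 would be expected to contain about one.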
