Question: FastQC results: should I trim low-quality reads or not?
dpc wrote (5 weeks ago):

Hi friends and colleagues

I have run FastQC on 78 WGS metagenomic datasets (sequences that other studies submitted to SRA) and aggregated the results with MultiQC. I got results like the ones below. I am unsure whether to trim bases with a Phred quality score below 20, because many reads would then become much shorter, which would hurt mapping with the bowtie aligner. Is my understanding correct? What should I do in this case?

Thanks and regards, DC7
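To see the effect described in the question, here is a minimal sketch (not from the thread) that trims a read's 3' end at a chosen Phred threshold and reports how many bases remain. It assumes Phred+33 quality encoding; the function names and the example quality string are hypothetical.

```python
from typing import List


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


def trim_3prime(qual: str, threshold: int) -> int:
    """Return the read length left after removing trailing bases below `threshold`."""
    scores = phred_scores(qual)
    end = len(scores)
    # Walk back from the 3' end until a base meets the threshold.
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return end


if __name__ == "__main__":
    # Hypothetical 20-base quality string: Q40 start, then Q20, Q10 and Q2 bases.
    qual = "IIIIIIIIII555++++###"
    for q in (3, 20):
        print(f"threshold Q{q}: {trim_3prime(qual, q)} of {len(qual)} bases kept")
```

On this made-up read a Q20 cut-off removes the Q10 stretch as well as the Q2 tail, while a Q3 cut-off removes only the Q2 tail; real reads degrade differently, so the actual loss has to be measured on the data.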


Tags: fastqc, quality_check, multiqc

Filter out low-quality reads, i.e. discard a read if the average quality of the whole read is below a certain threshold. I would only trim bases with Phred < 3, i.e. everything the sequencer labels as garbage, and then apply a length filter. The more bases you trim off, the more likely it becomes that a read maps ambiguously, especially against repetitive or closely related reference sequences.

Reply by cschu181, 5 weeks ago
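The filter-then-trim approach in cschu181's reply could look roughly like this. It is a hypothetical sketch: the mean-quality threshold (20) and minimum length (50) are placeholder values not given in the thread, trimming is applied only at the 3' end, and Phred+33 encoding is assumed. Trimmomatic's AVGQUAL, TRAILING and MINLEN steps cover similar ground in a production tool.

```python
from statistics import mean
from typing import Optional, Tuple


def filter_read(seq: str, qual: str, min_mean_q: float = 20.0,
                trim_q: int = 3, min_len: int = 50) -> Optional[Tuple[str, str]]:
    """Return the (possibly trimmed) read, or None if it should be discarded.

    The default thresholds are hypothetical placeholders, not values from the thread.
    """
    scores = [ord(ch) - 33 for ch in qual]  # assumes Phred+33 encoding
    # 1. Discard the whole read if its average quality is below the threshold.
    if not scores or mean(scores) < min_mean_q:
        return None
    # 2. Trim only the near-garbage tail (Phred < trim_q) from the 3' end.
    end = len(scores)
    while end > 0 and scores[end - 1] < trim_q:
        end -= 1
    seq, qual = seq[:end], qual[:end]
    # 3. Drop reads that are now too short to place unambiguously.
    if len(seq) < min_len:
        return None
    return seq, qual


if __name__ == "__main__":
    # Hypothetical 65-base read: 60 high-quality bases and a 5-base Q2 tail.
    print(filter_read("A" * 65, "I" * 60 + "#" * 5))
```

Filtering on whole-read quality first means a mostly good read keeps its moderately low-quality bases, which is the point of the reply: fewer trimmed bases, fewer ambiguous placements.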

DC7: You should state the intended downstream use of this data in the original question. Based on your past threads, are you doing MetaPhlAn analysis with these data?

As the answers here indicate, many aligners will soft-clip parts of reads that don't align, but if you are doing any assembly work, you need to trim bad-quality data yourself.

Reply by genomax, 5 weeks ago
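Soft clipping shows up in the CIGAR string of each SAM record as `S` operations: the clipped bases stay in the record's SEQ field but are not aligned. As a rough illustration, here is a hypothetical helper (not part of any aligner's API) that reports what fraction of a read was soft-clipped.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_OPS = set("MIS=X")  # operations that consume read (query) bases


def clipped_fraction(cigar: str) -> float:
    """Fraction of read bases soft-clipped in a CIGAR string ('*' gives 0.0)."""
    if cigar == "*":  # CIGAR unavailable, e.g. unmapped read
        return 0.0
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    query_len = sum(n for n, op in ops if op in QUERY_OPS)
    soft = sum(n for n, op in ops if op == "S")
    return soft / query_len if query_len else 0.0


if __name__ == "__main__":
    for cigar in ("100M", "5S90M5S", "30S70M"):
        print(cigar, clipped_fraction(cigar))
```

Whether an aligner emits `S` at all depends on its mode; BWA-MEM does by default, while bowtie2 does so only in `--local` mode.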

Thanks, genomax, for your response. Yes, I will be analysing the sequences with MetaPhlAn for taxonomic profiling, followed by alpha- and beta-diversity analysis, etc. There is no assembly work involved. MetaPhlAn uses the bowtie2 aligner, discards reads shorter than 70 bp (default), and also discards reads that map with MAPQ < 30. Given this, do you think I need a separate clipping step?

Reply by dpc, 5 weeks ago
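For illustration only, the two cut-offs quoted in this reply can be applied to an alignment in SAM format with a short filter. This hypothetical sketch is not how MetaPhlAn implements its filtering; it simply takes the 70 bp and MAPQ 30 values from the post as parameters and assumes uncompressed SAM text.

```python
import sys
from typing import Iterable, Iterator


def filter_sam(lines: Iterable[str], min_mapq: int = 30,
               min_len: int = 70) -> Iterator[str]:
    """Yield SAM header lines plus mapped records passing MAPQ and length cut-offs."""
    for line in lines:
        if line.startswith("@"):
            yield line  # keep the header so the output is still valid SAM
            continue
        if not line.strip():
            continue  # ignore stray blank lines
        fields = line.rstrip("\n").split("\t")
        flag, mapq, seq = int(fields[1]), int(fields[4]), fields[9]
        if flag & 0x4:
            continue  # unmapped read
        # SEQ may be '*' (e.g. some secondary records); this sketch drops those too.
        if mapq < min_mapq or len(seq) < min_len:
            continue
        yield line


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python filter_sam.py alignments.sam > filtered.sam
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(filter_sam(handle))
```

For the MAPQ part alone, `samtools view -h -q 30` does the same job without custom code.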
science_lizard wrote (5 weeks ago):

I'll preface this by saying I'm not a bioinformatician, but for my own data analysis I usually don't touch the FASTQ files unless there is serious adapter contamination or I get very low alignment rates with Bowtie2. That is mainly because A) it feels like a lot of extra work, since Bowtie2 has specific parameters for dealing with low quality scores and you can later filter out everything with a MAPQ score < 20 or 30, and B) sometimes, as you said, too much trimming can negatively affect alignment. Maybe someone with more experience than me can confirm whether that makes sense, but I did find this blog post pretty informative:
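A quick way to check for the "very low alignment rate" case mentioned here is to count how many primary records in the SAM output are mapped. This is a minimal sketch that treats every primary record, including each mate of a pair, as one read; bowtie2's own end-of-run summary breaks pairs down differently, so the two numbers need not match exactly.

```python
from typing import Iterable


def alignment_rate(lines: Iterable[str]) -> float:
    """Fraction of primary SAM records that are mapped (FLAG bit 0x4 unset)."""
    total = mapped = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t", 2)[1])
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        total += 1
        if not (flag & 0x4):
            mapped += 1
    return mapped / total if total else 0.0


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 2:
        # Hypothetical usage: python alignment_rate.py alignments.sam
        with open(sys.argv[1]) as handle:
            print(f"alignment rate: {alignment_rate(handle):.1%}")
```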

swbarnes2 wrote (5 weeks ago):

It looks like bowtie2 will soft-clip to make an alignment happen, although only in --local mode; in the default --end-to-end mode it aligns every base of the read.

A Phred score of 10 still means that the base is likely to be correct. The odds of wrong bases causing a read to align to the wrong place are pretty low. And if a read aligns nowhere, even after soft-clipping by the aligner, then it's not doing any harm, and hard-clipping it yourself wasn't going to fix it.
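The Phred scale behind this argument relates a quality score Q to the probability P that the base call is wrong. Working it through for the thresholds discussed in this thread:

```latex
\begin{aligned}
Q &= -10\log_{10} P \quad\Longleftrightarrow\quad P = 10^{-Q/10} \\
Q = 3  &\;\Rightarrow\; P = 10^{-0.3} \approx 0.50 \\
Q = 10 &\;\Rightarrow\; P = 10^{-1} = 0.10 \\
Q = 20 &\;\Rightarrow\; P = 10^{-2} = 0.01
\end{aligned}
```

So a Q10 base is correct about 90% of the time, whereas the Q < 3 calls that cschu181 suggests trimming are roughly coin flips.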



Powered by Biostar version 2.3.0