Presence Of Series Of Hashes In Illumina Quality Line
11.2 years ago

Hi all,

My paired-end Illumina data contain runs of hash (#) signs in the quality line. Some reads have # as the quality for every base. When we convert it to Sanger (Phred+33), this gives a quality value of 2.
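For reference, the arithmetic can be checked with a short Python sketch. It assumes the quality line is already Phred+33 encoded; the function names are made up for illustration.

```python
def phred33_scores(quality_line):
    """Decode a Phred+33 (Sanger) quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_line]


def error_probability(q):
    """Convert a Phred score into the estimated probability of a wrong base call."""
    return 10 ** (-q / 10)


# '#' is ASCII 35, so 35 - 33 = 2; 'I' is ASCII 73, so 73 - 33 = 40.
print(phred33_scores("IIII####"))      # [40, 40, 40, 40, 2, 2, 2, 2]
print(round(error_probability(2), 2))  # 0.63
```

A score of 2 therefore corresponds to an estimated error probability of about 0.63 per base.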

Do I need to trim these low quality bases for QC?

Thanks, Deepthi

illumina quality sanger • 2.2k views
11.2 years ago
Rahul Sharma ▴ 660

This depends on the read length and coverage depth of your data. If you are working with 76 bp paired-end reads at high coverage (>180x), I would discard such reads along with their mates. If you plan to do assembly after trimming, the trimmed reads may cause problems in k-mer selection. In the case of HiSeq 2000 reads I would do that, because your read length would be >100 bp.
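Discarding a read together with its mate could look roughly like the Python sketch below. The in-memory record layout (tuples of name, sequence, quality) and the function names are assumptions for illustration; a real pipeline would stream both FASTQ files in step.

```python
LOW_QUAL_CHAR = "#"  # Phred+33 encoding of quality 2


def is_all_low_quality(quality_line, low_char=LOW_QUAL_CHAR):
    """Return True if every base in the read carries the low-quality character."""
    return len(quality_line) > 0 and set(quality_line) == {low_char}


def filter_pairs(pairs):
    """Keep only pairs in which neither mate is entirely low quality.

    `pairs` is an iterable of (read1, read2); each read is a
    (name, sequence, quality) tuple (hypothetical layout).
    """
    return [
        (r1, r2)
        for r1, r2 in pairs
        if not is_all_low_quality(r1[2]) and not is_all_low_quality(r2[2])
    ]


pairs = [
    (("a/1", "ACGT", "IIII"), ("a/2", "TTGA", "IIH#")),  # kept
    (("b/1", "ACGT", "####"), ("b/2", "TTGA", "IIII")),  # dropped: mate 1 is all '#'
]
kept = filter_pairs(pairs)
print([r1[0] for r1, _ in kept])  # ['a/1']
```

Dropping both mates keeps the two output files in sync, which most paired-end mappers and assemblers expect.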

Best wishes, Rahul

11.2 years ago
SES 8.6k

I would say yes, trim those positions. If the read is composed entirely of those qualities, then I would toss it out. Most people don't just hard trim each position, though, so take a look at "Fastq Quality Control Shootout" for a comparison of different quality trimming methods.
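As a minimal sketch of the simplest case, the Python below strips a trailing run of # qualities and leaves an empty read when every base is #, so the caller can discard it. It assumes Phred+33 input; the function name is hypothetical, and this is a plain hard trim rather than any of the methods compared in the shootout.

```python
def trim_trailing_low_quality(sequence, quality_line, low_char="#"):
    """Remove the trailing run of `low_char` qualities and the matching bases.

    Returns (sequence, quality). Both are empty strings when the whole
    read consists of `low_char`, signalling that the read should be dropped.
    """
    kept = len(quality_line.rstrip(low_char))
    return sequence[:kept], quality_line[:kept]


print(trim_trailing_low_quality("ACGTACGT", "IIII####"))  # ('ACGT', 'IIII')
print(trim_trailing_low_quality("ACGT", "####"))          # ('', '')
```

Only the 3' run is removed here; an isolated # in the middle of a read is left alone.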

11.2 years ago
Irsan ★ 7.8k

It depends on what you want to do with your reads. For de novo assembly you need to be very strict about cleaning artifacts from your reads. For re-sequencing (say, sequencing a human tumor to detect mutations) you can be less stringent. Still, I would recommend cleaning up your reads before starting the analysis. Removing the artifacts will increase the number of reads you can map, so it will increase your coverage. Use FastQC to diagnose read quality issues and, for example, Trimmomatic to remove adapter sequences and trim low-quality bases.
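To give a feel for what window-based quality trimming does, here is a simplified Python sketch. It is not Trimmomatic's implementation; the function name, the window size of 4 and the mean-quality threshold of 15 are illustrative assumptions, and Phred+33 input is assumed.

```python
def sliding_window_trim(sequence, quality_line, window=4, min_mean_q=15, offset=33):
    """Cut the read at the start of the first window whose mean quality is too low.

    Windows are scanned from the 5' end. Reads shorter than `window`
    are evaluated as a single window.
    """
    scores = [ord(ch) - offset for ch in quality_line]
    last_start = max(len(scores) - window + 1, 1)
    for start in range(last_start):
        chunk = scores[start:start + window]
        if chunk and sum(chunk) / len(chunk) < min_mean_q:
            return sequence[:start], quality_line[:start]
    return sequence, quality_line


# Six bases at Q40 followed by four at Q2: the window starting at index 5
# has mean (40 + 2 + 2 + 2) / 4 = 11.5, so the read is cut there.
print(sliding_window_trim("ACGTACGTAC", "IIIIII####"))  # ('ACGTA', 'IIIII')
```

Because the cut happens at the window start, one good base just before the low-quality run can be lost; stricter or looser behaviour is a matter of tuning the window and threshold.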
