What could be the best explanation for a dip in Phred quality score near the beginning of reads?
4.5 years ago

I am looking at the reverse reads (R2) of a transcriptome dataset sequenced on an Illumina platform in 2x150 paired-end mode. As the plot below shows (mean Phred score per base position), there is a sudden dip at bases 5, 6 and 7.

[image: mean Phred score per base position]

I am wondering:

  1. What is the most likely explanation for this dip: a problem with library preparation, or a technical problem with the sequencer? Another observation is that a large share of the datasets are affected, and they all come from the same sequencing batch.

  2. To get rid of this dip, I ran Trimmomatic with "HEADCROP" to remove the first 7-8 bases, which, unsurprisingly, improved the quality distribution considerably. However, it also changed the "Sequence Duplication Levels" metric: the "Percent of sequences remaining after deduplication" dropped from 71.7% to 32.8%, as shown here:

Before trimming: [image: Sequence Duplication Levels plot]

After trimming: [image: Sequence Duplication Levels plot]

What could be the explanation? I also went through this Biostars post, which helped a little.
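For reference, the two operations described above (the per-base mean Phred profile and the head crop) can be sketched in a few lines of Python. This is only an illustration on made-up quality strings, assuming Phred+33 encoding and reads of equal length; the function names are hypothetical and this is not how FastQC or Trimmomatic are implemented.

```python
from statistics import mean


def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def mean_quality_per_base(quality_lines):
    """Mean Phred score at each read position, for reads of equal length."""
    decoded = [phred_scores(q) for q in quality_lines]
    return [mean(column) for column in zip(*decoded)]


def headcrop(seq, qual, n):
    """Drop the first n bases and their quality values from one read."""
    return seq[n:], qual[n:]


# Toy quality strings (invented): 'I' is Q40, '5' is Q20, '#' is Q2.
quals = ["IIII###IIIII", "IIII#5#IIIII", "IIIII##IIIII"]
profile = mean_quality_per_base(quals)

# Cropping 7 bases removes the low-quality positions 5-7 entirely.
cropped_seq, cropped_qual = headcrop("ACGTACGTACGT", quals[0], 7)
```

In this toy profile the mean drops at positions 5 to 7 and recovers afterwards, which is the shape of the dip in the plot; after the crop only the high-quality tail remains.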

phred fastqc illumina quality trimmomatic • 1.5k views

Is this across all the lanes and tiles? If so, it's a machine error (a focusing issue or similar). If not, it's probably a bubble (or a series of them).
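One way to check this is to group the quality at the affected positions by lane and tile. The sketch below assumes the standard Illumina (CASAVA 1.8+) read header layout, `@instrument:run:flowcell:lane:tile:x:y`, and Phred+33 qualities; the records and function names are invented for illustration.

```python
from collections import defaultdict


def lane_and_tile(header):
    """Extract (lane, tile) from an Illumina header, assuming the
    @instrument:run:flowcell:lane:tile:x:y layout."""
    fields = header.lstrip("@").split(" ")[0].split(":")
    return int(fields[3]), int(fields[4])


def dip_quality_by_tile(records, positions=(4, 5, 6), offset=33):
    """Mean Phred score over the given 0-based positions, per (lane, tile).

    The default positions correspond to bases 5, 6 and 7 of each read.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of scores, count]
    for header, qual in records:
        key = lane_and_tile(header)
        for i in positions:
            totals[key][0] += ord(qual[i]) - offset
            totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Toy (header, quality) pairs; a real run would read these from the R2 FASTQ.
records = [
    ("@SIM:1:FC:1:1101:1000:2000 2:N:0:1", "IIII###III"),
    ("@SIM:1:FC:1:1101:1005:2010 2:N:0:1", "IIII###III"),
    ("@SIM:1:FC:1:1102:1000:2000 2:N:0:1", "IIIIIIIIII"),
]
by_tile = dip_quality_by_tile(records)
```

If the low values show up for every lane and tile, that points towards a run-wide problem; if only some tiles are affected, as in this toy example, a localized cause such as a bubble is more plausible. FastQC's "Per tile sequence quality" module gives a similar view.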


I suspect the sequence duplication level after cropping is closer to the truth; before cropping, it was possibly masked by low-quality base calls / sequencing errors.
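A toy example of that masking effect: duplication is counted on exact sequence matches, so a miscalled base at position 5, 6 or 7 makes otherwise identical reads look unique. The reads below are invented, and the metric is a simplified stand-in for FastQC's estimate (which, as far as I understand its documentation, is derived from a subset of reads and compares only the first 50 bp of reads longer than 75 bp).

```python
def percent_remaining_after_dedup(seqs):
    """Share of reads left after collapsing exact duplicates (simplified)."""
    return 100.0 * len(set(seqs)) / len(seqs)


# Four copies of the same fragment, three carrying a miscall at base 5, 6 or 7,
# plus one unrelated read. All sequences are made up for illustration.
reads = [
    "ACGTAAATTGCCGATA",
    "ACGTCAATTGCCGATA",  # miscall at base 5
    "ACGTACATTGCCGATA",  # miscall at base 6
    "ACGTAACTTGCCGATA",  # miscall at base 7
    "GGGTAAACCGTTAGCA",
]

before = percent_remaining_after_dedup(reads)
# Remove the first 7 bases, as a head crop of 7 would.
after = percent_remaining_after_dedup([s[7:] for s in reads])
```

Here every read counts as unique before cropping, while after cropping the four copies collapse into one and only 2 of 5 sequences remain. A drop like 71.7% to 32.8% would be consistent with this, though shortening reads can also merge fragments that genuinely differed only in the cropped bases.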
