MiSeq: R1 vs. R2 reads show big quality difference - is that normal?
1
1
8.1 years ago
mschmid ▴ 180

Hi!

A company sequenced 20 different samples for our lab on the MiSeq platform (2x300). I now have the data and did a first rough evaluation of the quality. There are huge differences in quality between the R1 and R2 reads. Is that normal (see images)?

The two images show first the R1 reads and then the R2 reads. For each sample, I averaged the probability of a wrong base call over all reads.

The y-axis shows the probability of a wrong base call and the x-axis the base position. Note that the two y-axes do not have the same range (otherwise one would not see anything in the R1 graph).

Important: In the graph for R2 the x-axis is flipped (I already did this in a script so the reads can be aligned later...)
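For readers who want to reproduce this kind of plot, here is a minimal sketch of the per-position averaging described above. It is not the original poster's script: the function names are made up for illustration, and it assumes the quality strings use the Phred+33 encoding (the `offset` parameter), which should be checked against the actual FASTQ files.

```python
from typing import Iterable, List


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred score Q to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def mean_error_per_position(quality_strings: Iterable[str],
                            offset: int = 33) -> List[float]:
    """Average the error probability at each base position over all reads.

    `offset` is the ASCII offset of the quality encoding (assumed Phred+33).
    Reads may differ in length; each position is averaged over the reads
    that actually reach it.
    """
    sums: List[float] = []
    counts: List[int] = []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(sums):  # first read that is long enough to reach position i
                sums.append(0.0)
                counts.append(0)
            sums[i] += phred_to_error_prob(ord(ch) - offset)
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]
```

Reversing the returned list for R2 would reproduce the flipped x-axis mentioned above. Note that averaging error probabilities is not the same as averaging Phred scores, so the two kinds of plot are not directly comparable.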

http://picpaste.com/pics/R1.1398695966.png

http://picpaste.com/pics/R2.1398696015.png

Thanks!

Michael

OK, here are the Phred scores. This time the x-axis of R2 is not flipped. What do you guys think? Is this a satisfactory result, considering that the sequencing was done by professionals?

http://picpaste.com/pics/R1_phred.1398701258.png

http://picpaste.com/pics/R2_phred.1398700926.png

First, the distribution of phred scores (Image 1: R1, Image 2: R2). Here I have just counted the occurrence of phred scores per sample:

http://picpaste.com/pics/R1_Phreds_1.1398765452.png

http://picpaste.com/pics/R2_Phreds_1.1398765471.png
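A Phred score distribution like the one in these two images can be tallied in a few lines. This is only a sketch under the assumption of Phred+33 encoded quality strings; `phred_histogram` is a hypothetical helper name, not something from the original analysis.

```python
from collections import Counter
from typing import Iterable


def phred_histogram(quality_strings: Iterable[str], offset: int = 33) -> Counter:
    """Count how often each Phred score occurs across all bases of all reads.

    `offset` is the ASCII offset of the quality encoding (assumed Phred+33).
    """
    counts: Counter = Counter()
    for qual in quality_strings:
        counts.update(ord(ch) - offset for ch in qual)
    return counts
```

Calling this once per sample on the R1 quality lines and once on the R2 quality lines gives the two sets of counts to plot.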

Second, I counted how many reads pass the following criteria:

150 bp with an average Phred score of at least 20 in a moving window of 3 bases (I guess that is not a very strict criterion).

For R1 90% to 95% of the reads pass

For R2 just about 0.01% pass
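The criterion above leaves some room for interpretation. The sketch below assumes it means: a read passes if it contains a contiguous stretch of at least 150 bases in which every 3-base window has a mean Phred score of at least 20. That reading, the function names, and the Phred+33 `offset` are all assumptions, not the original poster's code.

```python
from typing import Iterable


def passes_window_filter(qual: str, min_len: int = 150, min_mean_q: float = 20,
                         window: int = 3, offset: int = 33) -> bool:
    """Return True if the read has a stretch of >= min_len bases in which
    every sliding window of `window` bases has mean Phred >= min_mean_q."""
    scores = [ord(ch) - offset for ch in qual]
    if len(scores) < window:
        return False
    threshold = min_mean_q * window      # compare window sums, not means
    window_sum = sum(scores[:window])
    run = 0                              # consecutive passing windows so far
    for start in range(len(scores) - window + 1):
        if start > 0:
            # slide the window one base to the right
            window_sum += scores[start + window - 1] - scores[start - 1]
        if window_sum >= threshold:
            run += 1
            # `run` consecutive windows cover run + window - 1 bases
            if run + window - 1 >= min_len:
                return True
        else:
            run = 0
    return False


def fraction_passing(quality_strings: Iterable[str], **kwargs) -> float:
    """Fraction of reads that pass the window filter (0.0 if there are no reads)."""
    results = [passes_window_filter(q, **kwargs) for q in quality_strings]
    return sum(results) / len(results) if results else 0.0
```

Because a window is judged by its mean, a single low-quality base between two Q40 bases does not break a stretch, which is one reason the criterion is fairly lenient.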

MiSeq • 8.4k views
0

I usually do see differences between read 2 and read 1 of a pair in terms of quality, but usually not that bad.

0

It would be interesting to see these charts only for bases with a quality > 20 or 30. Presumably, your axis is flipped on the R2.

0

@brentp: Yes, sure! Stupid me, I forgot to mention that... But the R2 reads are still much worse than the R1 reads over the whole length... I guess that is still strange?

0

As Dan Gaston already said, it is normal to see some difference, but I also assume that this is more than one would expect. It is hard to say because I normally look at the Phred score directly without computing error probabilities. When I have seen a big difference between read 1 and read 2 quality, this normally also meant that the second index read was of really low quality, with a lot of uncalled bases. Did you have index reads, and how do they look?

0

I second the previous comments: it's normal to have a small variation, and R2 is in general a little worse than R1, but your plots show that something happened during R2 sequencing. Ask for a re-run or ignore R2 in your analysis.

1

It doesn't seem that odd to me that at the end of 300 bases in the 2nd pair in untrimmed reads, the error rate is up to 10%. But I usually don't look at the data like this so maybe I'm mistaken.
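To relate the 10% figure to the Phred scale that FastQC-style plots use, the standard definition of the Phred score can be applied directly (a worked conversion, not a statement about this particular run):

```latex
\[
  Q = -10 \log_{10} P
  \qquad\Longleftrightarrow\qquad
  P = 10^{-Q/10}
\]
\[
  P = 0.10 \;\Rightarrow\; Q = -10 \log_{10}(0.10) = 10
\]
```

So an error rate of 10% corresponds to Q10, while the commonly used thresholds Q20 and Q30 correspond to error rates of 1% and 0.1%.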

0

I agree. I usually look at it plotted out in the same way as FastQC does.

2
8.1 years ago

Hi mschmid,

Yes, it is normal for MiSeq R2 reads to be of worse quality than R1 reads. I play with MiSeq data a lot and observe the same trend quite often. For reference, I recommend visiting my page, where I have compared the average quality of R1 and R2 across different MiSeq runs:

http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/oneliners.html?#SPATIAL

Also check Page 10 of these slides (my supervisor presented them at STAMPS 2013):

https://stamps.mbl.edu/images/9/9a/NoiseRemoval2013.pdf

Best Wishes,
Umer

2