Question

Estimating Empirical Error Rate In Illumina Sequencing Data

3

11.1 years ago

Abhi ★ 1.6k

Hey Guys

Of late we have been seeing some wavy N patterns in Illumina data. By this I mean a pattern of N's across the read length. To understand this better, we want to empirically estimate the error rates using the control data we have for some of the runs.

Two specific questions:

1. Is there an existing method that takes a mapped BAM/SAM file and converts the MD tag into a graph of estimated error rates per read position? We just want to look at the percentage of mismatched bases at each position of the reads. Indels could be binned separately.
2. Also, it has been a while since I did PhiX mapping. Is 90-95% mapped reads in line with expectations? I vaguely remember that the percentage should be closer to 99. We are also looking at the unmapped reads to see what might be going on with them.

Thanks!
-Abhi

qualitycontrol illumina quality ngs • 4.3k views

ADD COMMENT • link updated 3.1 years ago by Ram 43k • written 11.1 years ago by Abhi ★ 1.6k

0

How was your PhiX added? Spiked in with indices?

ADD REPLY • link 11.1 years ago by Gabriel R. ★ 2.9k

0

For the runs we want to check, we had a full lane of PhiX, so no indices or spike-in.

ADD REPLY • link 11.1 years ago by Abhi ★ 1.6k

Ram · Answer 1 · 2013-07-09

0

10.8 years ago

JC 13k

You can check whether you have position-specific errors and other features with tools such as FastQC.
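For the mismatch-per-position part of the question, the MD tag can also be tallied directly. The following is only a minimal sketch, not an established tool: it assumes plain SAM text lines that carry an `MD:Z:` tag, counts only substitutions (indels are skipped rather than binned), and reports positions as 0-based sequencing cycles, flipping reverse-strand reads. The function names are made up for illustration.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# MD tokens: a run of matches, a deletion (^BASES), or a single mismatched reference base
MD_RE = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")


def aligned_read_offsets(cigar):
    """Return (read length incl. clips, read offsets of bases aligned by M/=/X)."""
    pos, aligned = 0, []
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":
            aligned.extend(range(pos, pos + length))
            pos += length
        elif op in "ISH":
            # insertions and clips occupy read cycles but have no MD entry
            pos += length
        # D, N and P consume no read bases
    return pos, aligned


def mismatch_offsets(cigar, md):
    """Return (read length, aligned offsets, offsets of mismatched bases)."""
    read_len, aligned = aligned_read_offsets(cigar)
    k, mismatches = 0, []
    for matches, deletion, base in MD_RE.findall(md):
        if matches:
            k += int(matches)
        elif base:
            mismatches.append(aligned[k])
            k += 1
        # deletions have no read base, so the pointer does not move
    return read_len, aligned, mismatches


def per_cycle_mismatch_rate(sam_lines):
    """Map each 0-based sequencing cycle to mismatches / aligned bases."""
    covered, mismatched = Counter(), Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        md = next((f[5:] for f in fields[11:] if f.startswith("MD:Z:")), None)
        # skip unmapped (0x4), secondary (0x100), supplementary (0x800) records
        if flag & 0x904 or cigar == "*" or md is None:
            continue
        read_len, aligned, mism = mismatch_offsets(cigar, md)
        if flag & 0x10:
            # SEQ is stored reverse-complemented, so flip back to cycle order
            aligned = [read_len - 1 - i for i in aligned]
            mism = [read_len - 1 - i for i in mism]
        covered.update(aligned)
        mismatched.update(mism)
    return {cycle: mismatched[cycle] / covered[cycle] for cycle in sorted(covered)}
```

In practice the lines could come from `samtools view` piped to standard input, and `samtools calmd` can add MD tags if the aligner did not write them. The resulting dictionary can then be plotted with any plotting tool.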

ADD COMMENT • link updated 3.1 years ago by Ram 43k • written 10.8 years ago by JC 13k