Question: Estimating Empirical Error Rate In Illumina Sequencing Data
7.6 years ago by
United States
Abhi wrote:

Hey Guys

Lately we have been seeing some wavy N patterns in Illumina data. By this I mean a pattern of N's across the read length. To better understand this, we want to empirically estimate the error rates using the control data we have for some of the runs.

Two specific questions:

  1. Is there an existing method that takes a mapped BAM/SAM file and converts the MD tag into a graph of estimated error rates per read position? We just want to look at the percentage of mismatched bases at each position of the reads. Indels could be binned separately.

  2. Also, it has been a while since I did PhiX mapping. Is 90-95% mapped reads in line with expectations? I vaguely remember that the percentage may be closer to 99. We are also looking at the unmapped reads to see what might be going on with them.
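
For the first question, the idea can be sketched in plain Python. This is only an illustration, not an existing tool: the function names (`aligned_read_positions`, `mismatch_indices`, `per_cycle_mismatch_rate`) are made up for this example, and it assumes SAM text records that carry an `MD:Z:` tag (for instance the output of `samtools view`, with `samtools calmd` used beforehand if the tag is missing). Mismatches are counted per sequencing cycle, so reverse-strand reads are flipped; indels are skipped rather than binned.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# MD tokens: a match count, a deletion (^BASES), or a single mismatched base.
MD_RE = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")


def aligned_read_positions(cigar):
    """Return (read positions of aligned bases, read length including clips)."""
    positions, pos = [], 0
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":          # aligned bases: these are what MD describes
            positions.extend(range(pos, pos + length))
            pos += length
        elif op in "ISH":        # insertions and clips use read bases only
            pos += length
        # D, N and P consume no read bases
    return positions, pos


def mismatch_indices(md):
    """Return indices into the aligned-base list where MD reports a mismatch."""
    out, i = [], 0
    for num, deletion, mismatch in MD_RE.findall(md):
        if num:
            i += int(num)
        elif mismatch:
            out.append(i)
            i += 1
        # deletions consume reference bases only, so the index does not move
    return out


def per_cycle_mismatch_rate(sam_lines):
    """Map 0-based cycle -> mismatches / aligned bases at that cycle."""
    mismatches, coverage = Counter(), Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        md = next((f[5:] for f in fields[11:] if f.startswith("MD:Z:")), None)
        # Skip unmapped, secondary and supplementary records.
        if flag & (0x4 | 0x100 | 0x800) or cigar == "*" or md is None:
            continue
        positions, read_len = aligned_read_positions(cigar)
        if flag & 0x10:          # reverse strand: convert to sequencing order
            positions = [read_len - 1 - p for p in positions]
        coverage.update(positions)
        mismatches.update(positions[i] for i in mismatch_indices(md))
    return {c: mismatches[c] / coverage[c] for c in sorted(coverage)}
```

The returned dictionary can be passed to any plotting library, or printed as a two-column table, to look for cycles where the mismatch rate rises.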

Thanks! -Abhi


How was your PhiX added? Spiked in with indices?

written 7.6 years ago by Gabriel R.

For the runs we want to check, we had a full lane, with no indices or spike-in.

written 7.6 years ago by Abhi
7.3 years ago by
JC wrote:

You can check whether you have position-specific errors and other features with tools such as FastQC.
