Of late we have been seeing some wavy N patterns in Illumina data. By this I mean a pattern of N's that rises and falls across the read length. To better understand this, we want to empirically estimate the error rates using the control data we have for some of the runs.
Two specific questions:
- Is there an existing method which takes a mapped BAM/SAM file and converts the MD tag into a graph of estimated error rates per read position? We just want to look at the percentage of mismatched bases at each position of the reads. Indels could be binned separately.
- Also, it has been a while since I did PhiX mapping. Is 90-95% mapped reads along the expected lines? I have a vague memory that the percentage should be closer to 99. We are also looking at the unmapped reads to see what might be going on with them.
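
To make the first question concrete, here is a minimal Python sketch of the kind of calculation I have in mind. It is not an existing tool, and the function names are my own. It assumes plain SAM text lines, primary alignments that carry an MD tag, and no hard clipping. It walks the CIGAR to find which read bases are aligned, uses MD to mark the mismatched ones, and flips reverse-strand reads so that positions correspond to sequencing cycles. Indels are simply skipped here rather than binned.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# An MD tag is a sequence of match-run lengths, deletions (^BASES) and mismatched bases.
MD_RE = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")


def aligned_offsets(cigar):
    """Return (0-based read offsets of M/=/X bases, read length without hard clips)."""
    offsets, pos = [], 0
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":
            offsets.extend(range(pos, pos + length))
            pos += length
        elif op in "IS":  # insertions and soft clips consume read bases only
            pos += length
    return offsets, pos


def mismatch_offsets(aligned, md):
    """Map each mismatch recorded in MD back to its 0-based read offset."""
    i, found = 0, []
    for run, deletion, base in MD_RE.findall(md):
        if run:
            i += int(run)  # run of matching bases
        elif base:
            found.append(aligned[i])  # single mismatched base
            i += 1
        # deletions (^...) consume no read bases, so nothing to advance
    return found


def per_cycle_mismatch_rate(sam_lines):
    """Return {1-based cycle: mismatched aligned bases / aligned bases}."""
    mismatched, covered = Counter(), Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        md = next((f[5:] for f in fields[11:] if f.startswith("MD:Z:")), None)
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & 0x904 or cigar == "*" or md is None:
            continue
        aligned, read_len = aligned_offsets(cigar)
        reverse = bool(flag & 0x10)  # SEQ is stored reverse-complemented
        for offsets, counter in ((aligned, covered),
                                 (mismatch_offsets(aligned, md), mismatched)):
            for p in offsets:
                counter[read_len - p if reverse else p + 1] += 1
    return {c: mismatched[c] / covered[c] for c in sorted(covered)}
```

In practice the lines could come from `samtools view` on the PhiX alignment, and the returned dict can be plotted directly. If the aligner did not write MD tags, I believe `samtools calmd` can add them. As far as I know an N in the read is recorded as a mismatch in MD, so the wavy N pattern should be visible in this rate too.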