Question: Homopolymer sequencing error in IonProton reads
3.3 years ago, Ashutosh Pandey wrote:

I am analyzing reads from a mouse genome sequenced on an IonProton machine. Around half of the reads aligned to the reference genome contain either an insertion or a deletion (gap). The effect is clearly visible in the VCF file (produced with SAMtools mpileup and BCFtools), which contains far more small indels than expected, mostly 1 bp indels that I am sure are sequencing errors in homopolymer regions. Maybe the approach the sequencer uses to count the bases incorporated in a homopolymer run (based on the size of the H+ ion release peak) does not have high enough resolution. I looked online and found others complaining about the same issue.

Has anybody thoroughly analyzed IonProton genomic data for the purpose of identifying sequence variants, and can you elaborate on the best approach to reduce these false-positive indels? The BAM files I received from the machine were aligned with TMAP. I am planning to use the GATK base quality recalibrator, which may model these errors and lower the affected base qualities, but I am not sure how much it would help. Please let me know if you already have experience with this.
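To make the pattern I am describing concrete, here is a minimal sketch of how one could flag 1 bp indels that sit inside a homopolymer run. It is only an illustration, not a tested pipeline: the function names, the in-memory `ref_seq` string (one uppercase contig) and the `min_run=4` threshold are my own assumptions, and it assumes the indel is written VCF-style with a shared anchor base at the 1-based `vcf_pos`.

```python
def homopolymer_run_length(ref_seq, idx):
    """Length of the run of identical bases in ref_seq containing 0-based idx."""
    base = ref_seq[idx]
    start = idx
    while start > 0 and ref_seq[start - 1] == base:
        start -= 1
    end = idx
    while end + 1 < len(ref_seq) and ref_seq[end + 1] == base:
        end += 1
    return end - start + 1


def is_homopolymer_indel(ref_seq, vcf_pos, ref_allele, alt_allele, min_run=4):
    """Return True for a 1 bp indel that extends or shortens a homopolymer run.

    Assumptions (hypothetical, for illustration only):
      - ref_seq is the whole contig as an uppercase string
      - vcf_pos is the 1-based position of the anchor base, as in VCF
      - min_run is an arbitrary cutoff for calling a run a homopolymer
    """
    # Only consider 1 bp insertions or deletions.
    if abs(len(ref_allele) - len(alt_allele)) != 1:
        return False
    longer = ref_allele if len(ref_allele) > len(alt_allele) else alt_allele
    if len(longer) != 2:
        return False
    changed = longer[1]  # the inserted or deleted base, after the anchor

    # The changed base may belong to the run on either side of the junction:
    # the anchor base (index vcf_pos - 1) or the base after it (index vcf_pos).
    flanks = [i for i in (vcf_pos - 1, vcf_pos)
              if 0 <= i < len(ref_seq) and ref_seq[i] == changed]
    return any(homopolymer_run_length(ref_seq, i) >= min_run for i in flanks)


if __name__ == "__main__":
    ref = "ACGTTTTTTGCA"
    print(is_homopolymer_indel(ref, 3, "GT", "G"))  # deletion of one T in the T run
    print(is_homopolymer_indel(ref, 1, "AC", "A"))  # deletion outside any run
```

Counting how many of the called indels this flags would at least tell me what fraction of them fall in homopolymers before deciding on a filtering strategy.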

homopolymer ionproton indels
