Question

Homopolymer sequencing error in IonProton reads

9.2 years ago

Ashutosh Pandey 12k

I am analyzing reads from a mouse genome sequenced on an Ion Proton machine. Around half of the reads aligned to the reference genome contain either an insertion or a deletion (gap). The effect is clearly evident in the VCF file (generated with SAMtools mpileup and BCFtools), which contains far more small indels than expected. Most are 1 bp indels, which I am fairly sure come from sequencing errors in homopolymer regions. Perhaps the approach the sequencer uses to quantify how many identical nucleotides were added (based on the peak of H+ ion release) does not have high enough resolution. I looked online and found others complaining about the same issue. Has anybody analyzed Ion Proton genomic data thoroughly for the purpose of identifying sequence variants, and can you elaborate on the best approach to reduce these false-positive indels? I have the BAM files from the machine; the reads were aligned using TMAP. I am planning to use the GATK base quality recalibrator, which may model these errors and lower the corresponding base qualities, but I am not sure how much it would help. Please let me know if you already have some experience with this.
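To make the pattern concrete, here is a minimal sketch of how one might flag the 1 bp indels that sit next to a homopolymer run in the reference. This is not what SAMtools, BCFtools, or GATK do internally; the function names, the `min_run` threshold of 4, and the toy reference string are assumptions made for illustration only, and it only handles simple left-anchored indel records (one allele of length 1, the other of length 2).

```python
def homopolymer_run_length(ref, pos):
    """Length of the run of identical bases in `ref` containing 0-based index `pos`."""
    base = ref[pos]
    start = pos
    while start > 0 and ref[start - 1] == base:
        start -= 1
    end = pos
    while end + 1 < len(ref) and ref[end + 1] == base:
        end += 1
    return end - start + 1


def is_suspect_indel(ref, pos, ref_allele, alt_allele, min_run=4):
    """Return True for a 1 bp indel that extends or shortens a homopolymer run.

    `pos` is the 0-based index of the anchor base (VCF POS minus 1).
    `min_run` is an arbitrary, hypothetical threshold.
    """
    shorter, longer = sorted((ref_allele, alt_allele), key=len)
    # Only simple anchored 1 bp indels, e.g. A -> AT or AT -> A.
    if len(shorter) != 1 or len(longer) != 2 or longer[0] != shorter:
        return False
    changed = longer[1]  # the base that was inserted or deleted
    # Look at the anchor base and the base after it for a run of `changed`.
    for idx in (pos, pos + 1):
        if idx < len(ref) and ref[idx] == changed:
            if homopolymer_run_length(ref, idx) >= min_run:
                return True
    return False


if __name__ == "__main__":
    toy_ref = "GATTTTTCAG"  # toy reference with a run of five T's
    print(is_suspect_indel(toy_ref, 1, "AT", "A"))  # deletion of one T from the run
    print(is_suspect_indel(toy_ref, 8, "A", "AG"))  # insertion outside any long run
```

Counting how many called indels are flagged this way would at least show what fraction of them fall in homopolymer context before deciding on a filtering strategy.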

Homopolymer Ionproton Indels • 3.2k views

updated 2.1 years ago by Ram 43k • written 9.2 years ago by Ashutosh Pandey 12k