Question: Strategies for calling variants in a transposon-rich genome
gravatar for Dave Carlson
2.6 years ago by
Dave Carlson320
Stony Brook University, NY
Dave Carlson320 wrote:

Hi All,

I'm trying to call variants in a set of plant genome samples using a bwa mem followed by GATK's HaplotypeCaller. However, I'm running into a potential roadblock that I'm hoping to get advice on. These genomes are rich in transposable elements, and when I visualize the alignments with IGV, I'm finding that that are sizable stretches in the alignment (that includes a non-trivial proportion of the total reads) corresponding to TE sequences where essentially all the reads are multi-mapping and thus have a MAPQ score of zero.

My understanding is that GATK's HaplotypeCaller will not call variants in these regions due to the MAPQ scores. For various reasons, I would like to be able to call variants in these repeat sequences, but I'm not sure how to go about doing so without removing the MAPQ minimum threshold (which seems likely to be a bad idea!).

One of my colleagues suggested that I could split my BAM files to separate out the reads mapping to TE sequences in the reference (I have already annotated TEs in the reference and have a GFF with their locations) and then change the MAPQ scores for these reads from zero to some arbitrary value that would pass the GATK threshold (minimum value of 10).

That's an interesting idea, but I'm not sure if there are other, better options. Anybody have suggestions for how I can call variants in these repeat regions?


snp gatk genome • 588 views
ADD COMMENTlink modified 2.6 years ago • written 2.6 years ago by Dave Carlson320

How similar are the TE copies? You could mask the original TEs, and retain just one copy of each family, so their mapping will be unique.

ADD REPLYlink written 2.6 years ago by h.mon29k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1591 users visited in the last hour