Question: Strategies for calling variants in a transposon-rich genome
21 months ago by Dave Carlson (Stony Brook University, NY):

Hi All,

I'm trying to call variants in a set of plant genome samples using bwa mem followed by GATK's HaplotypeCaller. However, I'm running into a potential roadblock that I'm hoping to get advice on. These genomes are rich in transposable elements, and when I visualize the alignments in IGV, I find sizable stretches (covering a non-trivial proportion of the total reads) that correspond to TE sequences where essentially all the reads are multi-mapping and thus have a MAPQ score of zero.

My understanding is that GATK's HaplotypeCaller will not call variants in these regions due to the MAPQ scores. For various reasons, I would like to be able to call variants in these repeat sequences, but I'm not sure how to go about doing so without removing the MAPQ minimum threshold (which seems likely to be a bad idea!).

One of my colleagues suggested that I could split my BAM files to separate out the reads mapping to TE sequences in the reference (I have already annotated TEs in the reference and have a GFF with their locations) and then change the MAPQ scores for these reads from zero to some arbitrary value that would pass the GATK threshold (minimum value of 10).
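For concreteness, the colleague's suggestion could be sketched roughly as below. This is only an illustration under stated assumptions, not a recommended pipeline: it works on SAM text lines (for example from `samtools view -h`) rather than on BAM directly, the function names are hypothetical, and the replacement value of 20 is an arbitrary placeholder that should be set to clear whatever threshold the caller actually applies. Note that the rewritten MAPQ no longer reflects real placement confidence.

```python
import re

def load_te_intervals(gff_text):
    """Parse GFF text into {seqid: [(start, end), ...]}, 1-based inclusive."""
    intervals = {}
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        intervals.setdefault(fields[0], []).append((int(fields[3]), int(fields[4])))
    return intervals

def reference_end(pos, cigar):
    """Return the 1-based inclusive reference end of an alignment."""
    # Only M, D, N, = and X operations consume reference bases.
    consumed = sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
                   if op in "MDN=X")
    return pos + max(consumed, 1) - 1

def rescue_mapq(sam_lines, intervals, new_mapq=20):
    """Yield SAM lines, raising MAPQ 0 to new_mapq for reads overlapping a TE.

    new_mapq=20 is an arbitrary, hypothetical choice.
    """
    for line in sam_lines:
        if line.startswith("@"):  # header lines pass through untouched
            yield line
            continue
        fields = line.split("\t")
        flag, rname, pos = int(fields[1]), fields[2], int(fields[3])
        mapq, cigar = int(fields[4]), fields[5]
        if mapq == 0 and not flag & 0x4 and cigar != "*":
            end = reference_end(pos, cigar)
            # Linear scan for clarity; an interval tree would scale better.
            if any(s <= end and pos <= e for s, e in intervals.get(rname, [])):
                fields[4] = str(new_mapq)
        yield "\t".join(fields)
```

Reads outside the annotated TEs, and reads that already have a non-zero MAPQ, are passed through unchanged, so the rest of the alignment keeps its original mapping qualities.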

That's an interesting idea, but I'm not sure if there are other, better options. Anybody have suggestions for how I can call variants in these repeat regions?

Thanks!


How similar are the TE copies? You could mask the original TEs, and retain just one copy of each family, so their mapping will be unique.
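A minimal sketch of the masking idea, under several assumptions that are not in the thread: the genome is held as a dict of sequence strings, each GFF record carries its TE family in a `Name` attribute (a hypothetical choice; the real attribute depends on the annotation tool), the longest copy of each family is the one retained, and TE annotations do not overlap one another. All function names are illustrative.

```python
def parse_te_gff(gff_text, family_key="Name"):
    """Return [(seqid, start, end, family)] from GFF text, 1-based inclusive.

    family_key is a hypothetical attribute name; records without it are skipped.
    """
    records = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        attrs = dict(kv.split("=", 1) for kv in f[8].split(";") if "=" in kv)
        family = attrs.get(family_key)
        if family is not None:
            records.append((f[0], int(f[3]), int(f[4]), family))
    return records

def mask_all_but_one_copy(genome, te_records):
    """Replace every TE copy with N except the longest copy of each family.

    Assumes TE annotations do not overlap; overlapping records could
    partially mask a retained copy.
    """
    keep = {}
    for rec in te_records:
        best = keep.get(rec[3])
        if best is None or (rec[2] - rec[1]) > (best[2] - best[1]):
            keep[rec[3]] = rec
    masked = {seqid: list(seq) for seqid, seq in genome.items()}
    for rec in te_records:
        if keep[rec[3]] == rec:  # the retained representative stays intact
            continue
        seqid, start, end, _ = rec
        chars = masked[seqid]
        end = min(end, len(chars))  # guard against annotations past the end
        chars[start - 1:end] = "N" * (end - start + 1)
    return {seqid: "".join(chars) for seqid, chars in masked.items()}
```

After re-indexing and re-aligning against the masked reference, reads from every copy of a family should pile up on the single retained copy, so variants called there describe variation across the family rather than at one specific locus.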

Reply written 21 months ago by h.mon