Question: Strategies for calling variants in a transposon-rich genome
12 months ago, Dave Carlson (Stony Brook University, NY) wrote:

Hi All,

I'm trying to call variants in a set of plant genome samples using bwa mem followed by GATK's HaplotypeCaller. However, I'm running into a potential roadblock that I'm hoping to get advice on. These genomes are rich in transposable elements (TEs), and when I visualize the alignments in IGV, I'm finding that there are sizable stretches of the alignment (containing a non-trivial proportion of the total reads) corresponding to TE sequences where essentially all the reads are multi-mapping and thus have a MAPQ score of zero.

My understanding is that GATK's HaplotypeCaller will not call variants in these regions due to the MAPQ scores. For various reasons, I would like to be able to call variants in these repeat sequences, but I'm not sure how to go about doing so without removing the MAPQ minimum threshold (which seems likely to be a bad idea!).

One of my colleagues suggested that I could split my BAM files to separate out the reads mapping to TE sequences in the reference (I have already annotated TEs in the reference and have a GFF with their locations) and then change the MAPQ scores for these reads from zero to some arbitrary value that would pass the GATK threshold (minimum value of 10).

That's an interesting idea, but I'm not sure if there are other, better options. Anybody have suggestions for how I can call variants in these repeat regions?

Thanks!
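
For illustration, the colleague's suggestion in the question (raise the MAPQ of zero-quality reads that fall inside annotated TEs) could be sketched roughly as follows. This is only a sketch under stated assumptions: it works on uncompressed SAM text rather than BAM, the function names `load_te_intervals`, `reference_span` and `rescue_mapq` are hypothetical, and the replacement value of 20 is an arbitrary placeholder that should be chosen to clear whatever threshold the variant caller actually applies. Changing the MAPQ does not make the read placements any less ambiguous.

```python
import re
from collections import defaultdict


def load_te_intervals(gff_lines):
    """Collect 0-based, half-open TE intervals per sequence from GFF lines."""
    intervals = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # GFF columns 4 and 5 hold 1-based, inclusive start and end.
        intervals[fields[0]].append((int(fields[3]) - 1, int(fields[4])))
    return intervals


def reference_span(pos, cigar):
    """Return the 0-based, half-open reference span covered by an alignment."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    # Only M, D, N, = and X consume reference bases.
    length = sum(int(n) for n, op in ops if op in "MDN=X")
    return pos - 1, pos - 1 + length


def rescue_mapq(sam_lines, intervals, new_mapq=20):
    """Yield SAM lines (no trailing newline) with MAPQ 0 raised inside TEs.

    new_mapq=20 is an arbitrary placeholder, not a recommended value.
    """
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.split("\t")
        flag, rname = int(fields[1]), fields[2]
        pos, mapq, cigar = int(fields[3]), int(fields[4]), fields[5]
        is_mapped = not (flag & 0x4) and rname != "*" and cigar != "*"
        if is_mapped and mapq == 0:
            start, end = reference_span(pos, cigar)
            if any(s < end and start < e for s, e in intervals.get(rname, [])):
                fields[4] = str(new_mapq)
        yield "\t".join(fields)
```

Reads outside the annotated TEs keep their original MAPQ, so the usual filtering still applies to the rest of the genome.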

Tags: snp, gatk, genome

How similar are the TE copies? You could mask the original TEs, and retain just one copy of each family, so their mapping will be unique.
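
As a rough illustration of this masking idea, the sketch below hard-masks every annotated TE copy except one per family, so that reads from that family can only align to the retained copy. It rests on several assumptions that are not in the thread: the reference is already loaded as a dict of sequence name to string, each annotation carries a family label, annotations do not overlap, and the first-listed copy of each family is the one kept (choosing a consensus or the longest copy may be more appropriate). The function name `mask_redundant_te_copies` is hypothetical.

```python
def mask_redundant_te_copies(sequences, te_annotations):
    """Mask all but the first-listed copy of each TE family with N.

    sequences: dict mapping sequence name -> sequence string.
    te_annotations: iterable of (seqid, start, end, family) tuples using
        1-based, inclusive coordinates, as in a GFF file.
    Returns (masked_sequences, kept), where kept maps each family to the
    coordinates of its retained copy.
    """
    masked = {name: list(seq) for name, seq in sequences.items()}
    kept = {}
    for seqid, start, end, family in te_annotations:
        if family not in kept:
            # Assumption: keep the first copy encountered for each family.
            kept[family] = (seqid, start, end)
            continue
        # Replace every other copy with N so reads cannot align there.
        masked[seqid][start - 1:end] = "N" * (end - start + 1)
    return {name: "".join(chars) for name, chars in masked.items()}, kept


reference = {"chr1": "AAAACCCCGGGGTTTT"}
annotations = [
    ("chr1", 1, 4, "Copia"),
    ("chr1", 9, 12, "Copia"),
    ("chr1", 13, 16, "Gypsy"),
]
masked_reference, kept_copies = mask_redundant_te_copies(reference, annotations)
```

Variants called against the retained copy would then reflect the pooled reads of the whole family rather than any single insertion, which is why the similarity of the copies matters.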

written 12 months ago by h.mon