GATK UnifiedGenotyper Slipping On Reference Sequence
12.5 years ago
Swbarnes2 ★ 1.6k

What happens is that about halfway through, the aligner seems to slip a position in the reference, so that almost every letter after that slip looks like a difference between the reference and the .bam file, making the VCF huge. I've had this happen on two different microbial species.

I used bwa for alignment and to make the paired-end BAM, and samtools to sort and remove duplicates. I've also tried running that .bam through Picard's sorting and duplicate removal, but the result is the same. I also tried BAMs that had and had not gone through GATK indel realignment, but it didn't matter. What did work was telling GATK not to look at the whole genome, but to start about 100 kb from the beginning. When I did that, it did not slip. This was on a Staph genome of about 2.2 Mb. The reference fastas look normal.
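
One way to make "the reference fastas look normal" a stricter check is sketched below. It rests on an assumption that is not confirmed anywhere in this thread: tools that read the reference through a .fai index compute offsets from a fixed line width, so a FASTA whose sequence lines are not all the same length within a record could yield shifted bases after the uneven line. The function name `find_irregular_fasta_lines` is made up for illustration; it only reports uneven lines and does not prove that they are the cause.

```python
import io


def find_irregular_fasta_lines(handle):
    """Return (record, line_number, length, expected) for sequence lines whose
    length differs from the first sequence line of their record.
    The last line of a record may legitimately be shorter, so it is skipped."""
    problems = []
    name = None
    expected = None
    pending = None  # a short line, only acceptable if it ends the record
    for line_number, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            expected = None
            pending = None
            continue
        if pending is not None:
            # More sequence followed the short line, so it was not the last one.
            problems.append(pending)
            pending = None
        if expected is None:
            expected = len(line)
        elif len(line) > expected:
            problems.append((name, line_number, len(line), expected))
        elif len(line) < expected:
            pending = (name, line_number, len(line), expected)
    return problems


if __name__ == "__main__":
    # Toy record with one short line in the middle (line 3).
    demo = io.StringIO(">contig1\nACGT\nACG\nACGT\nAC\n")
    for record, line_number, length, expected in find_irregular_fasta_lines(demo):
        print(f"{record}: line {line_number} has {length} bases, expected {expected}")
```

For a real file, pass an open handle, e.g. `find_irregular_fasta_lines(open("reference.fa"))`. If it reports anything, rewriting the FASTA with a uniform line width and rebuilding the index would be a cheap experiment.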

Has anyone seen this before? Is there some subtle formatting issue with my bam that is causing this?

gatk snp • 2.3k views

You are using the SAME reference for both GATK and your alignment step?


Yup, it's the right reference, both times. Two different species, same problem. I have no problems using samtools mpileup on the same .bams and references.
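
A quick way to double-check the "same reference" question is to compare the contig names and lengths recorded in the BAM header with those in the reference's .fai index. The sketch below assumes the header text comes from `samtools view -H pdedup.bam` and the index from `samtools faidx reference.fa`; the helper names are made up for illustration. Matching lengths do not guarantee identical sequence, so this can only rule a mismatch in, not out.

```python
def sam_header_lengths(header_text):
    """Parse @SQ lines of a SAM/BAM header into {contig_name: length}."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:value.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in fields and "LN" in fields:
            lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def fai_lengths(fai_text):
    """Parse a .fai index (name, length, offset, ...) into {contig_name: length}."""
    lengths = {}
    for line in fai_text.splitlines():
        cols = line.split("\t")
        if len(cols) >= 2:
            lengths[cols[0]] = int(cols[1])
    return lengths


def mismatched_contigs(header_text, fai_text):
    """Return sorted contig names whose length differs or that appear on one side only."""
    bam = sam_header_lengths(header_text)
    ref = fai_lengths(fai_text)
    return sorted(n for n in set(bam) | set(ref) if bam.get(n) != ref.get(n))
```

An empty list from `mismatched_contigs(header_text, fai_text)` means every contig name and length agrees between the two files.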


Here's the command line I'm using, maybe I'm missing something stupid that the program is silently tripping on:

java -jar ../../../../GATK/1.1.23/GenomeAnalysisTK.jar -T UnifiedGenotyper -I pdedup.bam -R ../../reference.fa -o pdedup.txt
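
For comparison, the workaround from the original question (starting about 100 kb in rather than at the beginning) could be expressed by adding GATK's `-L` interval argument to the same command. The sketch below only builds the argument list. The contig name and end coordinate in the usage line are placeholders, not values from this thread; the real contig name is whatever appears in the reference FASTA header.

```python
def unified_genotyper_command(jar, bam, reference, output, interval=None):
    """Build the UnifiedGenotyper command line shown above as an argument list,
    optionally restricted to one interval with -L."""
    cmd = [
        "java", "-jar", jar,
        "-T", "UnifiedGenotyper",
        "-I", bam,
        "-R", reference,
        "-o", output,
    ]
    if interval is not None:
        # Interval format is contig:start-end; the value is supplied by the caller.
        cmd += ["-L", interval]
    return cmd


if __name__ == "__main__":
    # "contig1" and the end position are hypothetical placeholders.
    cmd = unified_genotyper_command(
        "../../../../GATK/1.1.23/GenomeAnalysisTK.jar",
        "pdedup.bam",
        "../../reference.fa",
        "pdedup.txt",
        interval="contig1:100000-2200000",
    )
    print(" ".join(cmd))
```

Running the full-genome and restricted commands side by side, and comparing where the calls start to diverge, may help narrow down the position at which the slip begins.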


I've got nothin'.
