GATK HaplotypeCaller gvcfs smaller for male samples
6.0 years ago
tracyc • 0

Hi all,

I am using GATK's HaplotypeCaller to create GVCFs for some animal samples, but the GVCF files for male autosomes tend to be smaller than those for female autosomes. The input FASTQ sizes are comparable between male and female samples, as is the number of reads aligned per autosome (I aligned with BWA-MEM and checked mapped read counts with SAMtools idxstats).

        java -Xmx4g -jar /usr/local/gatk/3.7.0/GenomeAnalysisTK.jar \
        -T HaplotypeCaller \
        -R $ref \
        --dbsnp $dbsnp \
        -I ${in}/${sample}.final.bam \
        --emitRefConfidence GVCF \
        -L $sequence \
        -o ${out}/${sample}.${sequence}.g.vcf

The GVCF sizes for chrM and chrUn are comparable between male and female samples, which makes this even stranger.

Any ideas? Thanks.

SNP • 1.3k views

How many samples do you have supporting this observation? Is there a difference in line count (wc -l) between the GVCFs?


Thanks for your reply. I see it across 4 males and 4 females. The raw FASTQs are similar in size, as are the BAM files. There is a difference in line count between the GVCFs, e.g. male1.chr1.g.vcf has 16317909 lines and female1.chr1.g.vcf has 32117004 lines. I looked into each and compared the files. It looks like GATK finds fewer "active sites" in the male, and hence there are fewer lines (but every position is still accounted for as far as I can see). I am not sure why this would happen. The samples are ovine. I have run the same script on different species and this has not happened before.
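
One way to check that a shorter GVCF still covers every position is to count its records and sum the span of each record. The following is a minimal sketch, assuming the usual GVCF layout in which reference blocks carry an END= key in the INFO column and any record without one covers a single base (an approximation for indels). The function name gvcf_summary is hypothetical, not part of GATK.

```shell
# Print "<record count><TAB><bases covered>" for one uncompressed GVCF.
# Assumption: reference blocks store their last position as END= in INFO (column 8);
# records without END= are counted as covering exactly one base.
gvcf_summary() {
    awk -F'\t' '
        /^#/ { next }                      # skip header lines
        {
            records++
            end = $2                       # default: record covers only POS
            if (match($8, /(^|;)END=[0-9]+/)) {
                s = substr($8, RSTART, RLENGTH)
                sub(/^;?END=/, "", s)      # keep only the digits
                end = s + 0
            }
            bases += end - $2 + 1
        }
        END { printf "%d\t%d\n", records + 0, bases + 0 }
    ' "$1"
}
```

Running `gvcf_summary male1.chr1.g.vcf` and `gvcf_summary female1.chr1.g.vcf` should then give similar base totals but different record counts if the male file simply has longer reference blocks.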
