What is the difference between calling variants with and without an interval list?
3.7 years ago
svp ▴ 680

I used the following commands.

With interval list

Step1

gatk HaplotypeCaller --input $sample.bqsr.bam --reference $reference --emit-ref-confidence GVCF --dbsnp $DBSNP --output $sample._genome.vcf.gz -L $interval

Step2

gatk GenotypeGVCFs -R $reference --variant $sample._genome.vcf.gz -O $sample.genotypeGvcf.vcf -L $interval

The output file generated was about 25 MB

Without interval list

Step1

gatk HaplotypeCaller --input $sample.bqsr.bam --reference $reference --emit-ref-confidence GVCF --dbsnp $DBSNP --output $sample._genome.vcf.gz

Step2

gatk GenotypeGVCFs -R $reference --variant $sample._genome.vcf.gz -O $sample.genotypeGvcf.vcf

The output file generated was about 450 MB

Is this a problem? The number of variants called without the interval list is much higher than with it.

GATK HaplotypeCaller intervallist • 2.1k views
3.7 years ago
Alewa ▴ 150

Specifying an interval list restricts variant calling to the regions in that list. Without an interval list, you are calling variants across the whole genome, hence the much larger output file. Interval lists are particularly useful for whole-exome sequencing data, where only roughly 1% of the genome (primarily protein-coding) is sequenced. There are subtle differences in the regions targeted by different kits (e.g. Illumina, Agilent), so in your analysis you may want to use the interval list of the regions targeted by your kit. More info here: https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists

From a computational standpoint, you can also use interval lists to speed up your jobs by running them in parallel. For example, split your interval list into 10 parts and run variant calling for each sample on each shard (so 10 parallel jobs per sample), then combine the resulting VCFs. https://gatk.broadinstitute.org/hc/en-us/articles/360040509611-SplitIntervals
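
A rough sketch of that split, call, and combine pattern is below. It only prints the commands rather than running them, so you can inspect them first. The sample, reference, and interval file names are placeholders. The shard directory name and the `0000-scattered.interval_list` file naming are my assumptions about SplitIntervals output, so check them against what your GATK version actually writes. Using MergeVcfs for the combine step is one option, not the only one, and I left out `--dbsnp` to keep the lines short.

```shell
#!/bin/sh
# Sketch only: prints scatter-gather commands for one sample, does not run GATK.
# Arguments: sample name, reference FASTA, interval list, number of shards.
# The body runs in a subshell ( ... ) so its variables stay local.
scatter_commands() (
    sample=$1
    reference=$2
    interval=$3
    shards=$4
    shard_dir="${sample}_shards"   # hypothetical output directory name
    merge_inputs=""

    # 1. Split the interval list into the requested number of shards.
    echo "gatk SplitIntervals -R ${reference} -L ${interval} --scatter-count ${shards} -O ${shard_dir}"

    # 2. One HaplotypeCaller job per shard; the jobs are independent of each other.
    i=0
    while [ "$i" -lt "$shards" ]; do
        # Assumed shard naming: 0000-scattered.interval_list, 0001-..., and so on.
        shard=$(printf '%04d' "$i")
        gvcf="${sample}.${shard}.g.vcf.gz"
        echo "gatk HaplotypeCaller --input ${sample}.bqsr.bam --reference ${reference} --emit-ref-confidence GVCF --output ${gvcf} -L ${shard_dir}/${shard}-scattered.interval_list"
        merge_inputs="${merge_inputs} -I ${gvcf}"
        i=$((i + 1))
    done

    # 3. Combine the per-shard GVCFs back into one file for the sample.
    echo "gatk MergeVcfs${merge_inputs} -O ${sample}._genome.vcf.gz"
)

# Example with placeholder names and 3 shards.
scatter_commands sample1 ref.fasta targets.interval_list 3
```

The first and last printed commands have to run before and after the HaplotypeCaller lines respectively, but the HaplotypeCaller lines themselves can be submitted to a cluster scheduler or run side by side.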
