What is the difference between calling variants with and without an interval list?
23 months ago
svp ▴ 500

I used the following commands:

## With interval list

### Step1

    gatk HaplotypeCaller --input $sample.bqsr.bam --reference $reference --emit-ref-confidence GVCF --dbsnp $DBSNP --output $sample._genome.vcf.gz -L $interval

### Step2

    gatk GenotypeGVCFs -R $reference --variant $sample._genome.vcf.gz -O $sample.genotypeGvcf.vcf -L $interval

The output file generated was about 25 MB.

## Without interval list

### Step1

    gatk HaplotypeCaller --input $sample.bqsr.bam --reference $reference --emit-ref-confidence GVCF --dbsnp $DBSNP --output $sample._genome.vcf.gz

### Step2

    gatk GenotypeGVCFs -R $reference --variant $sample._genome.vcf.gz -O $sample.genotypeGvcf.vcf


The output file generated was about 450 MB.

Is this a problem? The number of variants called without the interval list is much higher than with it.

GATK HaplotypeCaller intervallist • 1.3k views
23 months ago
Alewa ▴ 100

Specifying an interval list restricts variant calling to the regions in that list. Without an interval list, you are calling variants across the whole genome, hence the much larger output file. Interval lists are particularly useful for whole-exome sequencing data, where only roughly 1% of the genome (primarily protein-coding regions) is sequenced. There are subtle differences in the regions targeted by different capture kits (e.g. Illumina, Agilent), so in your analysis you may want to use the interval list of the regions targeted by your kit. More info here: https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists
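If you want to confirm that the size difference simply reflects more records rather than something going wrong, a quick check is to count the variant records in each output. This is a minimal sketch: `count_variants` is a hypothetical helper, not part of GATK, and it only counts non-header lines.

```shell
# Count variant records (lines not starting with '#') in a VCF.
# Handles plain .vcf and gzip/bgzip-compressed .vcf.gz files.
# count_variants is a hypothetical helper name, not a GATK tool.
count_variants() {
    case "$1" in
        # grep -c exits non-zero when the count is 0, so ignore its status
        *.gz) gzip -dc "$1" | grep -vc '^#' || true ;;
        *)    grep -vc '^#' "$1" || true ;;
    esac
}

# Example usage (file name follows the commands in the question):
#   count_variants "$sample.genotypeGvcf.vcf"
```

Running it on the output of both runs should show far more records in the whole-genome VCF, which is expected when no interval list is given.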

From a computational standpoint, you can also use interval lists to speed up your jobs by running them in parallel. For example, split your interval list into 10 parts and run variant calling for each sample on each shard (so 10 parallel jobs per sample), then combine the resulting VCFs. https://gatk.broadinstitute.org/hc/en-us/articles/360040509611-SplitIntervals
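The scatter-gather idea can be sketched roughly as below. This is an illustration, not a tested pipeline: the function names, the `DRY_RUN` switch, and the output naming are my own assumptions, the `--dbsnp` option from the question is left out for brevity, and merging with `MergeVcfs` is one possible way to combine the shards. By default the script only prints the commands; set `DRY_RUN=0` to actually run them.

```shell
#!/bin/sh
# Scatter-gather sketch. Function names, DRY_RUN, and file naming are
# assumptions made for illustration; adapt paths to your own setup.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = execute them

# Print a command, and execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# Step 1: split the interval list into N shards written to a directory.
# Usage: split_intervals REFERENCE INTERVAL_LIST N OUTDIR
split_intervals() {
    run gatk SplitIntervals -R "$1" -L "$2" --scatter-count "$3" -O "$4"
}

# Step 2: call variants on one shard; the GVCF is named after the shard file.
# Usage: call_shard REFERENCE BAM SHARD_FILE OUTDIR
call_shard() {
    shard_name="$(basename "$3" .interval_list)"
    run gatk HaplotypeCaller --input "$2" --reference "$1" \
        --emit-ref-confidence GVCF -L "$3" \
        --output "$4/$shard_name.g.vcf.gz"
}

# Step 3: merge the per-shard GVCFs into a single file.
# Usage: merge_shards OUTPUT SHARD_GVCF...
merge_shards() {
    merged_out="$1"; shift
    # Rewrite the argument list from "a b" to "-I a -I b" (POSIX-safe idiom).
    for f in "$@"; do
        set -- "$@" -I "$f"
        shift
    done
    run gatk MergeVcfs "$@" -O "$merged_out"
}

# Example usage (variable names follow the question):
#   split_intervals "$reference" "$interval" 10 shards
#   for shard in shards/*.interval_list; do
#       call_shard "$reference" "$sample.bqsr.bam" "$shard" calls &
#   done
#   wait
#   merge_shards "$sample._genome.vcf.gz" calls/*.g.vcf.gz
```

On a cluster you would normally submit each `call_shard` as its own job instead of backgrounding it with `&`; the merged GVCF can then go into GenotypeGVCFs as in Step2 of the question.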