Question: What is the difference between calling variants with and without an interval list?
5 weeks ago by
svp240
Bangalore
svp240 wrote:

I used the following commands.

With interval list

Step1

gatk HaplotypeCaller --input $sample.bqsr.bam --reference $reference --emit-ref-confidence GVCF --dbsnp $DBSNP --output $sample._genome.vcf.gz -L $interval

Step2

gatk GenotypeGVCFs -R $reference --variant $sample._genome.vcf.gz -O $sample.genotypeGvcf.vcf -L $interval

The output file generated was about 25 MB

Without interval list

Step1

gatk HaplotypeCaller --input $sample.bqsr.bam --reference $reference --emit-ref-confidence GVCF --dbsnp $DBSNP --output $sample._genome.vcf.gz

Step2

gatk GenotypeGVCFs -R $reference --variant $sample._genome.vcf.gz -O $sample.genotypeGvcf.vcf

The output file generated was about 450 MB

Is this a problem? The number of variants called without the interval list is much higher than with it.

5 weeks ago by
ekwame60
NYC
ekwame60 wrote:

Specifying an interval list restricts variant calling to the regions in that list. Without an interval list, you are calling variants across the whole genome, hence the much larger output file. Interval lists are particularly useful for whole-exome sequencing data, where only roughly 1% of the genome (primarily protein-coding) is sequenced. There are subtle differences in the regions targeted by different capture kits (e.g. Illumina, Agilent), so in your analysis you may want to use the interval list of the targeted regions for your kit. More info here: https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists
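One way to check that the size difference is only a matter of scope is to restrict the whole-genome calls to the same intervals and compare record counts. This is a minimal sketch, not a definitive recipe: it assumes `gatk` is on your PATH and that the two runs were written to different files. The helper names `count_records` and `restrict_to_intervals` are made up for illustration.

```shell
# Count variant records (non-header lines) in a plain or gzipped VCF.
# Note: grep exits non-zero when the count is 0, but still prints "0".
count_records() {
    case "$1" in
        *.gz) gzip -dc "$1" | grep -vc '^#' ;;
        *)    grep -vc '^#' "$1" ;;
    esac
}

# Keep only the records of a VCF that fall inside the given intervals.
# Arguments: reference FASTA, input VCF, interval list, output VCF.
restrict_to_intervals() {
    gatk SelectVariants -R "$1" -V "$2" -L "$3" -O "$4"
}
```

For example, `restrict_to_intervals "$reference" whole_genome.vcf "$interval" subset.vcf` followed by `count_records subset.vcf` should give a number close to `count_records` on the file produced with `-L` (the file names here are hypothetical). The counts may not match exactly, since calls near interval edges can differ between the two runs.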

From a computational standpoint, you can also use interval lists to speed up your jobs or run them in parallel: for example, split your interval list into 10 parts and run variant calling for each sample on each shard (so 10 parallel jobs per sample), then combine the resulting VCFs. https://gatk.broadinstitute.org/hc/en-us/articles/360040509611-SplitIntervals
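The split, call, and combine steps could look roughly like the sketch below. It assumes `$sample`, `$reference`, `$DBSNP` and `$interval` are set as in the question, that SplitIntervals names its shards with the default `.interval_list` extension, and that MergeVcfs is an acceptable way to combine the per-shard GVCFs. The function name `scatter_call` and the `.shards` directory are made up for illustration.

```shell
# Scatter-gather sketch: split the intervals, call each shard in parallel,
# then merge the per-shard GVCFs. Paths must not contain spaces.
scatter_call() {
    n_shards=${1:-10}
    shard_dir="$sample.shards"   # hypothetical directory for the shards

    # 1. Split the interval list into n_shards pieces.
    gatk SplitIntervals -R "$reference" -L "$interval" \
        --scatter-count "$n_shards" -O "$shard_dir" || return 1

    # 2. Run HaplotypeCaller on every shard as a background job.
    merge_args=""
    for shard in "$shard_dir"/*.interval_list; do
        shard_out="${shard%.interval_list}.g.vcf.gz"
        gatk HaplotypeCaller -I "$sample.bqsr.bam" -R "$reference" \
            -ERC GVCF --dbsnp "$DBSNP" -L "$shard" -O "$shard_out" &
        merge_args="$merge_args -I $shard_out"
    done
    wait   # waits for all shards; does not report individual job failures

    # 3. Combine the per-shard GVCFs into a single file.
    #    $merge_args is left unquoted on purpose so it splits into arguments.
    gatk MergeVcfs $merge_args -O "$sample._genome.vcf.gz"
}
```

Calling `scatter_call 10` would start 10 HaplotypeCaller jobs at once on the local machine, so pick a shard count that matches the cores and memory available. On a cluster you would more likely submit each shard as its own job.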
