Question: Calling Variants Using GATK on Exome Data with Few Samples
6.9 years ago, anchal10 wrote:

Hi All

I have exome-sequenced 6 samples and am now calling variants with GATK. I am calling variants for each sample individually, across all sites in the genome (with the dbSNP_135.vcf file). I am also running VQSR on each single-sample call set, using all the training sets GATK recommends for whole genomes.

In the output, the Ti/Tv ratio is much lower than expected for exome data, and the tranche plots also show lots of false positives. I have the following two questions:

1) Should I call variants on all the samples together, or is it fine to call each sample individually?

2) Also, for exome data, can I call variants across all sites of the whole reference genome, or do I need to supply a list of the captured regions?

Thanks in advance.
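The low Ti/Tv described above is easy to check directly. Ti/Tv is the number of transitions (A<->G, C<->T) divided by the number of transversions among SNVs. If a substitution were purely random, 4 of the 12 possible single-base changes are transitions, giving Ti/Tv = 0.5, so a large share of error calls pulls the ratio down toward that value. The sketch below is a minimal, hypothetical helper (the names `is_transition` and `titv_ratio` are illustrative, not from GATK) that assumes you have already extracted (REF, ALT) pairs from your VCF.

```python
# Minimal Ti/Tv sketch; function names are hypothetical, not a GATK API.
BASES = {"A", "C", "G", "T"}
# Transitions swap purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def is_transition(ref: str, alt: str) -> bool:
    """Return True if the single-base change ref>alt is a transition."""
    return frozenset((ref.upper(), alt.upper())) in TRANSITIONS


def titv_ratio(snvs) -> float:
    """Ti/Tv for an iterable of (ref, alt) pairs.

    Indels, multi-base alleles, non-ACGT bases and ref == alt are skipped.
    Returns inf if there are transitions but no transversions,
    and nan if there are no usable SNVs at all.
    """
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    if tv == 0:
        return float("nan") if ti == 0 else float("inf")
    return ti / tv


if __name__ == "__main__":
    # Hypothetical calls: 3 transitions, 1 transversion, 1 indel (ignored).
    calls = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("AT", "A")]
    print(titv_ratio(calls))  # 3.0
    # Every possible single-base change once, i.e. "random" errors.
    every_change = [(r, a) for r in "ACGT" for a in "ACGT" if r != a]
    print(titv_ratio(every_change))  # 0.5
```

Comparing this number before and after filtering, or on-target versus off-target, is a quick way to see how much of a call set is noise.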



Have you done any filtering of the resulting variants? You'll most likely need to do so.

(Comment by Sean Davis, 6.9 years ago)
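As a rough illustration of the kind of filtering suggested here, the sketch below drops VCF records whose INFO/DP is below a threshold, since low-depth calls are a common source of false positives. It is a minimal, hypothetical example: `min_dp=10` is an arbitrary placeholder rather than a GATK recommendation, it drops records instead of setting the FILTER column, and in practice you would more likely use GATK's own VariantFiltration tool.

```python
# Hypothetical depth hard filter over plain-text VCF lines (not a GATK API).


def info_dict(info_field: str) -> dict:
    """Parse a VCF INFO column such as 'DP=12;DB' into a dict.

    Key=value entries keep their value as a string; flags map to True.
    """
    out = {}
    if info_field in ("", "."):
        return out
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def read_depth(record_line: str):
    """Return INFO/DP of a VCF data line as an int, or None if absent."""
    cols = record_line.rstrip("\n").split("\t")
    dp = info_dict(cols[7]).get("DP")  # column 8 (index 7) is INFO
    return int(dp) if isinstance(dp, str) and dp.isdigit() else None


def depth_filter(vcf_lines, min_dp=10):
    """Yield header lines unchanged and only those data lines whose
    INFO/DP is at least min_dp; records without a usable DP are dropped."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        dp = read_depth(line)
        if dp is not None and dp >= min_dp:
            yield line


if __name__ == "__main__":
    # Made-up records for illustration only.
    lines = [
        "##fileformat=VCFv4.1\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr1\t101\t.\tA\tG\t50\t.\tDP=25;DB\n",
        "chr1\t202\t.\tC\tA\t12\t.\tDP=3\n",
    ]
    for kept in depth_filter(lines):
        print(kept, end="")
```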
6.9 years ago, Alex Paciorkowski (Rochester, NY USA) wrote:

Hi anchal.

1) You can put all your BAM files into a BAM list and then run that whole list through GATK in one go. This may help GATK identify the lousy reads common to all samples and avoid calling them, which should knock down some of your false positives.

2) I don't give GATK my target capture file when calling variants, but I do when calculating depth of coverage.
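If you call genome-wide as described here, the capture file can still be used afterwards, for example to see how many calls fall on target or to compute Ti/Tv on targets only. The sketch below is a minimal, hypothetical helper (`load_targets`, `on_target` and the `padding` parameter are illustrative names, not part of GATK). It assumes a standard BED file (0-based, half-open intervals) and 1-based VCF positions, and merges overlapping targets so a binary search is enough.

```python
import bisect
from collections import defaultdict


def load_targets(bed_lines):
    """Parse BED lines (chrom, start, end; 0-based, half-open) into
    per-chromosome (starts, ends) lists of sorted, merged intervals."""
    raw = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        out = [list(ivs[0])]
        for s, e in ivs[1:]:
            if s <= out[-1][1]:  # overlapping or touching: extend
                out[-1][1] = max(out[-1][1], e)
            else:
                out.append([s, e])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def on_target(targets, chrom, pos, padding=0):
    """True if 1-based VCF position pos lies in a target, +/- padding bp."""
    if chrom not in targets:
        return False
    starts, ends = targets[chrom]
    p0 = pos - 1  # convert VCF 1-based position to 0-based
    # Last interval whose (padded) start is at or before p0.
    i = bisect.bisect_right(starts, p0 + padding) - 1
    return i >= 0 and p0 < ends[i] + padding


if __name__ == "__main__":
    # Made-up targets: chr1 bases 101-300 (1-based) after merging.
    t = load_targets(["chr1\t100\t200", "chr1\t150\t300"])
    print(on_target(t, "chr1", 101), on_target(t, "chr1", 301))  # True False
```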


Yup, when I use UnifiedGenotyper I put all the exomes for a project into one run to make a single VCF file. It is quicker and the results seem to be better. After that you definitely need to annotate and filter. The false-positive calls (often low depth, etc.) will affect your Ti/Tv estimates.

(Comment by Dan Gaston, 5.4 years ago)
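Both answers recommend calling all samples together. A minimal sketch of that setup for GATK 3's UnifiedGenotyper is below: it writes a `.list` file with one BAM path per line (GATK 3 accepts such a file as the `-I` argument) and builds, but does not run, the command. The jar name, reference, dbSNP and output file names are hypothetical placeholders, and only the basic `-T`, `-R`, `-I`, `--dbsnp` and `-o` arguments are used; check them against the documentation for your GATK version.

```python
import tempfile
from pathlib import Path


def write_bam_list(bam_paths, list_path) -> Path:
    """Write one BAM path per line to list_path.

    GATK 3 treats an -I argument whose file name ends in '.list'
    as a file containing input paths, one per line.
    """
    list_path = Path(list_path)
    list_path.write_text("".join(f"{p}\n" for p in bam_paths))
    return list_path


def unified_genotyper_cmd(reference, bam_list, dbsnp, out_vcf,
                          jar="GenomeAnalysisTK.jar"):
    """Build (but do not run) a GATK 3 UnifiedGenotyper command that
    calls all samples in bam_list jointly into a single VCF."""
    return [
        "java", "-jar", str(jar),
        "-T", "UnifiedGenotyper",
        "-R", str(reference),
        "-I", str(bam_list),
        "--dbsnp", str(dbsnp),
        "-o", str(out_vcf),
    ]


if __name__ == "__main__":
    # Hypothetical file names for six exomes.
    bams = [f"sample{i}.bam" for i in range(1, 7)]
    bam_list = write_bam_list(bams, Path(tempfile.mkdtemp()) / "exomes.list")
    cmd = unified_genotyper_cmd("ref.fasta", bam_list, "dbsnp_135.vcf", "all_exomes.vcf")
    print(" ".join(cmd))
```

Once the paths point at real files, the list can be passed to `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution makes it easy to inspect or log the exact call before spending hours on it.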
