Question

Individual VCF for each sample vs. a single VCF for all samples: does the output differ?

8.0 years ago

bioinforesearchquestions ▴ 370

Dear All,

I have performed variant calling for 24 samples using the GATK pipeline and generated a single VCF containing all 24 samples. I need clarification on the following:

1) If I generate a separate VCF file for each of the 24 samples and, separately, a single VCF file containing all 24 samples:

- Are there any differences between the resulting VCFs?

- If so, what are the differences?

I am asking because I have family-level and symptom-level information for these 24 samples.

Family-level information for the 24 samples:

FamilyA : Sample1, Sample2, Sample3
FamilyB : Sample4, Sample5, Sample6
….
FamilyH : Sample22, Sample23, Sample24

Symptom-level information for the 24 samples:

Joint pain : Sample1, Sample4, Sample14, Sample15, Sample16, Sample17
Bleeding : Sample2, Sample5, Sample6
Symptom X : …..

I would like to know whether the samples grouped together above share any genetic variants. In other words, are there 'secondary' variants elsewhere in the exome (other than the X gene) that are common among patients who suffer from the same symptoms? For instance:

- I want to find the common variants for the bleeding symptom. Do the common variants differ between case1 and case2?

case1: I compare the individual VCF files (sample2.vcf, sample5.vcf and sample6.vcf) and filter for the common variants.

case2: I extract only Sample2, Sample5 and Sample6 from the single VCF file containing all 24 samples.
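
Once each sample's genotypes are read, both cases reduce to the same operation: collect the ALT alleles each sample carries and intersect those sets. Below is a minimal Python sketch of that, assuming plain-text (uncompressed) VCFs whose variants have already been normalized (for example with `bcftools norm`) so that the same variant is written identically in every file. The function names are hypothetical and not part of GATK or any library. Because the intersection step is identical, any difference between case1 and case2 has to come from the genotype calls themselves, which can differ when samples are genotyped jointly rather than one at a time.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# (CHROM, POS, REF, one ALT allele); assumes variants are normalized across files
Variant = Tuple[str, int, str, str]


def carried_variants(sample_field: str, chrom: str, pos: int,
                     ref: str, alts: List[str]) -> Set[Variant]:
    """ALT alleles carried by one sample at one record.

    GT is read as the first colon-separated key of the sample column,
    which the VCF spec requires whenever GT is present.
    Missing ('.') and reference ('0') alleles are ignored.
    """
    gt = sample_field.split(":")[0]
    carried: Set[Variant] = set()
    for allele in re.split(r"[/|]", gt):
        if allele not in (".", "0"):
            carried.add((chrom, pos, ref, alts[int(allele) - 1]))
    return carried


def variants_by_sample(vcf_lines: Iterable[str]) -> Dict[str, Set[Variant]]:
    """Map every sample in a single- or multi-sample VCF to the variants it carries."""
    samples: List[str] = []
    result: Dict[str, Set[Variant]] = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            result = {name: set() for name in samples}
            continue
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        alts = fields[4].split(",")
        for name, sample_field in zip(samples, fields[9:]):
            result[name] |= carried_variants(sample_field, chrom, pos, ref, alts)
    return result


def common_case1(per_sample_vcfs: List[List[str]]) -> Set[Variant]:
    """case1: intersect the variant sets of separate single-sample VCFs."""
    sets: List[Set[Variant]] = []
    for lines in per_sample_vcfs:
        sets.extend(variants_by_sample(lines).values())
    return set.intersection(*sets)


def common_case2(joint_vcf: List[str], wanted: List[str]) -> Set[Variant]:
    """case2: pick the wanted samples out of one multi-sample VCF and intersect."""
    by_sample = variants_by_sample(joint_vcf)
    return set.intersection(*(by_sample[name] for name in wanted))
```

In both functions a sample with no call (`./.`) at a site is treated the same as a homozygous-reference sample: neither counts as carrying the variant. A multi-sample VCF lets you tell those two situations apart at every site, whereas a variant-only single-sample VCF typically just omits the site.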

As in the example above, I would like to find common variants at the family level as well.
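
The same intersection extends to the family and symptom groups listed above. A minimal sketch, assuming you already have a mapping from each sample name to the set of variants it carries (for example, parsed from the GT column of the joint VCF) and that variants are normalized identically across samples. The function names are hypothetical. The second function is an optional stricter filter that also drops variants carried by any sample outside the group.

```python
from typing import Dict, List, Set, Tuple

# (CHROM, POS, REF, ALT); assumes the same normalization in every sample
Variant = Tuple[str, int, str, str]


def shared_in_group(carried: Dict[str, Set[Variant]],
                    members: List[str]) -> Set[Variant]:
    """Variants carried by every member of one group (family or symptom)."""
    return set.intersection(*(carried[m] for m in members))


def exclusive_to_group(carried: Dict[str, Set[Variant]],
                       members: List[str]) -> Set[Variant]:
    """Variants carried by every member and by no sample outside the group.

    Samples outside the group with a no-call are counted as non-carriers here.
    """
    outside: Set[Variant] = set()
    for sample, variants in carried.items():
        if sample not in members:
            outside |= variants
    return shared_in_group(carried, members) - outside


def shared_per_group(carried: Dict[str, Set[Variant]],
                     groups: Dict[str, List[str]]) -> Dict[str, Set[Variant]]:
    """Apply shared_in_group to every group, e.g. FamilyA ... FamilyH or Bleeding."""
    return {name: shared_in_group(carried, members)
            for name, members in groups.items()}
```

For example, with hypothetical data in which Sample2, Sample5 and Sample6 all carry chr3:10 G>A and no sample outside the Bleeding group does, `exclusive_to_group(carried, ["Sample2", "Sample5", "Sample6"])` returns `{("chr3", 10, "G", "A")}`. Requiring every member to carry a variant can be too strict for larger groups such as the six joint-pain samples, so counting carriers per variant may be a useful relaxation.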

Tags: VCF · SNP · variant calling · DNASeq · RNASeq

7.9 years ago by bioinforesearchquestions

The differences will be in the INFO column (especially the AC, AN, etc. tags): the combined VCF will have aggregated statistics for those tags. Other than that, I don't think there would be any differences.

Reply by MAPK · 8.0 years ago
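
To see why INFO/AC and INFO/AN differ between the two kinds of file, recall their definitions in the VCF specification: AN is the total number of called alleles across the samples in the file, and AC is the number of those alleles that are each ALT allele. A minimal sketch that recomputes both from GT strings (this is not GATK's code; the function name and the genotypes are hypothetical):

```python
import re
from typing import List, Tuple


def ac_an(genotypes: List[str], n_alt: int) -> Tuple[List[int], int]:
    """Recompute INFO/AC (one count per ALT allele) and INFO/AN from GT strings.

    AN = number of called alleles across the samples in the file;
    AC[i] = how many of those called alleles are ALT allele i+1.
    Missing alleles ('.') contribute to neither.
    """
    ac = [0] * n_alt
    an = 0
    for gt in genotypes:
        for allele in re.split(r"[/|]", gt):
            if allele == ".":
                continue
            an += 1
            index = int(allele)
            if index > 0:
                ac[index - 1] += 1
    return ac, an


# The same chr1:100 A>G site seen two ways (hypothetical genotypes):
single = ac_an(["0/1"], n_alt=1)                       # sample2.vcf on its own
joint = ac_an(["0/1", "0/1", "1/1", "0/0"], n_alt=1)   # four samples in one VCF
print(single, joint)  # prints ([1], 2) ([4], 8)
```

The genotype of Sample2 is the same in both files; only the aggregated site-level counts change with the set of samples in the file.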

I am currently generating the individual VCF files. Once that is complete, I will post an update.

Reply by bioinforesearchquestions · 7.9 years ago

Answer (2016-05-22)

8.0 years ago

igor 13k

There is a really nice presentation that covers this specific question very well: http://cbsu.tc.cornell.edu/lab/doc/Variant_workshop_Part2.pdf (relevant part starts at page 18)