Question: Individual VCF for each sample vs. a single VCF for all samples: do the output contents differ?
bioinforesearchquestions260 wrote:

Dear All,

I have performed variant calling for 24 samples using the GATK pipeline and generated a single VCF containing all 24 samples. I need some clarification on the following points.

1) If I generate a separate VCF file for each of the 24 samples individually, and also generate a single VCF file containing all 24 samples:

- Are there any differences between them in the output VCF?

- If yes, what are the differences?

I am asking because I have both family-level and symptom-level information for those 24 samples.

Family-level information for those 24 samples:

  • FamilyA : Sample1, Sample2, Sample3

  • FamilyB : Sample4, Sample5, Sample6

  • ….

  • FamilyH : Sample22, Sample23, Sample24

Symptom-level information for those 24 samples:

  • Joint pain : Sample1, Sample4, Sample14, Sample15, Sample16, Sample17

  • Bleeding : Sample2, Sample5, Sample6

  • Symptom X : …..

For instance,

  • I would like to know whether the samples grouped together in the above scenario share any genetic variants. In other words, are there 'secondary' variants elsewhere in the exome (other than the X gene) that are common among patients who suffer from the same symptoms?

- If I want to find common variants for the bleeding symptom, do the common variants differ between case 1 and case 2?

Case 1: I compare the individual VCF files (sample2.vcf, sample5.vcf and sample6.vcf) and filter for the common variants.

Case 2: I extract just Sample2, Sample5 and Sample6 from the single VCF file with all 24 samples in it.

  • As in the above example, I would like to find common variants at the family level as well.
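The two cases described above can be sketched in plain Python. This is only a minimal illustration on made-up toy records, not GATK output: the helper names (`make_vcf`, `parse_vcf`, `common_from_individual`, `common_from_joint`) and all genotypes are hypothetical, and it assumes the genotype calls are identical in the individual and combined files, which the thread does not establish. "Common" is taken here to mean that every selected sample carries at least one non-reference allele at the site.

```python
# Toy sketch: compare "common variants" from per-sample VCFs (case 1)
# with the same question asked of one multi-sample VCF (case 2).
# All data below is invented for illustration.

HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"

def make_vcf(samples, rows):
    """Build minimal VCF text; rows are (chrom, pos, ref, alt, [GT per sample])."""
    lines = ["##fileformat=VCFv4.2", HEADER + "\t" + "\t".join(samples)]
    for chrom, pos, ref, alt, gts in rows:
        lines.append("\t".join([chrom, str(pos), ".", ref, alt, ".", "PASS", ".", "GT", *gts]))
    return "\n".join(lines)

def parse_vcf(text):
    """Return a list of ((chrom, pos, ref, alt), {sample: GT}) records."""
    samples, records = [], []
    for line in text.strip().splitlines():
        if line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        gt_index = fields[8].split(":").index("GT")
        gts = {s: col.split(":")[gt_index] for s, col in zip(samples, fields[9:])}
        # Multi-allelic ALT strings are kept as-is; a real comparison would normalise them.
        records.append(((chrom, int(pos), ref, alt), gts))
    return records

def carries_alt(gt):
    """True if the genotype has at least one called non-reference allele."""
    return any(a not in ("0", ".") for a in gt.replace("|", "/").split("/"))

def common_from_individual(vcf_texts):
    """Case 1: vcf_texts maps sample name -> single-sample VCF text."""
    sets = [
        {key for key, gts in parse_vcf(text) if carries_alt(gts[sample])}
        for sample, text in vcf_texts.items()
    ]
    return set.intersection(*sets)

def common_from_joint(text, wanted):
    """Case 2: keep sites where every wanted sample carries an ALT allele."""
    return {key for key, gts in parse_vcf(text) if all(carries_alt(gts[s]) for s in wanted)}

group = ["Sample2", "Sample5", "Sample6"]
joint = make_vcf(group, [
    ("chr1", 100, "A", "G", ["0/1", "1/1", "0/1"]),
    ("chr1", 200, "C", "T", ["0/1", "0/0", "0/1"]),
    ("chr2", 300, "G", "A", ["1/1", "0/1", "1/1"]),
])
individual = {
    "Sample2": make_vcf(["Sample2"], [("chr1", 100, "A", "G", ["0/1"]),
                                      ("chr1", 200, "C", "T", ["0/1"]),
                                      ("chr2", 300, "G", "A", ["1/1"])]),
    "Sample5": make_vcf(["Sample5"], [("chr1", 100, "A", "G", ["1/1"]),
                                      ("chr2", 300, "G", "A", ["0/1"])]),
    "Sample6": make_vcf(["Sample6"], [("chr1", 100, "A", "G", ["0/1"]),
                                      ("chr1", 200, "C", "T", ["0/1"]),
                                      ("chr2", 300, "G", "A", ["1/1"])]),
}

case1 = common_from_individual(individual)
case2 = common_from_joint(joint, group)
print(sorted(case1))
print(case1 == case2)
```

With identical genotype calls the two routes give the same set of sites here. Whether the calls really are identical between individually called and jointly called files is exactly what would need to be checked on the real data.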
modified 3.3 years ago • written 3.3 years ago by bioinforesearchquestions260

The differences will be in the INFO column (especially in tags such as AC and AN). The combined VCF will have statistics for those tags aggregated over all samples. Other than that, I don't think there would be any differences.
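To make the AC/AN point concrete, here is a small sketch of how these two tags follow from the genotype columns at a single biallelic site. In the VCF specification, AC is the count of the ALT allele in called genotypes and AN is the total number of called alleles. The function name `ac_an` and the genotypes are made up for illustration.

```python
def ac_an(genotypes):
    """Return (AC, AN) for one biallelic site from an iterable of GT strings."""
    ac = an = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing alleles do not contribute to AN
            an += 1
            if allele == "1":
                ac += 1
    return ac, an

# Hypothetical genotypes at one site for the three "bleeding" samples.
site = {"Sample2": "0/1", "Sample5": "1/1", "Sample6": "0/0"}

per_sample = {sample: ac_an([gt]) for sample, gt in site.items()}
combined = ac_an(site.values())

print(per_sample)  # {'Sample2': (1, 2), 'Sample5': (2, 2), 'Sample6': (0, 2)}
print(combined)    # (3, 6)
```

Each single-sample record can only ever report AN=2 for a diploid call, while the combined record sums over every sample present, so the INFO values differ even when the genotypes themselves are the same.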

modified 3.3 years ago • written 3.3 years ago by MAPK1.4k

I am currently generating the individual VCF files. Once that is complete, I will update you.

written 3.3 years ago by bioinforesearchquestions260
igor8.2k wrote:

There is a really nice presentation that covers this specific question very well: http://cbsu.tc.cornell.edu/lab/doc/Variant_workshop_Part2.pdf (relevant part starts at page 18)

written 3.3 years ago by igor8.2k

Thanks for the material, Igor. It looks great.

written 3.3 years ago by bioinforesearchquestions260