Question: Single VCF file with multiple populations, as in the 1000 Genomes Project
3.4 years ago by
JstRoRR60 wrote:


We have identified SNPs from RNA-Seq data in three biological replicates for each population, with the calls stored in VCF files. My question is: how do we calculate the SNP frequency for each population by pooling the individual biological replicates?

I have found a VCF format used by the 1000 Genomes Project that describes all the populations in one VCF file, with their respective pooled allele frequencies:

1 15211 rs78601809 T G 100 PASS AC=3050;AF=0.609026;AN=5008;NS=2504;DP=32245;EAS_AF=0.504;AMR_AF=0.6772;AFR_AF=0.5371;EUR_AF=0.7316;SAS_AF=0.6401;AA=t|||;VT=SNP
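For readers unfamiliar with this layout, here is a minimal sketch of how the INFO column of the record above can be read. It assumes only the standard VCF convention that INFO is a semicolon-separated list of `key=value` pairs; `parse_info` is a hypothetical helper name, not part of any library. It also shows that the overall AF in this record equals AC/AN.

```python
# INFO column copied from the 1000 Genomes record quoted above.
info = ("AC=3050;AF=0.609026;AN=5008;NS=2504;DP=32245;EAS_AF=0.504;"
        "AMR_AF=0.6772;AFR_AF=0.5371;EUR_AF=0.7316;SAS_AF=0.6401;AA=t|||;VT=SNP")


def parse_info(info_field):
    """Split a VCF INFO string into a dict; keys without '=' map to True."""
    fields = {}
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


fields = parse_info(info)

# Overall allele frequency: ALT allele count divided by total called alleles.
overall_af = int(fields["AC"]) / int(fields["AN"])

# Per-population frequencies are the keys ending in "_AF" (EAS, AMR, ...).
pop_af = {k[:-3]: float(v) for k, v in fields.items() if k.endswith("_AF")}

print(round(overall_af, 6))
print(pop_af)
```

The per-population values (`EAS_AF`, `EUR_AF`, and so on) follow the same AC/AN idea, restricted to the samples of each population.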

Our format is:
Sample 1  rep1
Sample 1  rep2
Sample 1  rep3
Sample 2  rep1
Sample 2  rep2
Sample 2  rep3

We have an individual VCF file for each replicate from each sample. I just don't have any idea how to put the information from all the replicates into a single VCF file and get a pooled AF for each sample, as in the format above.

Many thanks.

modified 3.4 years ago by Biostar ♦♦ 20 • written 3.4 years ago by JstRoRR60

You can use vcf-merge, from VCFtools, to create a single VCF file using all the individual VCF files as inputs. See: How Can I Merge A Large Amount Of Vcf Files?

written 3.4 years ago by

Hi Stephen, thanks for your reply. Simple merging won't solve my problem. Will I get per-sample allele frequencies (pooled from the replicates), as in the example above?

written 3.4 years ago by JstRoRR60

You'll get the information for each sample in separate columns, appended at the end of each line for each site.
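Merging alone does not add a pooled frequency per sample, but one can be computed from the merged genotype columns. The following is a minimal sketch, not a definitive method. It assumes the merged file has one column per replicate, that the column names follow a hypothetical `<sample>_<rep>` pattern (here `S1_rep1` and so on), and that the FORMAT column contains a `GT` field. The function `pooled_af` and the toy input are made up for illustration. For each group of replicate columns it counts non-reference alleles and divides by the number of called alleles, which is the same AC/AN ratio used in the 1000 Genomes record.

```python
import io
import re
from collections import defaultdict

# Toy merged VCF for illustration only; sample column names are hypothetical.
MERGED_VCF = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
    "\tS1_rep1\tS1_rep2\tS1_rep3\tS2_rep1\tS2_rep2\tS2_rep3\n"
    "1\t15211\trs78601809\tT\tG\t100\tPASS\t.\tGT"
    "\t0/1\t1/1\t./.\t0/0\t0/1\t0/0\n"
)


def pooled_af(handle, group_of=lambda name: name.rsplit("_", 1)[0]):
    """Yield (chrom, pos, {group: AF}); AF = non-REF alleles / called alleles."""
    groups = []
    for line in handle:
        if line.startswith("##") or not line.strip():
            continue  # skip meta-information and blank lines
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Map every replicate column to its sample group.
            groups = [group_of(name) for name in cols[9:]]
            continue
        gt_index = cols[8].split(":").index("GT")
        alt = defaultdict(int)
        called = defaultdict(int)
        for group, sample in zip(groups, cols[9:]):
            genotype = sample.split(":")[gt_index]
            for allele in re.split(r"[/|]", genotype):
                if allele == ".":
                    continue  # missing call: counted in neither AC nor AN
                called[group] += 1
                if allele != "0":
                    alt[group] += 1
        yield cols[0], int(cols[1]), {g: alt[g] / called[g] for g in called}


results = list(pooled_af(io.StringIO(MERGED_VCF)))
for chrom, pos, af in results:
    print(chrom, pos, af)
```

In the toy record, S1 has genotypes 0/1, 1/1 and ./., so its pooled AF is 3 ALT alleles out of 4 called alleles, or 0.75. Note one caveat: a site that was not called in one replicate's VCF will typically appear as a missing genotype after merging, not as homozygous reference, so it is excluded from the denominator here. Whether that is appropriate depends on how the original calls were made.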

modified 3.4 years ago • written 3.4 years ago by

Thanks Stephen.

written 3.4 years ago by JstRoRR60