We have identified SNPs from RNA-Seq data in three biological replicates for each population, with the calls stored in VCF files. My question is: how do we calculate the SNP frequency for each population by pooling the individual biological replicates?
I have found a VCF format used by the 1000 Genomes Project that describes all the populations in one VCF file, together with their respective pooled allele frequencies:
1 15211 rs78601809 T G 100 PASS AC=3050;AF=0.609026;AN=5008;NS=2504;DP=32245;EAS_AF=0.504;AMR_AF=0.6772;AFR_AF=0.5371;EUR_AF=0.7316;SAS_AF=0.6401;AA=t|||;VT=SNP
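To make clear how I am reading that record, here is a minimal Python sketch. It assumes the usual VCF meaning of the INFO keys (AC = alternate allele count, AN = total number of called alleles, AF = AC / AN). The INFO string is shortened to the keys used here, and the variable names are just for illustration.

```python
# The example record from above, with INFO shortened to the keys used here.
record = "1 15211 rs78601809 T G 100 PASS AC=3050;AF=0.609026;AN=5008;NS=2504"

fields = record.split()
# The 8th column (index 7) is INFO: semicolon-separated key=value pairs.
info = dict(item.split("=", 1) for item in fields[7].split(";"))

ac = int(info["AC"])   # number of alternate alleles observed
an = int(info["AN"])   # total number of called alleles
af = ac / an           # allele frequency recomputed from the counts

print(f"AC={ac} AN={an} AF={af:.6f}")  # AF=0.609026, matching the record
```

So the overall AF in that file appears to be the alternate allele count divided by the number of called alleles across all samples.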
Our format is:
Sample 1 rep1
Sample 1 rep2
Sample 1 rep3
Sample 2 rep1
Sample 2 rep2
Sample 2 rep3
We have an individual VCF file for each replicate of each sample. I just don't have any idea how to put the information from all the replicates into a single VCF file and get a pooled AF for each sample, as in the format above.
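To show what I mean by a pooled AF, here is a rough Python sketch of the calculation I have in mind, not a tested pipeline. It makes several assumptions: each replicate is treated as one diploid individual, genotypes have already been read from the VCF files into dictionaries, and a site missing from a replicate's file is counted as "not called" rather than homozygous reference (which may or may not be right for RNA-Seq data). The function name `pooled_af` and the toy genotypes are made up for illustration.

```python
from collections import defaultdict

def pooled_af(replicates):
    """Pool genotype calls from several replicates of one sample.

    replicates: list of dicts mapping (chrom, pos, ref, alt) -> GT string
    such as "0/1". Returns {site: (AC, AN, AF)}.
    """
    ac = defaultdict(int)  # alternate allele count per site
    an = defaultdict(int)  # called allele count per site
    for calls in replicates:
        for site, gt in calls.items():
            alleles = gt.replace("|", "/").split("/")
            called = [a for a in alleles if a != "."]  # skip missing alleles
            ac[site] += sum(1 for a in called if a == "1")
            an[site] += len(called)
    return {s: (ac[s], an[s], ac[s] / an[s]) for s in an if an[s] > 0}

# Hypothetical genotypes for the three replicates of Sample 1.
site_a = ("1", 15211, "T", "G")
site_b = ("1", 20000, "A", "C")
sample1 = [
    {site_a: "0/1"},                  # Sample 1 rep1
    {site_a: "1/1"},                  # Sample 1 rep2
    {site_a: "0/1", site_b: "0/1"},   # Sample 1 rep3
]

result = pooled_af(sample1)
for site, (ac_, an_, af_) in result.items():
    print(site, f"AC={ac_} AN={an_} AF={af_:.4f}")
```

With these toy calls the first site gets AC=4, AN=6 and the second gets AC=1, AN=2, because only rep3 has a call there. That second case is exactly what I am unsure about: should the other two replicates count as reference alleles or as missing?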