Question: CAF to population frequency
3.8 years ago by
United States
lyellooo0 wrote:

Greetings. I want to use ContEst from the Broad Institute as one of my quality-control tools for NGS data. ContEst requires a population-frequency VCF file as input, with the frequencies given in the following format: "CEU={A*=0.13030, G=0.86970}". The Broad provides correctly formatted VCF files for hg18 and hg19, but I need GRCh38.

###hg19.vcf for ContEst

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO 

1       566875  rs2185539       C       T       .       PASS    AC=66;AF=0.02369;ALL={C*=0.97629, T=0.02371};AN=2786;ASW={C*=1.00000, T=0.00000};CEU={C*=1.00000, T=0.00000};CHB={C*=1.00000, T=0.00000};CHD={C*=1.00000, T=0.00000};CHS={C*=0.00000, T=0.00000};CLM={C*=0.00000, T=0.00000};FIN={C*=0.00000, T=0.00000};GBR={C*=0.00000, T=0.00000};GIH={C*=1.00000, T=0.00000};IBS={C*=0.00000, T=0.00000};JPT={C*=1.00000, T=0.00000};LWK={C*=1.00000, T=0.00000};MKK={C*=0.82044, T=0.17956};MXL={C*=1.00000, T=0.00000};PUR={C*=0.00000, T=0.00000};TSI={C*=1.00000, T=0.00000};YRI={C*=0.99752, T=0.00248};set=MKK-YRI    GT
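To make the required format concrete, here is a minimal Python sketch that pulls the per-population blocks out of an INFO string like the one above. The function name `parse_population_freqs` and the regular expression are my own illustration and not part of ContEst; the sketch assumes that every population entry looks like `POP={allele=freq, allele=freq}` and that `*` marks one allele (it matches REF in the record above).

```python
import re

# Matches entries such as CEU={C*=1.00000, T=0.00000}
POP_FIELD = re.compile(r"(\w+)=\{([^}]*)\}")

def parse_population_freqs(info):
    """Return {population: {allele: frequency}} from a ContEst-style INFO string.

    Hypothetical helper: the '*' marker on an allele is dropped from the key.
    """
    result = {}
    for pop, body in POP_FIELD.findall(info):
        freqs = {}
        for item in body.split(","):
            allele, value = item.strip().split("=")
            freqs[allele.rstrip("*")] = float(value)
        result[pop] = freqs
    return result

# Shortened copy of the hg19 record's INFO column shown above
info = ("AC=66;AF=0.02369;ALL={C*=0.97629, T=0.02371};AN=2786;"
        "CEU={C*=1.00000, T=0.00000};MKK={C*=0.82044, T=0.17956};set=MKK-YRI")
freqs = parse_population_freqs(info)
print(sorted(freqs))
print(freqs["MKK"])
```

Plain key=value entries such as AC, AF, AN, and set are ignored because they have no braces.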


First, I tried to lift this file over to GRCh38 with Picard LiftoverVcf, but it failed because the INFO column is not in the default format.

Then I tried to lift over the HapMap 3.3 b37 file, which I got from the GATK dataset (I believe the Broad lifted it over from b36 to b37 for hg19), to GRCh38, also with Picard LiftoverVcf. It failed again for the same reason.

###hapmap3.3 b37

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO

1       55299   rs10399749      C       .       .       PASS    AN=510

1       55394   rs2949420       T       .       .       PASS    AN=178

1       55550   rs2949421       A       T       .       PASS    AC=173;AF=0.972;AN=178


After that, I tried to use the latest VCF released by NCBI, common_all_20150416.vcf, to build the population-frequency file. That raises the question of how I can get population information from CAF.


#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO

1       10177   rs367896724     A       AC      .       .       RS=367896724;RSPOS=10177;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020005140026000200;WGT=1;VC=DIV;R5;ASP;VLD;KGPhase3;CAF=0.5747,0.4253;COMMON=1


I am new to this area, so any advice would be helpful. My question is how, or where, I can translate "CAF=0.5747,0.4253" into population-specific frequencies.
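As far as I can tell from the dbSNP VCF header, CAF is an ordered list of allele frequencies based on 1000 Genomes, starting with the reference allele and followed by the alternate alleles in ALT order. That makes it a single pooled value per allele, so it cannot be split into per-population numbers on its own. The most I can do with it is reformat it, as in the Python sketch below. The function name `caf_to_all_field` is hypothetical, and whether ContEst would accept a file carrying only an `ALL={...}` entry is an assumption I have not verified.

```python
def caf_to_all_field(ref, alt, info):
    """Build an 'ALL={...}' string from a dbSNP CAF entry.

    Hypothetical helper. Returns None when CAF is absent, contains a
    missing value ('.'), or does not line up with REF plus ALT.
    """
    # Flags without '=' (R5, ASP, VLD, ...) are skipped
    fields = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    caf = fields.get("CAF")
    if caf is None:
        return None
    alleles = [ref] + alt.split(",")
    values = caf.split(",")
    if len(alleles) != len(values):
        return None
    parts = []
    for i, (allele, value) in enumerate(zip(alleles, values)):
        if value == ".":
            return None
        marker = "*" if i == 0 else ""  # assumption: '*' marks the reference allele
        parts.append(f"{allele}{marker}={float(value):.5f}")
    return "ALL={" + ", ".join(parts) + "}"

# INFO column of the rs367896724 record shown above
info = ("RS=367896724;RSPOS=10177;dbSNPBuildID=138;SSR=0;SAO=0;"
        "VP=0x050000020005140026000200;WGT=1;VC=DIV;R5;ASP;VLD;KGPhase3;"
        "CAF=0.5747,0.4253;COMMON=1")
all_field = caf_to_all_field("A", "AC", info)
print(all_field)  # ALL={A*=0.57470, AC=0.42530}
```

Per-population entries such as CEU or YRI would still have to come from a source that reports frequencies by population.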

My other option is to lift over the original HapMap data to hg38, which raises a follow-up question:

The HapMap 2010 phase 2+3 release has allele and genotype frequencies for only 11 populations, but the ContEst input VCF file has 17 groups. Where can I find the information for the 6 missing groups? And which tool should I use to lift over the HapMap data?



Actually, Picard's liftover function is broken in all versions. Here is the same question I asked on the GATK forum:



modified 3.7 years ago • written 3.8 years ago by lyellooo0