CAF to population frequency
7.8 years ago
lyellooo • 0

Greetings. I want to use ContEst from the Broad Institute as one of my quality-control tools for NGS data. However, ContEst requires a population-frequency VCF file as input, and that file must contain the information in the following format: CEU={A*=0.13030, G=0.86970}. The Broad provides VCF files in this format for hg18 and hg19, but I need GRCh38.

###hg19.vcf for ContEst
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       566875  rs2185539       C       T       .       PASS    AC=66;AF=0.02369;ALL={C*=0.97629, T=0.02371};AN=2786;ASW={C*=1.00000, T=0.00000};CEU={C*=1.00000, T=0.00000};CHB={C*=1.00000, T=0.00000};CHD={C*=1.00000, T=0.00000};CHS={C*=0.00000, T=0.00000};CLM={C*=0.00000, T=0.00000};FIN={C*=0.00000, T=0.00000};GBR={C*=0.00000, T=0.00000};GIH={C*=1.00000, T=0.00000};IBS={C*=0.00000, T=0.00000};JPT={C*=1.00000, T=0.00000};LWK={C*=1.00000, T=0.00000};MKK={C*=0.82044, T=0.17956};MXL={C*=1.00000, T=0.00000};PUR={C*=0.00000, T=0.00000};TSI={C*=1.00000, T=0.00000};YRI={C*=0.99752, T=0.00248};set=MKK-YRI    GT
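To make the required format concrete, here is a minimal Python sketch that pulls the per-population entries out of an INFO string like the one above. The function name `parse_population_freqs` is hypothetical, and treating the `*` suffix as the reference-allele marker is an assumption based on this record, where the starred allele C is also the REF base.

```python
import re

# Matches one population entry such as CEU={C*=1.00000, T=0.00000}.
# Assumption: a trailing "*" marks the reference allele.
POP_FIELD = re.compile(r"([A-Za-z0-9_]+)=\{([^}]*)\}")


def parse_population_freqs(info: str) -> dict:
    """Return {population: {allele: (freq, is_ref)}} from a ContEst-style INFO string."""
    result = {}
    for pop, body in POP_FIELD.findall(info):
        alleles = {}
        for item in body.split(","):
            allele, freq = item.strip().split("=")
            is_ref = allele.endswith("*")
            alleles[allele.rstrip("*")] = (float(freq), is_ref)
        result[pop] = alleles
    return result


if __name__ == "__main__":
    info = ("AC=66;AF=0.02369;ALL={C*=0.97629, T=0.02371};"
            "CEU={C*=1.00000, T=0.00000};MKK={C*=0.82044, T=0.17956}")
    freqs = parse_population_freqs(info)
    print(sorted(freqs))      # ['ALL', 'CEU', 'MKK']
    print(freqs["MKK"]["T"])  # (0.17956, False)
```

Plain keys such as AC and AF are ignored because only `KEY={...}` entries match the pattern.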


First, I tried to lift the hg19 file over to GRCh38 with Picard LiftoverVcf, but it failed because the INFO column is not in the standard format.
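One plausible cause, consistent with the diagnosis above, is that the ContEst entries put spaces and `=` characters inside INFO values, which the VCF 4.1 specification does not permit (INFO values may not contain white-space, semicolons, or equals signs). The sketch below flags such keys in a single INFO string; `nonconforming_info_keys` is a hypothetical helper and does not reproduce Picard's own validation.

```python
import re

# VCF 4.1: INFO values may not contain white-space, semicolons, or equals signs.
BAD_VALUE_CHARS = re.compile(r"[\s;=]")


def nonconforming_info_keys(info: str) -> list:
    """Return the INFO keys whose values break the VCF 4.1 character rules."""
    bad = []
    for entry in info.split(";"):
        if not entry:
            continue
        # Flags such as VLD have no "=", so their value is the empty string.
        key, _, value = entry.partition("=")
        if BAD_VALUE_CHARS.search(value):
            bad.append(key)
    return bad


print(nonconforming_info_keys(
    "AC=66;AF=0.02369;ALL={C*=0.97629, T=0.02371};AN=2786;set=MKK-YRI"))
# ['ALL']
```

For the full hg19 record shown earlier, every population key (ALL, ASW, CEU, ...) would be flagged, while AC, AF, AN and set would not.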

Next, I tried to lift over HapMap 3.3 b37, which I got from the GATK resource bundle and which I believe the Broad lifted over from b36 to b37 for hg19, to GRCh38, again with Picard LiftoverVcf. It failed for the same reason.
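A workaround often used in this situation, though not one tried in this post, is to lift over only the positions: replace INFO with `.` before running the liftover tool, then re-attach the saved INFO strings by rsID afterwards. A minimal sketch on tab-delimited VCF lines follows; the helper names are hypothetical, and it assumes every record has a unique rsID in the ID column.

```python
def strip_info(vcf_lines):
    """Replace INFO (column 8) with '.' and save it keyed by the ID column."""
    stripped, info_by_id = [], {}
    for line in vcf_lines:
        if line.startswith("#"):
            stripped.append(line)
            continue
        cols = line.rstrip("\n").split("\t")
        info_by_id[cols[2]] = cols[7]
        cols[7] = "."
        stripped.append("\t".join(cols) + "\n")
    return stripped, info_by_id


def restore_info(lifted_lines, info_by_id):
    """Copy the saved INFO strings back onto lifted records, matching on ID.

    Records whose ID is not in info_by_id are dropped. INFO strings are
    restored verbatim: alleles inside them are NOT reverse-complemented.
    """
    restored = []
    for line in lifted_lines:
        if line.startswith("#"):
            restored.append(line)
            continue
        cols = line.rstrip("\n").split("\t")
        if cols[2] in info_by_id:
            cols[7] = info_by_id[cols[2]]
            restored.append("\t".join(cols) + "\n")
    return restored
```

The liftover itself (Picard LiftoverVcf or another tool) runs between the two calls. Any record lifted onto the opposite strand would still need the allele letters inside its ContEst entries fixed by hand.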

###hapmap3.3 b37
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       55299   rs10399749      C       .       .       PASS    AN=510
1       55394   rs2949420       T       .       .       PASS    AN=178
1       55550   rs2949421       A       T       .       PASS    AC=173;AF=0.972;AN=178
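The HapMap 3.3 file carries only pooled AC/AF/AN values, so at most it can supply an `ALL`-style entry. As a sketch, assuming a single ALT allele and that AF is the ALT allele frequency, a record could be converted to the ContEst notation as follows. `hapmap_to_contest_all` is a hypothetical name, and sites with ALT `.` are skipped here because they carry no alternate-allele frequency.

```python
def hapmap_to_contest_all(ref: str, alt: str, info: str):
    """Build 'ALL={REF*=..., ALT=...}' from the AF value in a HapMap INFO string.

    Returns None for sites with no ALT allele (ALT is '.').
    """
    if alt == ".":
        return None
    fields = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    af = float(fields["AF"])
    # Five decimals, matching the precision of the Broad hg19 file.
    return f"ALL={{{ref}*={1 - af:.5f}, {alt}={af:.5f}}}"


print(hapmap_to_contest_all("A", "T", "AC=173;AF=0.972;AN=178"))
# ALL={A*=0.02800, T=0.97200}
```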


Then I tried to build the population-frequency file from the latest dbSNP VCF released by NCBI, common_all_20150416.vcf, but that raises the question of how to get per-population information out of the CAF field.

###common_all_20150416.vcf
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       10177   rs367896724     A       AC      .       .       RS=367896724;RSPOS=10177;dbSNPBuildID=138;SSR=0;SAO=0;VP=0x050000020005140026000200;WGT=1;VC=DIV;R5;ASP;VLD;KGPhase3;CAF=0.5747,0.4253;COMMON=1


I am new to this area, so any advice would be helpful. My question is how, or where, I can translate "CAF=0.5747,0.4253" into population-specific frequencies.
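For reference, the dbSNP VCF header defines CAF as an ordered, comma-delimited list of allele frequencies based on 1000 Genomes, starting with the reference allele followed by the alternate alleles in ALT order. It is therefore one pooled frequency per allele, not a per-population breakdown. A minimal sketch that maps the CAF values onto alleles, using the hypothetical name `caf_to_allele_freqs` and treating `.` as a missing value, is:

```python
def caf_to_allele_freqs(ref: str, alt: str, info: str) -> dict:
    """Map each allele (REF first, then ALT alleles in order) to its CAF value.

    A CAF value of '.' (no 1000 Genomes frequency) becomes None.
    """
    # Keep key=value entries only; flags such as VLD or KGPhase3 are skipped.
    fields = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    alleles = [ref] + alt.split(",")
    values = fields["CAF"].split(",")
    if len(values) != len(alleles):
        raise ValueError("CAF length does not match REF + ALT alleles")
    return {a: (None if v == "." else float(v)) for a, v in zip(alleles, values)}


info = ("RS=367896724;RSPOS=10177;dbSNPBuildID=138;SSR=0;SAO=0;"
        "VP=0x050000020005140026000200;WGT=1;VC=DIV;R5;ASP;VLD;KGPhase3;"
        "CAF=0.5747,0.4253;COMMON=1")
print(caf_to_allele_freqs("A", "AC", info))
# {'A': 0.5747, 'AC': 0.4253}
```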

My other option is to lift the original HapMap data over to hg38, which raises another question:

HapMap 2010 phase 2+3 provides allele/genotype frequencies for only 11 populations, whereas the ContEst input VCF has 17 groups. Where can I find the information for the 6 missing groups? And which tool should I use to lift over the HapMap data?
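Comparing the population labels directly helps narrow down the 17 vs. 11 question. The sketch below takes the 17 labels from the hg19 ContEst record (excluding ALL) and the 11 HapMap 3 populations, renaming HapMap's MEX to the 1000 Genomes code MXL on the assumption that they denote the same group:

```python
# Population labels copied from the hg19 ContEst record ("ALL" excluded).
CONTEST_POPS = {
    "ASW", "CEU", "CHB", "CHD", "CHS", "CLM", "FIN", "GBR", "GIH",
    "IBS", "JPT", "LWK", "MKK", "MXL", "PUR", "TSI", "YRI",
}

# The 11 HapMap 3 populations; HapMap's "MEX" is written as "MXL" here
# (assumption: same Mexican-ancestry Los Angeles sample group).
HAPMAP3_POPS = {
    "ASW", "CEU", "CHB", "CHD", "GIH", "JPT", "LWK", "MXL", "MKK", "TSI", "YRI",
}

missing = sorted(CONTEST_POPS - HAPMAP3_POPS)
print(len(CONTEST_POPS), len(HAPMAP3_POPS), missing)
# 17 11 ['CHS', 'CLM', 'FIN', 'GBR', 'IBS', 'PUR']
```

The six leftover codes are all 1000 Genomes Phase 1 populations, which suggests, without confirming, that the extra ContEst groups came from 1000 Genomes rather than HapMap.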

Regards

Edited:

Actually, the Picard liftover function is broken in all versions. I asked the same question on the GATK forum: http://gatkforums.broadinstitute.org/discussion/5625/how-to-build-population-frequency-file-based-on-grch38-for-contest#latest

CAF MAF next-gen population-frequency