build custom reference database from Complete Genomics dataset
5.6 years ago

Hi, I need guidance on creating a custom reference database from Complete Genomics vcfBeta-GS0000*-ASM.vcf.bz2 files, to be used as a reference panel for phasing and imputation.

I have closely followed the post "Custom Reference panel creation for data imputation from .vcf files" up to the last step, merging, which is where I am encountering an error.

The command I'm entering prior to my error is:

bcftools merge -f PASS -Ov -m none -l temp.10.bcf.index.txt -o temp.10.merge.vcf

Specifically, the error that I am encountering after entering the above command is:

[W::bcf_hdr_check_sanity] GL should be declared as Number=G
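As far as I understand, htslib prefixes warnings with `[W::...]` and errors with `[E::...]`, so the message above may be a warning rather than a fatal error. A minimal sketch for checking that, assuming the stderr of the merge was saved to a file (`merge.log` is a hypothetical name, and `count_errors` is a helper made up for illustration):

```shell
# Count htslib-style error lines ([E::...]) read from stdin.
# Warning lines ([W::...]) are not counted.
count_errors() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the function's exit status at 0.
    grep -c '^\[E::' || true
}

# Hypothetical usage (merge.log is an assumed file name):
#   bcftools merge ... 2> merge.log
#   count_errors < merge.log
```

A count of 0 would suggest the merge only complained about the header and did not abort, which could then be confirmed by checking whether temp.10.merge.vcf was actually written.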

The list of 10 BCF files (temp.10.bcf.index.txt) is derived from VCF files of version 4.1. The GL lines that the error refers to are as follows (ignore the backslash before the ID):

\##FORMAT=<\ID=GL,Number=.,Type=Integer,Description="Genotype Likelihood">

\##FORMAT=<\ID=CGA_CEGL,Number=.,Type=Integer,Description="Calibrated Genotype Likelihood, Equal Allele Fraction Assumption">

The VCFv4.1.pdf indicates that GL (genotype likelihood) should be declared with a specific Number value, whereas the Complete Genomics files that I have declare it with a period "." instead.
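For reference, here is a minimal sketch of the header change the warning seems to ask for. It only rewrites the GL declaration from `Number=.` to `Number=G` and leaves every other line (including CGA_CEGL) untouched. The function name `fix_gl_header` and the file names in the usage comments are hypothetical, and I have not verified that the GL values in the Complete Genomics records actually hold one entry per genotype, which `Number=G` implies.

```shell
# Read VCF header lines on stdin and change only the GL FORMAT declaration
# from Number=. to Number=G; all other lines pass through unchanged.
fix_gl_header() {
    sed 's/^\(##FORMAT=<ID=GL,Number=\)\.,/\1G,/'
}

# Hypothetical usage, assuming bcftools is installed and sample.bcf is one
# of the per-sample files listed in temp.10.bcf.index.txt:
#   bcftools view -h sample.bcf | fix_gl_header > fixed_header.txt
#   bcftools reheader -h fixed_header.txt -o sample.fixed.bcf sample.bcf
```

This only silences the header check; whether the declared Type (Integer here) also needs changing is a separate question I am unsure about.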

I ran vcf-validator to see whether it would yield any additional information, but it didn't; it just ran for about 10 minutes and then returned to the command prompt.

Any help/ideas/comments/clarifications are welcome.

software error next-gen sequence • 886 views