NCBI Human Variation Sets VCF validation
7.8 years ago
Jirapong ▴ 20

The reference VCF file available for download at http://www.ncbi.nlm.nih.gov/variation/docs/human_variation_vcf/ seems to have formatting issues.

This happened when I tried to run the GATK BaseRecalibrator command. The error shows that a duplicate allele was added to the VariantContext. I started looking at the reference VCF file and found that it contains duplicate alternative alleles, and that sometimes the REF allele also appears in the ALT column. I downloaded vcftools to validate the file and got the following warnings:

[tmp]$ vcf-validator homo_sapiens_GRCh37.vcf
1:2886090 .. REF allele listed in the ALT field??
1:4095845 .. Could not parse the allele(s) [AG], first base does not match the reference.
1:8121167 .. The alleles not unique: CAAT
1:8121167 .. The alleles not unique: CAAT
1:9127042 .. The alleles not unique: TAA
1:11408760 .. The alleles not unique: CTATGTATG
1:13177471 .. The alleles not unique: CTT
1:13894414 .. REF allele listed in the ALT field??
1:15015689 .. REF allele listed in the ALT field??
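
To list every affected record without rerunning the validator, two of the warnings above (REF repeated in ALT, and non-unique ALT alleles) can be approximated with a few lines of Python. This is only a rough sketch, not how vcf-validator itself works: the function name `check_vcf_alleles` and the example records are made up for illustration, and it assumes a plain tab-separated VCF with REF in column 4 and comma-separated ALT alleles in column 5. It does not check the "first base does not match the reference" case.

```python
def check_vcf_alleles(lines):
    """Yield (chrom, pos, message) for records with suspicious REF/ALT columns.

    Hypothetical helper for illustration; it only inspects columns 1, 2, 4 and 5.
    """
    for line in lines:
        # Skip meta-information lines, the header line, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # too short to be a data line
        chrom, pos = fields[0], fields[1]
        ref = fields[3].upper()
        alts = [allele.upper() for allele in fields[4].split(",")]
        if ref in alts:
            yield chrom, pos, "REF allele listed in ALT"
        if len(set(alts)) != len(alts):
            yield chrom, pos, "duplicate ALT allele"


# Invented records, not taken from the NCBI file.
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tA,G\t.\t.\t.\n",
    "1\t200\t.\tC\tCAAT,CAAT\t.\t.\t.\n",
    "1\t300\t.\tG\tT\t.\t.\t.\n",
]

for chrom, pos, message in check_vcf_alleles(example):
    print(f"{chrom}:{pos} .. {message}")
```

For the real file, the same function could be fed an open file handle, for example `open("homo_sapiens_GRCh37.vcf")`, or `gzip.open(path, "rt")` for a compressed copy.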

Is there another source where I can get a better variation sets file?

SNP NCBI GATK • 2.4k views
7.8 years ago
iraun 5.7k

GATK provides variation set files for human. I would recommend using them to avoid this kind of error.

Hope it helps.


@airan - Thank you very much. I was able to get it from the second link: ftp.broadinstitute.org/bundle/2.8/hg19/dbsnp_138.hg19.vcf.gz


Glad to help :)
