VCF validation error: Ref allele mismatch?
5.5 years ago
bioinfo89 ▴ 50

Hello All,

I am working on submitting novel SNPs to dbSNP. When I use their online VCF validator tool I get the following error:

##ERR_REF_MISMATCH=Ref allele mismatch. Fix: need to match the reference genome on the FORWARD orientation

(Expect: T, Found: G)

I checked the strand information for the called variant and it shows "+". So if the variant is on the plus strand, why does the tool report a mismatch error?

Can anyone help me understand this? Should I just change the allele from G>A to T>A? I am not sure that is a good idea.

Thanks in advance!

SNP assembly genome • 3.5k views

Hello,

This discrepancy is most likely due to different reference genomes. Which one did you use for alignment and variant calling? Which one does the dbSNP validator expect?

fin swimmer


I used hg19, which is the same one the dbSNP validator uses.


Could you post the corresponding line of your VCF file?


Here you go:

chrM    711 711 G   A

Hello again,

This is not a valid VCF line. It looks more like a BED file. So where does it come from?

fin swimmer

chrMT   711 .   G   A   .   .   VRT=1

This is the VCF format!
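
For readers comparing the two lines posted in this thread: the first has BED-style columns (chrom, start, end, ref, alt), while a VCF record starts with CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO. A minimal Python sketch of that conversion follows. It assumes a single-base SNV whose start is already the 1-based position (as in the line above, where start equals end). The helper name `bed_like_to_vcf` and its `rename` parameter are hypothetical, made up for illustration.

```python
def bed_like_to_vcf(line, vrt=1, rename=None):
    """Turn 'chrom start end ref alt' into a minimal 8-column VCF record.

    Assumption: single-base SNV, start already 1-based (start == end).
    Hypothetical helper, not part of any dbSNP tool.
    """
    chrom, start, end, ref, alt = line.split()
    if start != end or len(ref) != 1 or len(alt) != 1:
        raise ValueError("only single-base SNVs are handled in this sketch")
    # Optionally rename the contig, e.g. chrM -> chrMT
    chrom = (rename or {}).get(chrom, chrom)
    # CHROM POS ID REF ALT QUAL FILTER INFO; '.' marks a missing value
    fields = [chrom, start, ".", ref, alt, ".", ".", f"VRT={vrt}"]
    return "\t".join(fields)


record = bed_like_to_vcf("chrM    711 711 G   A", rename={"chrM": "chrMT"})
print(record)
```

This only reshapes the columns. It does not check that the REF allele agrees with the reference sequence, which is the actual problem discussed here.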


Yes, that's better :)

I'm not familiar with the dbSNP submission validator, but the docs state that one has to provide the GenBank accession number of the reference genome used. Double-check that you really used the correct one.

In hg19 there is a G at the position you show, but in hg38 there is a T, as the validator says. You can see which GenBank accession numbers are available on this site in the History part.

fin swimmer


Yes, I did check the GenBank accession for hg19. I am using the correct one (GCF_000001405.25).


Could you post the full header of your VCF?


That is the header I am using:

##fileformat=VCFv4.1
##fileDate=20180926
##dbSNP_meta_start
##TYPE:CONT
##HANDLE:xxxxxx
##NAME:xxxx
##FAX:xxxx
##TEL:xxx
##EMAIL:xxx
##LAB:xxx
##INST:xxx
##ADDR:xxx
##TYPE:PUB
##HANDLE:xxxx
##PMID:NA
##TITLE:xxxxxx
##AUTHORS:xxxx
##JOURNAL:NA
##VOLUME:NA
##PAGES:NA
##YEAR:2018
##STATUS:1
##TYPE:METHOD
##HANDLE:xxx
##ID:1xxx
##METHOD_CLASS:Sequence
##TEMPLATE_TYPE:DIPLOID
##METHOD:The whole exome sequencing data was used for the study.
##TYPE:POPULATION
##HANDLE:xxx
##ID:xxx
##POPULATION:This population includes 101 of participants.
##TYPE:SNPASSAY
##HANDLE:xxxx
##BATCH:SNP_Discovery
##SAMPLESIZE:24
##MOLTYPE:Genomic
##METHOD:WholeExome
##ORGANISM:Homo sapiens
##TYPE:SNPPOPUSE
##HANDLE:xxxx
##BATCH:SNP_Discovery
##METHOD:WholeExome
##dbSNP_meta_end
##handle=xxxxx
##batch=SNP_Discovery
##reference=GCF_000001405.25
##INFO=<ID=VRT,Number=1,Type=Integer,Description="Variation type,1 - SNV: single nucleotide variation,2 - DIV: deletion/insertion variation,3 - HETEROZYGOUS: variable, but undefined at nucleotide level,4 - STR: short tandem repeat (microsatellite) variation, 5 - NAMED: insertion/deletion variation of named repetitive element,6 - NO VARIATON: sequence scanned for variation, but none observed,7 - MIXED: cluster contains submissions from 2 or more allelic classes (not used),8 - MNV: multiple nucleotide variation with alleles of common length greater than 1,9 - Exception">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=FT,Number=.,Type=String,Description="Genotype-level filter">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=PP,Number=G,Type=Integer,Description="Phred-scaled Posterior Genotype Probabilities">
##population_id=xxx
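
Since the discussion turns on which assembly the file declares, here is a small Python sketch that pulls the `##reference=` value out of a list of header lines. The function name `declared_reference` is hypothetical. The value only shows what the submission claims, not which sequence the variants were actually called against.

```python
def declared_reference(header_lines):
    """Return the value of the first '##reference=' line, or None if absent."""
    for line in header_lines:
        if line.startswith("##reference="):
            return line.strip().split("=", 1)[1]
    return None


# A few lines copied from the header above
header = [
    "##fileformat=VCFv4.1",
    "##batch=SNP_Discovery",
    "##reference=GCF_000001405.25",
]
print(declared_reference(header))
```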
5.5 years ago

Hello,

I think I found the problem, but I'm not sure how to solve it :(

I guess the reference genome you were using comes from UCSC? I found this statement here:

Note on chrM:

Since the release of the UCSC hg19 assembly, the Homo sapiens mitochondrion sequence (represented as "chrM" in the Genome Browser) has been replaced in GenBank with the record NC_012920. We have not replaced the original sequence, NC_001807, in the hg19 Genome Browser. We plan to use the Revised Cambridge Reference Sequence (rCRS, http://mitomap.org/bin/view.pl/MITOMAP/HumanMitoSeq) in the next human assembly release.

And indeed, GCF_000001405.25 is associated with NC_012920, not with NC_001807, which has a G at position 711.
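
One quick way to tell which mitochondrial record a local FASTA contains is its length. The following Python sketch assumes the commonly cited lengths of 16,569 bp for NC_012920 (rCRS) and 16,571 bp for NC_001807. Those numbers are not stated in this thread, so verify them against the GenBank records. A matching length is a hint, not proof of identical sequence. The function names and the toy input are made up for illustration.

```python
# Assumed lengths of the two human mitochondrial records (verify in GenBank).
KNOWN_MT_LENGTHS = {
    16569: "NC_012920 (rCRS)",
    16571: "NC_001807 (chrM in UCSC hg19)",
}


def fasta_lengths(lines):
    """Map each FASTA record name to the length of its sequence."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def guess_mt_record(length):
    """Return the record whose assumed length matches, else 'unknown'."""
    return KNOWN_MT_LENGTHS.get(length, "unknown")


# Toy input standing in for a real FASTA file (sequence content is made up).
toy = [">chrM demo", "ACGT" * 4000, "A" * 571]
for name, n in fasta_lengths(toy).items():
    print(name, n, guess_mt_record(n))
```

With a real file, pass the open file handle to `fasta_lengths` in place of the toy list.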

I think you should contact someone from dbSNP to find out how to handle this case. If you get a response, please post the result here.

fin swimmer


Thanks @finswimmer for the explanation, that was very helpful. I had asked the dbSNP people which chrM and which hg19 version I should use. Before I got this error, I was told to use GCF_000001405.25 and to name chrM as chrMT, but I skipped the part about the chrM version. I will post an update once I get an answer from them on this.


I received a reply from the dbSNP people: they suggested updating the MT variants to the latest version, NC_012920, and resubmitting them.
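
Before resubmitting, it can help to reproduce the validator's check locally. The Python sketch below compares each record's REF allele with the base(s) at POS in a reference sequence held in memory. The function name `check_ref_alleles` is hypothetical, and the toy reference is invented except for the T at position 711, which is taken from the validator message in the question. Because the two mitochondrial records differ in length, positions may shift between them, so swapping the REF base alone is not necessarily enough. Re-calling the variants against NC_012920 is the safer route.

```python
def check_ref_alleles(records, reference):
    """Yield (chrom, pos, expected, found) for records whose REF disagrees.

    `records` are tab-separated VCF data lines; `reference` maps a contig
    name to its sequence as a string. POS is 1-based in VCF.
    """
    for line in records:
        if line.startswith("#"):
            continue  # skip header lines
        chrom, pos, _id, ref = line.split("\t")[:4]
        start = int(pos) - 1  # convert to 0-based string index
        expected = reference[chrom][start:start + len(ref)]
        if expected.upper() != ref.upper():
            yield chrom, int(pos), expected, ref


# Toy reference: placeholder bases, with the T the validator expects at 711.
toy_reference = {"chrMT": "N" * 710 + "T" + "N" * 10}
vcf_lines = ["chrMT\t711\t.\tG\tA\t.\t.\tVRT=1"]

mismatches = list(check_ref_alleles(vcf_lines, toy_reference))
for chrom, pos, expected, found in mismatches:
    print(f"{chrom}:{pos} Expect: {expected}, Found: {found}")
```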
