SNPs at genomic coordinates do not match the reference
9.5 years ago
mattbawn ▴ 60

My SNP-calling pipeline calls a SNP at a coordinate that, after uploading to SIFT, is classed as a damaging mutation. However, when I look at this region in detail in Ensembl, the nucleotide at that coordinate does not match either the ref or the alt allele reported in the VCF file or in the SIFT analysis. I have passed the variants through GATK VQSR and this one got through in the 99% tranche. I have ensured that I was using the same reference genome at each stage, so what might be causing the coordinates to be out? (I realise this is a long, open-ended question, but I suppose I mean: is this common, and if so, what is the most likely cause?)

Any ideas please?

Thanks,
Matt

alignment SNP sequence • 2.4k views

You've got to check that again. It is very easy to use the wrong genome build. That is such an obvious and simple explanation that it should be double- and triple-checked. Take some actual sequences out of the genome you aligned against and match those to Ensembl.
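As a rough illustration of that check, here is a minimal Python sketch that reads the base at a 1-based coordinate straight out of the FASTA file used for alignment and compares it with the REF allele from the VCF. The function names `base_at` and `ref_matches` and the toy sequences are made up for this example; it also assumes the chromosome name in the FASTA header matches the one in the VCF exactly (e.g. `2` versus `chr2`).

```python
import io


def base_at(fasta_handle, chrom, pos):
    """Return the base at 1-based position `pos` on sequence `chrom`, or None.

    Streams the FASTA line by line, so no index file is needed.
    """
    current = None  # name of the sequence currently being read
    offset = 0      # bases of `current` consumed so far
    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current == chrom:
                return None  # reached the next record without finding pos
            fields = line[1:].split()
            current = fields[0] if fields else ""  # name up to first space
            offset = 0
            continue
        if current != chrom:
            continue
        if offset < pos <= offset + len(line):
            return line[pos - offset - 1].upper()
        offset += len(line)
    return None


def ref_matches(fasta_handle, chrom, pos, vcf_ref):
    """True if the FASTA base at chrom:pos equals the single-base VCF REF."""
    base = base_at(fasta_handle, chrom, pos)
    return base is not None and base == vcf_ref.upper()


# Toy data only; a real check would open the aligner's reference FASTA.
toy = ">1 toy\nACGT\nACGT\n>2 toy\nTTGC\nAAGG\n"
print(base_at(io.StringIO(toy), "2", 6))
print(ref_matches(io.StringIO(toy), "2", 6, "a"))
```

If the base in your own FASTA agrees with the VCF REF but not with what the browser shows at the same coordinate, the browser is most likely displaying a different assembly.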

Thanks, I'll look more closely into this, I expect you're right.

Can you give more details, please? Perhaps send us (some of) the coordinates you have got. How far out are they? You may want to try the VEP; it includes SIFT too.

Thank you for the link. I'm looking into this now. One of the VEP transcripts is annotated with a missense variant, which is promising, but I need to look into it a little more. The SNP I am initially looking into is called as 2:171258059 C,T, for which the SIFT analysis yielded R662W.

It seems to me that one of the possible alleles (i.e. C) in the example you sent does indeed match the current human assembly GRCh38 (see the VEP track in Ensembl). VEP annotates that variant as intergenic. There is no gene anywhere near that position...

Thank you. Yes but in the GRCh37 build (which I used for the alignment) a missense variant is indicated in an interesting gene.

I see. The coordinate you sent was for GRCh37? OK, there we have the C allele as the reference. One of the consequences of the variant does indeed seem to be missense (for ENST00000484338, ENST00000317935, ENST00000408978, ENST00000409044 and ENST00000334231), and it is predicted to be damaging by both SIFT and PolyPhen.
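To avoid this kind of mix-up, it can help to read the build out of the VCF header before pasting coordinates into a browser. The sketch below is only an illustration: the function names and the example header are invented, it relies on the optional `##reference` and `##contig` meta-lines being present, and the two chromosome 2 lengths are quoted from memory of the public assembly reports, so verify them before relying on the guess.

```python
# Chromosome 2 lengths per build (assumed values; verify against the
# official assembly reports before trusting the guess).
KNOWN_CHR2_LENGTHS = {243199373: "GRCh37", 242193529: "GRCh38"}


def header_build_hints(vcf_lines):
    """Collect reference/assembly hints from the '##' meta-lines of a VCF."""
    hints = {"reference": None, "assemblies": set(), "chr2_length": None}
    for line in vcf_lines:
        if not line.startswith("##"):
            break  # '#CHROM' line or first record: the meta-lines are over
        line = line.strip()
        if line.startswith("##reference="):
            hints["reference"] = line.split("=", 1)[1]
        elif line.startswith("##contig=<"):
            body = line[len("##contig=<"):].rstrip(">")
            # Naive key=value split; quoted values with commas are not handled.
            fields = dict(kv.split("=", 1) for kv in body.split(",") if "=" in kv)
            if "assembly" in fields:
                hints["assemblies"].add(fields["assembly"])
            if fields.get("ID") in ("2", "chr2") and fields.get("length", "").isdigit():
                hints["chr2_length"] = int(fields["length"])
    return hints


def guess_build(hints):
    """Map the chromosome 2 length to a build name, or None if unknown."""
    return KNOWN_CHR2_LENGTHS.get(hints["chr2_length"])


# Invented header for illustration; in practice pass an open VCF file.
example = [
    "##fileformat=VCFv4.1",
    "##reference=file:///path/to/reference.fasta",
    "##contig=<ID=2,length=243199373,assembly=b37>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
hints = header_build_hints(example)
print(hints["reference"], sorted(hints["assemblies"]), guess_build(hints))
```

If the header carries none of these lines, the function returns empty hints and the build has to be confirmed some other way, for example by comparing bases against the reference FASTA.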

Yes, it's an encouraging result, and it looks like it would be worth validating with Sanger sequencing.
