Start and end coordinates reversed in Ensembl
7.6 years ago
jde715 • 0

Hello!

I am using Ensembl's public MySQL server (http://uswest.ensembl.org/info/data/mysql.html) to get coordinates & alleles for a set of rsids. I am using the variation_feature table in the homo_sapiens_variation_85_38 database. I have noticed for a subset of variants seq_region_end is less than seq_region_start. For example, for rs2066847, seq_region_start is 50729868 and seq_region_end is 50729867. It seems like the start and end coordinates have been reversed but the alleles returned are correct ("-/C") and not the reverse complement. See http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=2066847.

With two simple queries, I found that there are 5336568 entries where seq_region_end is less than seq_region_start and 4996794 entries where seq_region_end is greater than seq_region_start.
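The two counts can be reproduced with queries of roughly the following shape. This is only a sketch: it runs against an in-memory SQLite table standing in for `variation_feature`, not the Ensembl MySQL server. The `rs2066847` row uses the values quoted above; the other two rows and the `count_where` helper are hypothetical and exist only so the example runs.

```python
import sqlite3

# Toy stand-in for Ensembl's variation_feature table (only the columns used here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variation_feature ("
    "variation_name TEXT, seq_region_start INTEGER, "
    "seq_region_end INTEGER, allele_string TEXT)"
)
rows = [
    ("rs2066847", 50729868, 50729867, "-/C"),  # values quoted in the post
    ("example_snv", 100, 100, "A/G"),          # hypothetical row
    ("example_del", 200, 202, "TTT/-"),        # hypothetical row
]
conn.executemany("INSERT INTO variation_feature VALUES (?, ?, ?, ?)", rows)


def count_where(condition):
    """Count rows matching a fixed, trusted SQL condition (hypothetical helper)."""
    sql = f"SELECT COUNT(*) FROM variation_feature WHERE {condition}"
    return conn.execute(sql).fetchone()[0]


end_before_start = count_where("seq_region_end < seq_region_start")
end_after_start = count_where("seq_region_end > seq_region_start")
print(end_before_start, end_after_start)
```

Against the real database, the same two `WHERE` conditions would be run through a MySQL client instead of SQLite.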

Can anyone explain what's going on here or point me to some documentation explaining this? Does this have to do with forward vs. reverse strand? From what I've seen so far, it seems like all insertions have start and end reversed - is that true? If so, why?

Thanks!

Jon

snp dbsnp ensembl variant • 1.5k views
7.6 years ago

Short answer: dbSNP annotations can correspond to either strand. See Emily Ensembl's answer in this thread.

I found this in the ensembl documentation: "Most of our SNPs and short insertion-deletions are from NCBI dbSNP. Variants in dbSNP can be on either the forward or reverse strand. Ensembl determines the forward-stranded allele and reports it."

So when Ensembl imports variants from dbSNP, they correct the alleles to reflect the forward strand but do not correct the coordinates? Am I missing something?
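One possible reading, offered as an assumption rather than something confirmed in this thread: Ensembl's VEP input format documents an insertion as start = end + 1, which marks a zero-length position between two bases. If `variation_feature` follows the same convention, then rows with seq_region_end = seq_region_start - 1 would be insertions, not strand-flipped coordinates. The sketch below applies that rule; the function names and labels are hypothetical.

```python
def span_length(seq_region_start, seq_region_end):
    """Number of reference bases covered, using 1-based inclusive coordinates."""
    return seq_region_end - seq_region_start + 1


def classify_span(seq_region_start, seq_region_end):
    """Label a coordinate pair under the assumed start = end + 1 insertion rule."""
    length = span_length(seq_region_start, seq_region_end)
    if length == 0:
        return "insertion"   # zero-length: sits between end and start
    if length > 0:
        return "span"        # covers one or more reference bases
    return "unexpected"      # end more than one base before start


# rs2066847 as quoted in the question (allele string "-/C")
print(classify_span(50729868, 50729867))
```

Under this assumption the "-/C" allele string fits as well: the reference allele is empty and C is inserted between positions 50729867 and 50729868.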
