Mapping Genomic Position To Protein Position
10.6 years ago
John Knowlg

I have a list of genomic SNP positions with exon number, exonic SNP position, exon sequence, Ensembl gene IDs and strand information. I am interested in mapping these SNPs to their positions in the protein in a different assembly build. Is there a simple way I could do this without using LiftOver?

e.g.

Chr:SNP_position:[alleles]:strand:Ensemble_gene_ID:Exon_number:Exon_seq:Exonic_SNP_position
chr11:2111754:[G/T]:+:ENSSSCG00000009291:4:CTGCTGCGTGGGGTTCC:207
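A minimal sketch of reading records in the colon-separated layout above, so the fields can be fed into later lookups. It assumes the first allele inside the brackets is the reference and the second the alternate; the `ExonicSnp` class and `parse_record` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class ExonicSnp:
    chrom: str
    position: int        # genomic position on the (unpublished) assembly
    ref: str             # assumed: first allele in [A/B] is the reference
    alt: str             # assumed: second allele is the alternate
    strand: str
    gene_id: str
    exon_number: int
    exon_seq: str
    exon_position: int   # SNP position within the exon


def parse_record(line: str) -> ExonicSnp:
    # Fields are colon-separated, in the order given by the header line.
    chrom, pos, alleles, strand, gene_id, exon_no, exon_seq, exon_pos = line.strip().split(":")
    ref, alt = alleles.strip("[]").split("/")
    return ExonicSnp(chrom, int(pos), ref, alt, strand, gene_id,
                     int(exon_no), exon_seq, int(exon_pos))


rec = parse_record("chr11:2111754:[G/T]:+:ENSSSCG00000009291:4:CTGCTGCGTGGGGTTCC:207")
print(rec.chrom, rec.position, rec.ref, rec.alt, rec.exon_number, rec.exon_position)
```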

Hi John, welcome to Biostar! :-) Your question has already been answered: How To Check If The Iupac Snp Code Changes Translation?

Thanks, Pierre. Maybe I didn't make myself clear, but I think my question is different from the one you linked. The ultimate purpose of this task is to run these SNPs through PolyPhen/SIFT. The problem is that I am working with an assembly that is not publicly available yet, so I want to map these coordinates to a known assembly and then run them through SIFT and similar tools.

OK, I have reopened the question.


If you work with Ensembl data and feel comfortable with it, you could always use the Ensembl archive to find the information you need from a past assembly. You only need to add the ensembl_snp_id to your table, and then query those IDs on a previous Ensembl version.

You will run into problems if some IDs are missing from past Ensembl versions. In that case, if you don't want to use LiftOver, you could always use Ensembl's Assembly Converter (accessible here), which lets you convert all your positions (they must be genomic positions, not relative ones) to a previous assembly.
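If you prefer to script the conversion, a similar assembly-to-assembly mapping is exposed by the Ensembl REST API's `map` endpoint (`/map/:species/:asm_one/:region/:asm_two`). The sketch below only builds the request URL and wraps a plain `urllib` fetch; the helper names are hypothetical, and the species and assembly names in the example (`sus_scrofa`, `Sscrofa10.2`, `Sscrofa11.1`) are assumptions. Check which assembly pairs Ensembl actually provides mappings for, and note that this only works when the source assembly is itself known to Ensembl.

```python
import json
import urllib.request


def assembly_map_url(species, from_asm, chrom, start, end, strand, to_asm,
                     server="https://rest.ensembl.org"):
    # Ensembl names chromosomes without a "chr" prefix (e.g. "11", not "chr11").
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    # Region format: name:start..end:strand, strand given as 1 or -1.
    region = f"{chrom}:{start}..{end}:{strand}"
    return f"{server}/map/{species}/{from_asm}/{region}/{to_asm}?content-type=application/json"


def fetch_mapping(url):
    # Performs a network request and returns the decoded JSON response.
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


url = assembly_map_url("sus_scrofa", "Sscrofa10.2", "chr11", 2111754, 2111754, 1, "Sscrofa11.1")
print(url)
```

Passing the printed URL to `fetch_mapping` would then return the mapped coordinates, if the mapping exists.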

Thanks for the answer, Jorge. Would this still work if my coordinates are on an assembly that is not available in Ensembl yet; would it let me convert them to an older assembly? Or should I instead use the exon number and the position within the exon to work out the position in the protein?
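The exon-number route can be sketched as arithmetic on a reference transcript, assuming you already know that transcript's exon lengths (in 5' to 3' transcript order) and the 1-based cDNA coordinates of the start codon's first base and stop codon's last base. It also assumes the position within the exon is 1-based and counted along the transcript. The function name and the example exon lengths are hypothetical.

```python
def exon_to_protein(exon_lengths, cds_start, cds_end, exon_number, pos_in_exon):
    """Return (cds_pos, aa_pos, codon_pos), or None if the SNP lies in a UTR.

    exon_lengths: exon lengths in transcript (5'->3') order.
    cds_start, cds_end: 1-based cDNA coordinates of the coding region.
    exon_number, pos_in_exon: both 1-based, counted along the transcript.
    """
    if not 1 <= exon_number <= len(exon_lengths):
        raise ValueError("exon_number out of range")
    if not 1 <= pos_in_exon <= exon_lengths[exon_number - 1]:
        raise ValueError("pos_in_exon out of range")
    # Position in the spliced transcript = all upstream exons + offset in this exon.
    cdna_pos = sum(exon_lengths[:exon_number - 1]) + pos_in_exon
    if cdna_pos < cds_start or cdna_pos > cds_end:
        return None  # 5' or 3' UTR: no protein position
    cds_pos = cdna_pos - cds_start + 1
    aa_pos = (cds_pos - 1) // 3 + 1      # residue number in the protein
    codon_pos = (cds_pos - 1) % 3 + 1    # 1, 2 or 3 within the codon
    return cds_pos, aa_pos, codon_pos


# Hypothetical transcript: 4 exons, CDS from cDNA base 61 to 600.
print(exon_to_protein([150, 120, 90, 300], 61, 600, exon_number=4, pos_in_exon=207))
```

With these made-up numbers the SNP falls at CDS base 507, which is the third base of codon 169.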

Sorry, I don't think I understand what you mean by "assembly coordinates". Do your positions refer to the start of an exon, or even to the start of a gene? Querying Ensembl by position requires genomic positions, not relative ones. If your positions are genomic, you can query the archive version that matches your assembly. If they are relative, I would either BLAST your sequences against the assembly of interest, or retrieve the reference start positions (of the gene or exon) and convert your relative positions into absolute ones.
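Converting a relative position into an absolute one, as suggested above, is a strand-aware offset from the exon's reference coordinates. A minimal sketch, assuming 1-based inclusive exon coordinates on the reference assembly and an offset counted 1-based along the transcript (so on the minus strand it runs from the exon's end). The function name and the example coordinates are hypothetical.

```python
def exon_offset_to_genomic(exon_start, exon_end, strand, offset):
    """Map a 1-based offset within an exon to a 1-based genomic coordinate."""
    length = exon_end - exon_start + 1
    if not 1 <= offset <= length:
        raise ValueError("offset outside the exon")
    if strand in ("+", 1, "1"):
        # Plus strand: the transcript reads in increasing genomic order.
        return exon_start + offset - 1
    if strand in ("-", -1, "-1"):
        # Minus strand: the transcript reads from exon_end downwards.
        return exon_end - offset + 1
    raise ValueError("strand must be '+'/'-' or 1/-1")


# Hypothetical exon spanning 1000..1499 on the reference assembly.
print(exon_offset_to_genomic(1000, 1499, "+", 207))
print(exon_offset_to_genomic(1000, 1499, "-", 207))
```

For a minus-strand gene, remember that the alleles must also be complemented if they were reported on the transcript strand.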