Question: Problem mapping Ensembl coding sequences (DNA) to UniProt proteins
4.5 years ago, romain.lannes wrote:

I want to map UniProt proteins (main isoform) to Ensembl coding sequences in order to estimate Ka/Ks between very closely related species.

I went to UniProt and downloaded the protein sequences together with their Ensembl transcript IDs (ENST). I converted the ENSTs to ENSGs because, if I understood correctly, an ENSG represents a physical location in the genome and the ENSTs are variants. Up to this point everything was fine. But then I tried to get the corresponding coding sequences. I went to Ensembl and downloaded the sequences by ENSG. For each ENSG I looked for the ENST that codes for the corresponding protein. A large fraction of the ENSGs (~50%) have no transcript that exactly matches the protein sequence.
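For illustration, the exact-match check described above could look roughly like the sketch below. It assumes the standard genetic code and that the CDS starts in frame at the first base. The function names (`translate`, `matching_transcripts`) and the transcript IDs in the usage note are hypothetical, not part of any Ensembl or UniProt tool.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a CDS in frame 0; unknown codons become 'X'.

    A trailing partial codon is ignored and a terminal stop is dropped.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    protein = "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3)
    )
    return protein[:-1] if protein.endswith("*") else protein


def matching_transcripts(cds_by_transcript, protein):
    """Return the transcript IDs whose translation equals the protein exactly."""
    target = protein.upper().rstrip("*")
    return sorted(
        tid for tid, cds in cds_by_transcript.items() if translate(cds) == target
    )
```

Called as `matching_transcripts({"tx1": "ATGGCCAAATAA", "tx2": "ATGGCCAGATAA"}, "MAK")`, this keeps only `tx1`. An empty result for a gene means none of its transcripts encodes that exact protein, which is the situation reported for about half of the ENSGs.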

I had more success running exonerate against the CDS sequences (from Ensembl): only 6% of the protein/DNA pairs show differences (mismatches, indels, insertions). This is clearly better, but:

Is using exonerate a good way to do this?

Why do so many UniProt proteins fail to match an Ensembl coding sequence?




I think you misunderstood the Ensembl annotations: ENSTs are not variants. An ENSG denotes a gene and an ENST denotes a transcript. Both have genomic coordinates, and protein-coding transcripts have translations, which are often associated with a UniProt accession if the protein is represented in UniProt. Now, if you have UniProt accessions for one species that you want to map to Ensembl proteins for another species, you could download the protein sequences from the Ensembl FTP site and use them for mapping with your tool of choice. However, keep in mind that even closely related species will have differences at the protein level. The number of differences will depend on how close or distant the species are.
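To make the gene/transcript relationship concrete, here is a minimal sketch that groups transcripts by gene using the headers of an Ensembl FASTA file. It assumes the headers carry `gene:` and `transcript:` fields, as in the peptide files on the Ensembl FTP site; check this against your own download. The function names are hypothetical and the IDs in the usage note are made-up placeholders.

```python
from collections import defaultdict


def parse_header_fields(header):
    """Split a FASTA header into (sequence_id, {key: value}).

    Assumes space-separated 'key:value' tokens after the sequence ID.
    """
    tokens = header.lstrip(">").split()
    fields = {}
    for token in tokens[1:]:
        key, sep, value = token.partition(":")
        if sep:
            fields.setdefault(key, value)  # keep the first occurrence of a key
    return tokens[0], fields


def strip_version(stable_id):
    """Drop the '.N' version suffix from a stable ID."""
    return stable_id.split(".")[0]


def group_transcripts_by_gene(headers):
    """Map each gene ID (ENSG) to the sorted list of its transcript IDs (ENST)."""
    genes = defaultdict(set)
    for header in headers:
        _, fields = parse_header_fields(header)
        if "gene" in fields and "transcript" in fields:
            gene = strip_version(fields["gene"])
            genes[gene].add(strip_version(fields["transcript"]))
    return {gene: sorted(ids) for gene, ids in genes.items()}
```

For example, two headers containing `gene:ENSG00000000001.5` with `transcript:ENST00000000011.2` and `transcript:ENST00000000012.1` end up under the single key `ENSG00000000001`. One gene can therefore have several transcripts, each with its own translation, and at most some of them will correspond to the UniProt main isoform.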

written 4.5 years ago by Jean-Karim Heriche

Thank you for your answer. It is helpful. I had trouble understanding clearly what an ENST was.

modified 4.5 years ago • written 4.5 years ago by romain.lannes

