Question: Transcript and protein
11 months ago
Yaroslavl', Russia
Vladislav10 wrote:

Hi, Biostars community.

I have a set of mRNA transcript IDs, e.g. 'NM_007300.4', 'NM_007297.4', 'NM_007294.3' ... I can find which of them is canonical by using knownCanonical.txt and kgXref.txt from UCSC: 'NM_007300' in this case.

But I also have a set of the corresponding protein IDs, e.g. 'NP_009231.2', 'NP_009228.2', 'NP_009225.1' ...

So, could you please tell me how to find which of them corresponds to the canonical mRNA transcript?


The IDs you list above are essentially cross-references to each other: each RefSeq transcript (NM_) record links to its protein (NP_) record. You can verify that with EntrezDirect:

$ esearch -db nuccore -query "NM_007297.4" | elink -target protein | efetch -format acc
$ esearch -db protein -query "NP_009228" | elink -target nuccore | efetch -format acc
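
To pair every transcript in your list with its protein in one go, the first command can be wrapped in a loop. This is only a sketch: the function name map_transcripts_to_proteins is made up for illustration, and it assumes EntrezDirect is installed and that each transcript links to a single protein record.

```shell
# Hypothetical helper (not part of EntrezDirect): prints one
# "transcript<TAB>protein" line per RefSeq transcript accession given.
map_transcripts_to_proteins() {
    for tx in "$@"; do
        # Same lookup as above: nuccore record -> linked protein -> accession
        prot=$(esearch -db nuccore -query "$tx" | elink -target protein | efetch -format acc)
        printf '%s\t%s\n' "$tx" "$prot"
    done
}
```

Calling it as `map_transcripts_to_proteins NM_007300.4 NM_007297.4 NM_007294.3` should give a small two-column table; the row for the canonical transcript then tells you which NP_ accession belongs to it.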

If you want to convert Ensembl identifiers from knownCanonical.txt, you could use the Ensembl REST API (random example from Canonical file) or BioMart.
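
For the REST route, a minimal sketch could look like the following. It assumes the xrefs/id endpoint of the Ensembl REST API is the one meant here and that curl is available; the function name ensembl_xrefs is hypothetical, and the argument is whatever Ensembl transcript ID you take from knownCanonical.txt.

```shell
# Hypothetical helper: fetch all external cross-references (JSON) that
# Ensembl holds for one Ensembl ID, e.g. a transcript from knownCanonical.txt.
ensembl_xrefs() {
    # Assumption: the xrefs/id endpoint is the relevant one for this conversion
    curl -s "https://rest.ensembl.org/xrefs/id/$1?content-type=application/json"
}
```

The JSON that comes back lists one entry per external database, so the RefSeq transcript and protein accessions, if Ensembl has them for that ID, can be picked out from there.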

