I have a large list of Ensembl protein IDs (ENSP) from GRCh37, and I need to map them to the entry names listed on the UniProt website (e.g. 'CASPE_HUMAN'). I am having trouble doing this with the UniProt dataset because it is kept up to date with the GRCh38 Ensembl IDs. Right now I have a dataset that maps the GRCh37 IDs to UniProtKB-AC (e.g. P31944), but some of these UniProt accessions are obsolete. Is there a way to see which Ensembl IDs were updated in GRCh38? My overall goal is to find the current UniProt IDs for my list of GRCh37 IDs.
I am working in Python, and ideally I would end up with a dataframe that looks like this:
GRCh37_ID  GRCh38_ID  Old UniProt  New UniProt
ENSP001    ENSP001    P1234        P1234
ENSP002    ENSP004    P4567        P5632
ENSP003    ENSP009    P1292        P1292
ENSP004    ENSP0012   P1434        P2434
With this table, I could look up the new UniProt ID that corresponds to each of my old GRCh37 IDs and use it to find the entry name. Is this possible? I've been struggling to figure this out.
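To make the target concrete, here is a minimal sketch of how I imagine assembling that table, assuming I already had a GRCh37-to-GRCh38 ID mapping (which is the piece I am missing). All three dictionaries and the function name `build_rows` are hypothetical, and the IDs are the placeholder values from the table above, not real data.

```python
# Hypothetical inputs: placeholder IDs only, not real Ensembl/UniProt data.
# The GRCh37 -> GRCh38 mapping is the part I do not know how to obtain.
grch37_to_grch38 = {
    "ENSP001": "ENSP001",
    "ENSP002": "ENSP004",
    "ENSP003": "ENSP009",
    "ENSP004": "ENSP0012",
}
# Mapping I already have (GRCh37 ID -> possibly obsolete UniProt accession).
grch37_to_old_uniprot = {
    "ENSP001": "P1234",
    "ENSP002": "P4567",
    "ENSP003": "P1292",
    "ENSP004": "P1434",
}
# Mapping I would pull from the current release (GRCh38 ID -> accession).
grch38_to_new_uniprot = {
    "ENSP001": "P1234",
    "ENSP004": "P5632",
    "ENSP009": "P1292",
    "ENSP0012": "P2434",
}


def build_rows(grch37_ids, id_map, old_acc, new_acc):
    """Join the three mappings into one row per GRCh37 ID.

    Missing links are kept as None rather than dropped, so unmapped
    IDs stay visible in the result.
    """
    rows = []
    for old_id in grch37_ids:
        new_id = id_map.get(old_id)
        rows.append({
            "GRCh37_ID": old_id,
            "GRCh38_ID": new_id,
            "Old UniProt": old_acc.get(old_id),
            "New UniProt": new_acc.get(new_id) if new_id is not None else None,
        })
    return rows


rows = build_rows(
    ["ENSP001", "ENSP002", "ENSP003", "ENSP004"],
    grch37_to_grch38,
    grch37_to_old_uniprot,
    grch38_to_new_uniprot,
)
```

The resulting list of dictionaries can be passed straight to `pandas.DataFrame(rows)` to get the dataframe layout shown above.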
Recap: I started with a list of Ensembl translation/protein stable IDs (ENSPs) for GRCh37, and I want to find their UniProtKB/Swiss-Prot IDs. The problem is that the BioMart results include some UniProtKB/Swiss-Prot IDs that are no longer in UniProt, so I can't find an entry name for them. To work around this, I was thinking I could find the corresponding ENSPs for GRCh38 and then look up their UniProtKB/Swiss-Prot IDs, since those should be more up to date. The issue is that I don't know how to map the old ENSPs to the new ones.