I want to retrieve the canonical sequence for some proteins from Ensembl, based upon their UniProt IDs.
library(biomaRt)
ensemblMart <- useMart(
  biomart = "ensembl",
  dataset = "hsapiens_gene_ensembl"
)
(result <- getBM(
  attributes = c("uniprot_swissprot", "peptide"),
  filters = "uniprot_swissprot",
  values = "P31749",
  mart = ensemblMart,
  bmHeader = TRUE
))
This particular protein has two isoforms, both of which are returned. I only care about the canonical form, so how can I distinguish the isoforms?
I suspect that I need to return another of the attributes from the mart
listAttributes(ensemblMart)
but I don't know which one.
Thanks. Just to be clear: are there any other ways of distinguishing isoforms?
You could perhaps use the APPRIS principal isoform, which is available from BioMart.
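As a rough sketch of that suggestion, the R code below annotates each peptide with its APPRIS tag and keeps only the principal ones before fetching sequences. It assumes the mart exposes the attributes `transcript_appris` and `ensembl_peptide_id` and that APPRIS tags look like "principal1" or "alternative2"; attribute names change between Ensembl releases, so check them against listAttributes(ensemblMart) first. The `uniprot_swissprot` name is copied from the question, and the two function names are made up for illustration.

```r
# Keep only rows whose APPRIS tag starts with "principal".
# Assumes a column named `transcript_appris` (an assumption, see above);
# rows with a missing or empty tag are dropped.
keep_appris_principal <- function(annotation) {
  tag <- annotation$transcript_appris
  annotation[!is.na(tag) & grepl("^principal", tag), , drop = FALSE]
}

# Hypothetical wrapper: look up the peptides mapped to a UniProt accession,
# keep the APPRIS principal ones, then fetch their peptide sequences.
fetch_principal_peptides <- function(uniprot_id, mart) {
  annotation <- biomaRt::getBM(
    attributes = c("uniprot_swissprot", "ensembl_peptide_id", "transcript_appris"),
    filters = "uniprot_swissprot",
    values = uniprot_id,
    mart = mart
  )
  principal <- keep_appris_principal(annotation)
  # getSequence returns one row per peptide ID with its sequence.
  biomaRt::getSequence(
    id = principal$ensembl_peptide_id,
    type = "ensembl_peptide_id",
    seqType = "peptide",
    mart = mart
  )
}
```

With the mart from the question this would be called as fetch_principal_peptides("P31749", ensemblMart). Several transcripts can carry a principal tag, so more than one row may still come back, and the APPRIS principal isoform is not guaranteed to be the one UniProt labels canonical.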