I want to retrieve the canonical sequence for some proteins from Ensembl, based on their UniProt IDs.
library(biomaRt)
ensemblMart <- useMart(
  biomart = "ensembl",
  dataset = "hsapiens_gene_ensembl"
)
(result <- getBM(
  attributes = c("uniprot_swissprot", "peptide"),
  filters = "uniprot_swissprot",
  values = "P31749",
  mart = ensemblMart,
  bmHeader = TRUE
))
This particular protein has two isoforms, both of which are returned. I only care about the canonical form, so how can I distinguish the isoforms?
I suspect that I need to request one of the other attributes from the mart
listAttributes(ensemblMart)
but I don't know which one.
Thanks. Just to be clear: are there any other ways of distinguishing isoforms?
You could perhaps use the APPRIS annotation, which flags the principal isoform(s) of each gene and is available from BioMart as a transcript attribute.
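As a rough sketch of how that might look with biomaRt: the code below first asks for the APPRIS flag of every transcript linked to the UniProt ID, keeps the ones flagged as principal, and then fetches the peptide sequence for those transcripts only. Several things here are assumptions rather than anything confirmed in this thread: the attribute name `transcript_appris`, the flag values starting with "principal" (e.g. "principal1"), and the filter name `uniprotswissprot` (the question uses the spelling `uniprot_swissprot`, so pass whichever one `listFilters(ensemblMart)` shows for your release). The helper names `keep_principal` and `fetch_principal_peptides` are made up for illustration.

```r
# Keep only rows whose APPRIS flag marks a principal isoform.
# ASSUMPTION: flags look like "principal1".."principal5" or "alternative1", etc.
keep_principal <- function(df, appris_col = "transcript_appris") {
  flag <- as.character(df[[appris_col]])
  df[!is.na(flag) & startsWith(flag, "principal"), , drop = FALSE]
}

# Hypothetical helper: look up transcripts for a UniProt ID, filter on APPRIS,
# then fetch peptide sequences for the remaining transcripts.
# ASSUMPTION: "transcript_appris" and the default filter name exist in your mart.
fetch_principal_peptides <- function(uniprot_id, mart,
                                     uniprot_filter = "uniprotswissprot") {
  transcripts <- biomaRt::getBM(
    attributes = c("ensembl_transcript_id", "transcript_appris"),
    filters = uniprot_filter,
    values = uniprot_id,
    mart = mart
  )
  principal <- keep_principal(transcripts)
  if (nrow(principal) == 0) {
    return(principal)  # nothing flagged as principal; no sequence query needed
  }
  sequences <- biomaRt::getBM(
    attributes = c("ensembl_transcript_id", "peptide"),
    filters = "ensembl_transcript_id",
    values = principal$ensembl_transcript_id,
    mart = mart
  )
  merge(principal, sequences, by = "ensembl_transcript_id")
}

# Usage (needs network access to Ensembl):
# ensemblMart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# fetch_principal_peptides("P31749", ensemblMart)
```

More than one transcript of a gene can carry a principal flag, so you may still get several rows back; in that case it is worth checking whether the peptide sequences are identical before picking one. The lookup is split into two queries because I am not sure the APPRIS attribute can be combined with sequence attributes in a single request.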