How to distinguish protein isoforms using biomaRt?
richierocks, 7.9 years ago

I want to retrieve the canonical sequence for some proteins from Ensembl, based on their UniProt IDs.

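library(biomaRt)

# Connect to the human gene dataset of the Ensembl BioMart
ensemblMart <- useMart(
  biomart = "ensembl",
  dataset = "hsapiens_gene_ensembl"
)

# Fetch the peptide sequence(s) for one UniProt/Swiss-Prot accession;
# the outer parentheses make R print the result after assigning it
(result <- getBM(
  attributes = c("uniprot_swissprot", "peptide"),
  filters    = "uniprot_swissprot",
  values     = "P31749",
  mart       = ensemblMart
))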


This particular protein has two isoforms, and both are returned. I only care about the canonical form, so how can I distinguish the isoforms?

I suspect I need to request one of the other attributes listed by

listAttributes(ensemblMart)


but I don't know which one.
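One way to narrow the search is to grep the data frame that listAttributes() returns. This is a sketch assuming the usual columns name, description and page; find_attributes is a hypothetical helper, not part of biomaRt (newer biomaRt releases ship a similar searchAttributes()).

```r
# Hypothetical helper: search the data frame returned by listAttributes()
# (assumed columns "name", "description", "page") for a case-insensitive
# pattern in either the attribute name or its description.
find_attributes <- function(attrs, pattern) {
  hit <- grepl(pattern, attrs$name, ignore.case = TRUE) |
    grepl(pattern, attrs$description, ignore.case = TRUE)
  attrs[hit, , drop = FALSE]
}
```

For example, find_attributes(listAttributes(ensemblMart), "appris|canonical") lists candidate attributes together with the page each one lives on.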

Tags: biomaRt, R
Emily, 7.9 years ago

BioMart does not define a canonical isoform.


Thanks. Just to be clear: are there any other ways of distinguishing isoforms?


You could perhaps use the APPRIS principal isoform, which is available from BioMart.
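A minimal sketch of the APPRIS route, assuming the Ensembl attribute is named transcript_appris with values such as principal1 or alternative2 (confirm the exact name with listAttributes(ensemblMart)). It uses two queries because peptide sequences sit on a different BioMart attribute page from transcript annotations, and a single getBM() call cannot mix pages. Both helper names are hypothetical.

```r
# Hypothetical helper: keep rows whose APPRIS label marks a principal isoform
# (labels assumed to look like "principal1", "alternative2", or "" when absent).
pick_appris_principal <- function(df, col = "transcript_appris") {
  keep <- !is.na(df[[col]]) & startsWith(df[[col]], "principal")
  df[keep, , drop = FALSE]
}

# Hypothetical wrapper around biomaRt (needs network access when called).
appris_peptides <- function(uniprot_id, mart) {
  # Step 1: map the UniProt ID to Ensembl transcripts and their APPRIS labels.
  # "transcript_appris" is an assumed attribute name; check listAttributes(mart).
  tx <- biomaRt::getBM(
    attributes = c("uniprot_swissprot", "ensembl_transcript_id", "transcript_appris"),
    filters    = "uniprot_swissprot",
    values     = uniprot_id,
    mart       = mart
  )
  principal <- pick_appris_principal(tx)
  if (nrow(principal) == 0) {
    stop("no APPRIS principal transcript found for ", uniprot_id)
  }
  # Step 2: fetch peptide sequences only for the principal transcript(s).
  biomaRt::getSequence(
    id      = unique(principal$ensembl_transcript_id),
    type    = "ensembl_transcript_id",
    seqType = "peptide",
    mart    = mart
  )
}
```

Usage would be appris_peptides("P31749", ensemblMart). APPRIS can flag more than one transcript as principal, so the result may still contain several rows (often with identical peptides).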

7.9 years ago

If you are trying to retrieve protein sequences using UniProt IDs, why not fetch them directly from UniProt using something like urllib2 in Python (there is surely a way to do this in R as well)? Or simply use wget on Linux, e.g. wget -nv http://www.uniprot.org/uniprot/O53166.fasta. UniProt returns the canonical sequence.
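In R, the direct UniProt route can be sketched with base R alone. This assumes UniProt's current REST URL pattern https://rest.uniprot.org/uniprotkb/<accession>.fasta (the older www.uniprot.org form used above may redirect to it); the helper names are hypothetical.

```r
# Hypothetical helper: build the UniProt FASTA URL for one accession
# (assumed current REST endpoint pattern).
uniprot_fasta_url <- function(accession) {
  sprintf("https://rest.uniprot.org/uniprotkb/%s.fasta", accession)
}

# Collapse the lines of a single FASTA record into a header and one sequence string.
parse_single_fasta <- function(lines) {
  lines <- lines[nzchar(lines)]          # drop blank lines
  is_header <- startsWith(lines, ">")
  list(
    header   = sub("^>", "", lines[is_header][1]),
    sequence = paste(lines[!is_header], collapse = "")
  )
}

# Download and parse the entry's sequence; requires network access.
fetch_uniprot_canonical <- function(accession) {
  parse_single_fasta(readLines(uniprot_fasta_url(accession), warn = FALSE))
}
```

For example, fetch_uniprot_canonical("P31749")$sequence should give the canonical sequence as a single string, without needing to disambiguate Ensembl transcripts at all.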