How to distinguish protein isoforms using biomaRt?
7.1 years ago
richierocks

I want to retrieve the canonical sequences of some proteins from Ensembl, given their UniProt IDs.

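library(biomaRt)

# Connect to the human gene dataset of the Ensembl BioMart
ensemblMart <- useMart(
  biomart = "ensembl",
  dataset = "hsapiens_gene_ensembl"
)

# Look up the peptide sequence(s) for one Swiss-Prot accession (P31749, AKT1);
# the outer parentheses make R print the result
(result <- getBM(
  attributes = c("uniprot_swissprot", "peptide"),
  filters    = "uniprot_swissprot",
  values     = "P31749",
  mart       = ensemblMart,
  bmHeader   = TRUE
))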

This particular protein has two isoforms, and both are returned. I only care about the canonical form, so how can I distinguish the isoforms?

I suspect I need to request one more of the attributes listed by

listAttributes(ensemblMart)

but I don't know which one.
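The attribute table returned by listAttributes() is large, so one way to find a candidate is to search its `name` and `description` columns with a regular expression. This is only a sketch: `find_attributes()` is a hypothetical helper, and the search terms are guesses rather than known attribute names.

```r
# Return the rows of a listAttributes() table whose name or description
# matches a regular expression (case-insensitive).
find_attributes <- function(attrs, pattern) {
  hit <- grepl(pattern, attrs$name, ignore.case = TRUE) |
         grepl(pattern, attrs$description, ignore.case = TRUE)
  attrs[hit, , drop = FALSE]
}

# Usage (needs network access and the biomaRt package):
# attrs <- biomaRt::listAttributes(ensemblMart)
# find_attributes(attrs, "appris|isoform|canonical")
```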

7.1 years ago
Emily

There is no canonical isoform defined in BioMart.


Thanks. Just to be clear: are there any other ways of distinguishing isoforms?


You could perhaps use the APPRIS principal isoform, which is available from BioMart.
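A minimal sketch of the APPRIS approach in R. It assumes the Ensembl attribute holding the APPRIS label is called `transcript_appris` and takes values such as `principal1` or `alternative2`; check both against listAttributes() for your Ensembl release. The helpers `keep_appris_principal()` and `principal_peptides()` are hypothetical. The lookup is split into two queries because BioMart generally refuses to mix sequence attributes such as `peptide` with annotation attributes in a single getBM() call.

```r
# Keep only transcripts whose APPRIS label marks a principal isoform
# ("principal1", "principal2", ...); NA or empty labels are dropped.
keep_appris_principal <- function(bm_result, appris_col = "transcript_appris") {
  appris <- as.character(bm_result[[appris_col]])
  is_principal <- !is.na(appris) & startsWith(appris, "principal")
  bm_result[is_principal, , drop = FALSE]
}

# Two-step lookup (needs network access and the biomaRt package):
# 1. map UniProt IDs to Ensembl transcripts plus their APPRIS label,
# 2. fetch peptide sequences only for the principal transcripts.
# Attribute and filter names are assumptions; verify them first.
principal_peptides <- function(uniprot_ids, mart) {
  annot <- biomaRt::getBM(
    attributes = c("uniprot_swissprot", "ensembl_transcript_id", "transcript_appris"),
    filters    = "uniprot_swissprot",
    values     = uniprot_ids,
    mart       = mart
  )
  principal <- keep_appris_principal(annot)
  seqs <- biomaRt::getSequence(
    id      = principal$ensembl_transcript_id,
    type    = "ensembl_transcript_id",
    seqType = "peptide",
    mart    = mart
  )
  merge(principal, seqs, by = "ensembl_transcript_id")
}

# Usage:
# mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# principal_peptides("P31749", mart)
```

APPRIS can label more than one transcript as principal, so the result may still contain several rows per UniProt ID; sorting on the label and keeping the first row is one possible tie-break.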

7.1 years ago

If you are trying to retrieve protein sequences using UniProt IDs, why not fetch them directly from UniProt, e.g. with urllib2 in Python (I'm sure there's a way to do this in R as well), or even just with wget on Linux: wget -nv http://www.uniprot.org/uniprot/O53166.fasta. UniProt returns the canonical sequence.
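In R, base readLines() can read a URL directly, so a sketch of the same idea needs no extra packages. The REST URL pattern below is an assumption about UniProt's current service, not taken from the answer, and `fetch_uniprot_canonical()` and `parse_single_fasta()` are hypothetical helpers.

```r
# Collapse single-record FASTA text into a named character vector:
# name = header line without ">", value = concatenated sequence lines.
parse_single_fasta <- function(lines) {
  lines <- lines[nzchar(lines)]
  stopifnot(startsWith(lines[1], ">"))
  setNames(paste(lines[-1], collapse = ""), sub("^>", "", lines[1]))
}

# Fetch the canonical sequence for one accession (needs network access).
# Assumed endpoint; the answer's older www.uniprot.org URL form may redirect.
fetch_uniprot_canonical <- function(accession) {
  url <- sprintf("https://rest.uniprot.org/uniprotkb/%s.fasta", accession)
  parse_single_fasta(readLines(url, warn = FALSE))
}

# Usage:
# fetch_uniprot_canonical("P31749")
```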

