biomaRt: Extracting data for a particular isoform (R/bioconductor)
7.9 years ago
bsmith030465 ▴ 210

Hi,

I was trying to extract the 5' UTR and 3'UTR coordinates for a particular isoform. My code is:

library(biomaRt)

ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
getBM(attributes = c("5_utr_start", "5_utr_end", "3_utr_start", "3_utr_end"), filters = "hgnc_symbol", values = "AAGAB", mart = ensembl)


Is there a way to specify that, for the gene AAGAB, I want information for isoform 001 only (i.e. AAGAB-001)? Otherwise, how do I process the data returned by the query above to get this information?

Or should I be doing something else altogether?

thanks!

bioconductor utr biomart

I suspect that specifying a query based on gene symbol is not the way to go. You'll probably need to specify a transcript in the filter. Do you know a transcript ID?
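
If the transcript ID is not known yet, one option is to list the gene's transcripts first and pick the one you need. This is only a sketch: it assumes the `hsapiens_gene_ensembl` dataset and that the attribute `external_transcript_name` exists in your Ensembl release (check with `listAttributes(ensembl)`). `list_transcripts` and `pick_transcript` are hypothetical helper names, not part of biomaRt.

```r
# Sketch: list all transcripts of a gene, then pick one by its name.
# list_transcripts() needs the biomaRt package and network access.
list_transcripts <- function(symbol, mart) {
  biomaRt::getBM(
    attributes = c("ensembl_transcript_id", "external_transcript_name"),
    filters = "hgnc_symbol",
    values = symbol,
    mart = mart
  )
}

# Hypothetical helper: return the transcript ID(s) whose name equals `name`.
pick_transcript <- function(tx, name) {
  tx$ensembl_transcript_id[tx$external_transcript_name == name]
}

# Usage (not run here):
# ensembl <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# tx <- list_transcripts("AAGAB", ensembl)
# tx                                # inspect the names your release uses
# pick_transcript(tx, "AAGAB-001")
```

Transcript names can differ between Ensembl releases, so it is worth looking at `tx` before relying on a particular name.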

Thanks for the reply. So, if I want to get the details of "ENST00000261880", I should use something like:

ensembl = useMart("ensembl",dataset="hsapiens_gene_ensembl")
xx <- getBM(attributes = c("5_utr_start", "5_utr_end", "3_utr_start", "3_utr_end"),
            filters = "ensembl_transcript_id",
            values = "ENST00000261880",
            mart = ensembl)


Am I using the right filters?

Looks right. You might want to specify ensembl_transcript_id in the attributes as well.
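
A minimal sketch of that suggestion, with `ensembl_transcript_id` added to the attributes so every row is labelled with its transcript. It assumes the same `hsapiens_gene_ensembl` mart as above. `get_utrs` and `drop_empty_utr_rows` are hypothetical helpers. The second one rests on the assumption that the query can return rows where all UTR coordinates are `NA`, and simply drops those.

```r
# Sketch: fetch UTR coordinates for one transcript, keeping the transcript ID
# in the output. get_utrs() needs the biomaRt package and network access.
get_utrs <- function(transcript_id, mart) {
  biomaRt::getBM(
    attributes = c("ensembl_transcript_id",
                   "5_utr_start", "5_utr_end",
                   "3_utr_start", "3_utr_end"),
    filters = "ensembl_transcript_id",
    values = transcript_id,
    mart = mart
  )
}

# Hypothetical helper: keep only rows with at least one UTR coordinate.
drop_empty_utr_rows <- function(utr) {
  coord_cols <- c("5_utr_start", "5_utr_end", "3_utr_start", "3_utr_end")
  has_coord <- rowSums(!is.na(utr[, coord_cols, drop = FALSE])) > 0
  utr[has_coord, , drop = FALSE]
}

# Usage (not run here):
# ensembl <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# utr <- get_utrs("ENST00000261880", ensembl)
# drop_empty_utr_rows(utr)
```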

Indeed, there is a working example in this answer: "R org.Hs.eg.db matching ensembl gene ids with gene symbol"

Just replace filter="ensembl_gene_id" with filter="ensembl_transcript_id"