Question: biomaRt: getBM & getSequence
Asked 3.9 years ago by bsmith030465 (United States):

Hi,

I am trying to extract the exon sequences for Ensembl transcript IDs (using GRCh37), but I get somewhat perplexing results (from getBM?):

======

library(biomaRt)

# Connect to the GRCh37 archive BioMart
myensembl <- useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")

# Straight quotes are required; curly quotes are a syntax error in R
eid <- "ENST00000538028"

# Coordinates for the transcript (the argument name is `values`)
details <- getBM(attributes = c("chromosome_name","strand","5_utr_start","5_utr_end","genomic_coding_start","genomic_coding_end",
                                "cdna_coding_start",
                                "cdna_coding_end","cds_start","cds_end","3_utr_start","3_utr_end"),
                 filters = "ensembl_transcript_id", values = eid, mart = myensembl)

print(details)

# Exon sequences for the same transcript
seq <- getSequence(id=eid, type="ensembl_transcript_id", seqType="gene_exon", mart = myensembl)
show(seq)

==============

Am I doing something wrong in getBM and/or getSequence?

My session info is:

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] biomaRt_2.24.0       Biostrings_2.36.1    XVector_0.8.0        IRanges_2.2.1        S4Vectors_0.6.0      BiocGenerics_0.14.0  hash_2.2.6           stringr_1.0.0        foreign_0.8-63      
[10] BiocInstaller_1.18.2

loaded via a namespace (and not attached):
 [1] XML_3.98-1.1         bitops_1.0-6         GenomeInfoDb_1.4.0   DBI_0.3.1            magrittr_1.5         RSQLite_1.0.0        stringi_0.4-1        zlibbioc_1.14.0      tools_3.2.0         
[10] Biobase_2.28.0       RCurl_1.95-4.6       AnnotationDbi_1.30.1

Answer by Devon Ryan (Freiburg, Germany), 3.9 years ago:

There's nothing obviously wrong with anything you're doing. If you're curious why you're getting 7 sequences rather than 1, it's because gene_exon means "sequence of each exon within a gene". Perhaps you want cdna instead.
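
To illustrate the suggestion above, here is a minimal sketch of requesting the spliced transcript sequence with seqType="cdna" instead of "gene_exon". The helper name `fetch_cdna` is hypothetical and not part of biomaRt; `mart` is assumed to be the object returned by useMart() in the question.

```r
# Hypothetical helper: fetch the cDNA sequence for one or more transcript IDs.
# Nothing is downloaded until the function is called.
fetch_cdna <- function(transcript_id, mart) {
  biomaRt::getSequence(id = transcript_id,
                       type = "ensembl_transcript_id",
                       seqType = "cdna",   # whole spliced transcript, not individual exons
                       mart = mart)
}

# Usage (requires the biomaRt package and network access to the BioMart server):
# cdna <- fetch_cdna("ENST00000538028", myensembl)
# show(cdna)
```

Wrapping the call in a function keeps the query parameters in one place, so switching seqType back to "gene_exon" for comparison is a one-word change.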


Actually, I was thinking that I would get at least 7 rows from the getBM function.

(reply written 3.9 years ago by bsmith030465)

Then you want the exon_chrom_start and exon_chrom_end attributes.

(reply written 3.9 years ago by Devon Ryan)
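
A minimal sketch of the per-exon query suggested in this reply, which should return one row per exon rather than one row per transcript. The helper name `fetch_exons` and the extra attributes "ensembl_exon_id" and "rank" are additions for illustration, assumed to be available in the dataset used in the question; only exon_chrom_start and exon_chrom_end come from the reply itself.

```r
# Attributes for a per-exon table. "ensembl_exon_id" and "rank" are assumed
# to exist in this dataset; check listAttributes(mart) if the query fails.
exon_attrs <- c("ensembl_transcript_id", "ensembl_exon_id", "rank",
                "chromosome_name", "strand",
                "exon_chrom_start", "exon_chrom_end")

# Hypothetical helper: one row per exon, ordered by exon rank in the transcript.
fetch_exons <- function(transcript_id, mart) {
  exons <- biomaRt::getBM(attributes = exon_attrs,
                          filters = "ensembl_transcript_id",
                          values = transcript_id,
                          mart = mart)
  exons[order(exons$rank), ]
}

# Usage (requires the biomaRt package and network access to the BioMart server):
# exons <- fetch_exons("ENST00000538028", myensembl)
# print(exons)
```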

Got it. Thanks!

(reply written 3.9 years ago by bsmith030465)