Question: biomaRt: getBM & getSequence
bsmith030465 wrote:

Hi,

I am trying to extract the exon sequences for Ensembl transcript IDs (using GRCh37), but I get somewhat perplexing results, possibly from getBM:

 

======

library(biomaRt)

# Connect to the GRCh37 archive BioMart
myensembl = useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", path="/biomart/martservice", dataset="hsapiens_gene_ensembl")

eid <- "ENST00000538028"

details <- getBM(attributes = c("chromosome_name", "strand", "5_utr_start", "5_utr_end",
                                "genomic_coding_start", "genomic_coding_end",
                                "cdna_coding_start", "cdna_coding_end",
                                "cds_start", "cds_end", "3_utr_start", "3_utr_end"),
                 filters = "ensembl_transcript_id", values = eid, mart = myensembl)

        
print(details)

seq = getSequence(id=eid, type="ensembl_transcript_id", seqType="gene_exon", mart = myensembl)
show(seq)

==============

Am I doing something wrong in getBM and/or getSequence?

 

My session info is:

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] biomaRt_2.24.0       Biostrings_2.36.1    XVector_0.8.0        IRanges_2.2.1        S4Vectors_0.6.0      BiocGenerics_0.14.0  hash_2.2.6           stringr_1.0.0        foreign_0.8-63      
[10] BiocInstaller_1.18.2

loaded via a namespace (and not attached):
 [1] XML_3.98-1.1         bitops_1.0-6         GenomeInfoDb_1.4.0   DBI_0.3.1            magrittr_1.5         RSQLite_1.0.0        stringi_0.4-1        zlibbioc_1.14.0      tools_3.2.0         
[10] Biobase_2.28.0       RCurl_1.95-4.6       AnnotationDbi_1.30.1

Devon Ryan wrote:

There's nothing obviously wrong with anything you're doing. If you're curious why you're getting 7 sequences rather than 1, it's because gene_exon means "sequence of each exon within a gene". Perhaps you want cdna instead.
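To make the difference concrete, here is a minimal sketch that requests the spliced transcript sequence with seqType="cdna" instead of one sequence per exon. The helper name fetch_cdna is hypothetical (it is not part of biomaRt), and the mart is assumed to be built exactly as in the question. The call needs network access to the GRCh37 archive, and newer biomaRt releases may require the host to be given with an "https://" prefix.

```r
# Hypothetical helper (not part of biomaRt): fetch the spliced cDNA
# sequence for one or more Ensembl transcript IDs.
fetch_cdna <- function(transcript_ids, mart) {
  biomaRt::getSequence(
    id      = transcript_ids,
    type    = "ensembl_transcript_id",
    seqType = "cdna",  # spliced transcript sequence rather than per-exon sequences
    mart    = mart
  )
}

# Runs only in an interactive session, because it queries Ensembl over the network.
if (interactive()) {
  myensembl <- biomaRt::useMart(
    biomart = "ENSEMBL_MART_ENSEMBL",
    host    = "grch37.ensembl.org",
    path    = "/biomart/martservice",
    dataset = "hsapiens_gene_ensembl"
  )
  cdna <- fetch_cdna("ENST00000538028", myensembl)
  print(nrow(cdna))        # number of sequences returned
  print(nchar(cdna$cdna))  # length of each returned sequence
}
```

Swapping "cdna" back to "gene_exon" in the same helper should reproduce the per-exon behaviour described above.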


Actually, I was thinking that I would get at least 7 rows from the getBM function.

Reply written 4.2 years ago by bsmith030465

Then you want the exon_chrom_start and exon_chrom_end attributes.
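A rough sketch of that query, assuming the same GRCh37 mart as in the question. The helper name get_exon_coords and the extra ensembl_exon_id column are my own additions for illustration, and the sort by genomic start is just a convenience, not something biomaRt does for you.

```r
# Attributes for per-exon genomic coordinates; the two ID columns are
# included only to make each row identifiable.
exon_attributes <- c("ensembl_transcript_id", "ensembl_exon_id",
                     "exon_chrom_start", "exon_chrom_end")

# Hypothetical helper (not part of biomaRt): one row per exon of each
# requested transcript, ordered by transcript and genomic start.
get_exon_coords <- function(transcript_ids, mart) {
  exons <- biomaRt::getBM(
    attributes = exon_attributes,
    filters    = "ensembl_transcript_id",
    values     = transcript_ids,
    mart       = mart
  )
  exons[order(exons$ensembl_transcript_id, exons$exon_chrom_start), ]
}

# Runs only in an interactive session, because it queries Ensembl over the network.
if (interactive()) {
  myensembl <- biomaRt::useMart(
    biomart = "ENSEMBL_MART_ENSEMBL",
    host    = "grch37.ensembl.org",
    path    = "/biomart/martservice",
    dataset = "hsapiens_gene_ensembl"
  )
  print(get_exon_coords("ENST00000538028", myensembl))
}
```

Comparing the number of rows returned here with the number of gene_exon sequences from getSequence is a quick sanity check.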

Reply written 4.2 years ago by Devon Ryan

Got it. Thanks!

Reply written 4.2 years ago by bsmith030465