Question: Is there any difference between using the biomaRt package in R to retrieve gene information from the Ensembl website and downloading this information through the BioMart tab on the website?
M K (United States) wrote, 4.7 years ago:

Hi everyone,

I am trying to retrieve gene information from the Ensembl website to compare the mouse (mm10) gene information with repetitive DNA in specific genome regions (UTRs, introns, and upstream regions). I obtained these files in two ways: first with the R code below, and second by going directly to the Ensembl website and using the BioMart tab.

I have two issues. The first is that the total number of observations (rows) differs between the two files.

The second issue: when I look for the genes that share positions with these specific repetitive DNA regions, I get an empty result file, and I don't know what causes that. BTW, I downloaded the repetitive DNA files from the UCSC website, using Ensembl Genes in the track tab.

R code to retrieve the gene info:

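# Install biomaRt (biocLite() has since been replaced by BiocManager)
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("biomaRt")
library(biomaRt)

### Retrieving mouse (mm10/GRCm38) from Ensembl website ###
mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")

# transcript_start/transcript_end give one row per transcript, not per gene
mm10_Gene <- getBM(attributes = c("ensembl_gene_id", "chromosome_name", "strand",
                                  "transcript_start", "transcript_end", "mgi_symbol"),
                   mart = mouse)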
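
One possible explanation for the row-count difference, not confirmed anywhere in this thread: the query requests transcript_start and transcript_end, so getBM() returns one row per transcript, while a website export built from gene-level attributes would return one row per gene. A minimal sketch with a made-up table (the IDs and coordinates are invented for illustration, not real Ensembl records) shows the effect:

```r
# Toy stand-in for a getBM() result; IDs and coordinates are invented.
bm <- data.frame(
  ensembl_gene_id  = c("geneA", "geneA", "geneB"),
  chromosome_name  = c("1", "1", "2"),
  transcript_start = c(100, 150, 500),
  transcript_end   = c(900, 800, 950),
  stringsAsFactors = FALSE
)

# Transcript-level attributes: one row per transcript.
n_transcript_rows <- nrow(bm)

# Dropping the transcript columns and de-duplicating: one row per gene.
genes <- unique(bm[, c("ensembl_gene_id", "chromosome_name")])
n_gene_rows <- nrow(genes)
```

Comparing the two files is only meaningful if both exports use the same attribute list and the same filters.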

Ying W (South San Francisco, CA) wrote, 4.7 years ago:

As long as you are on the same release, the results should be the same. I am not sure how to tell which release the Bioconductor package is using, but it might be a couple of releases behind the website.

Could you give an example of a gene in repetitive DNA that you can find on the website but not through biomaRt?
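
For checking which release biomaRt is talking to, a hedged sketch follows. It assumes a reasonably recent biomaRt in which listMarts(), listEnsemblArchives() and the `version` argument of useEnsembl() are available. The two wrapper names are hypothetical, and nothing contacts the Ensembl servers until a wrapper is called.

```r
# Sketch only: the wrapper names are hypothetical; the biomaRt calls inside
# them need the biomaRt package and a network connection when invoked.
check_ensembl_release <- function() {
  # The "version" column of the returned data frame names the live release.
  biomaRt::listMarts()
}

open_pinned_mart <- function(release,
                             dataset = "mmusculus_gene_ensembl") {
  # With `version` set, useEnsembl() connects to the archive site for that
  # release, provided Ensembl still serves it
  # (biomaRt::listEnsemblArchives() lists the available ones).
  biomaRt::useEnsembl(biomart = "ensembl", dataset = dataset,
                      version = release)
}
```

A call such as `mouse_r80 <- open_pinned_mart(80)` would then pin the release; whether a given release is still archived is an assumption to check first.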


Hi Ying,

I used the mouse (mm10) release, which is the latest release. Then I used the Table Browser in UCSC to download the repetitive DNA: in the track tab I selected Ensembl Genes, and from the get output tab I got, for example, the "Introns plus" regions. Since UCSC doesn't provide gene info for Ensembl genes, especially MGI symbols, I retrieve the gene info from the Ensembl website directly or with the R code above.

modified 3 months ago by RamRS • written 4.7 years ago by M K

Not the mouse reference, but the annotation release. If you look on the Ensembl website, it is currently on release 80. UCSC is probably using a different release as well; annotations are updated more often than the reference is.

written 4.7 years ago by Ying W
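
Besides release differences, an empty overlap between UCSC and Ensembl tables can come from coordinate conventions. This is a possible cause, not something the thread confirms. UCSC names chromosomes "chr1" and "chrM" and uses 0-based starts in BED output, while Ensembl BioMart reports "1" and "MT" with 1-based starts. A minimal base-R sketch with invented toy tables:

```r
# Toy gene table in Ensembl style (chromosome "1", 1-based coordinates).
genes <- data.frame(gene  = c("geneA", "geneB"),
                    chrom = c("1", "MT"),
                    start = c(100, 50),
                    end   = c(900, 400),
                    stringsAsFactors = FALSE)

# Toy repeat table in UCSC BED style ("chr1", 0-based start).
repeats <- data.frame(chrom = c("chr1", "chrM", "chr2"),
                      start = c(850, 500, 5),
                      end   = c(1000, 600, 20),
                      stringsAsFactors = FALSE)

# Joining on the raw chromosome names matches nothing.
raw_hits <- merge(genes, repeats, by = "chrom")

# Convert UCSC names to Ensembl style and BED starts to 1-based.
repeats$chrom <- sub("^chr", "", repeats$chrom)
repeats$chrom[repeats$chrom == "M"] <- "MT"
repeats$start <- repeats$start + 1

# Pair rows on the same chromosome, then keep overlapping intervals.
pairs <- merge(genes, repeats, by = "chrom", suffixes = c(".gene", ".rep"))
hits <- pairs[pairs$start.gene <= pairs$end.rep &
              pairs$end.gene >= pairs$start.rep, ]
```

For genome-scale tables, the Bioconductor GenomicRanges package (findOverlaps()) is the usual tool; the pairwise merge here is only meant to make the naming issue visible.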

So is there any way to download the repetitive DNA from the Ensembl website directly, like the one on UCSC? For example, I want to download the intronic, CDS, 10 kb upstream, and 10 kb downstream regions for mouse (mm10) and human (hg19). I think that by doing so the annotation data and the repetitive DNA will be consistent for this analysis, since they come from the same source, which is Ensembl.

modified 4.7 years ago • written 4.7 years ago by M K

Have a look here: How To Get All Ensembl Repeatfeatures From Biomart Or The Ensembl Rest Api?

written 4.7 years ago by Ying W
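
As a rough illustration of the REST route mentioned in that link, the sketch below builds a request URL for the Ensembl REST "overlap/region" endpoint with feature=repeat. The helper name and its arguments are hypothetical, and the region is an arbitrary example. Note the assumption that the main REST server reports coordinates on the current assembly, which may not be mm10.

```r
# Build an Ensembl REST "overlap/region" URL asking for repeat features.
# The helper and argument names are hypothetical; the region is arbitrary.
repeat_overlap_url <- function(species, chrom, start, end,
                               server = "https://rest.ensembl.org") {
  region <- sprintf("%s:%d-%d", chrom, as.integer(start), as.integer(end))
  sprintf("%s/overlap/region/%s/%s?feature=repeat;content-type=application/json",
          server, species, region)
}

url <- repeat_overlap_url("mus_musculus", "1", 1000000, 1100000)
```

With the jsonlite package installed and network access, `jsonlite::fromJSON(url)` should parse the response into a data frame. The service limits how large a region one request may cover, so a whole chromosome would need to be fetched in chunks.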