Difference Between Biomart Query And The Ensembl Database
11.3 years ago

Hi,

I'm using the biomaRt R package (Bioconductor) to retrieve the 3' UTR sequences for a list of genes. I have the Entrez Gene ID for each of them, and I see differences between the biomaRt result and the Ensembl BioMart web database.

Here's an example:

For the ENSBTAT00000014489 transcript (I'm working with Bos taurus sequences)

In R:

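library("biomaRt")
# Connect to the Ensembl mart and select the cow (Bos taurus) gene dataset
ensembl <- useMart("ensembl")
ensembl <- useDataset("btaurus_gene_ensembl", mart = ensembl)
# Request the 3' UTR sequence for Entrez Gene ID 522265
getSequence(seqType = "3utr", mart = ensembl, type = "entrezgene", id = 522265)
                                     3utr entrezgene
1 No UTR is annotated for this transcript     522265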


In the Ensembl BioMart web interface: under "Export data", with only 3' UTR checked:

Result:

So where is the problem? Why does the biomaRt package not retrieve this sequence?

Thanks a lot,

N.

11.3 years ago
Neilfws 49k

The link to Ensembl in your question does not display a 3'-UTR. At first glance, it seems to display the full coding sequence for the transcript - note that it begins with ATG and ends with TGA.
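The ATG/TGA observation above can be turned into a quick sanity check. This is only a rough sketch: `looks_like_cds` is a hypothetical helper (not part of biomaRt), and it assumes a complete CDS whose length is a multiple of three, which will not hold for partial annotations.

```r
# Heuristic check: does a nucleotide sequence look like a complete CDS?
# (hypothetical helper for illustration, not a biomaRt function)
looks_like_cds <- function(seq) {
  seq <- toupper(gsub("\\s", "", seq))       # strip whitespace, normalise case
  n <- nchar(seq)
  if (n < 6 || n %% 3 != 0) return(FALSE)    # assume a whole number of codons
  start_codon <- substr(seq, 1, 3)
  stop_codon  <- substr(seq, n - 2, n)
  start_codon == "ATG" && stop_codon %in% c("TAA", "TAG", "TGA")
}

looks_like_cds("ATGGCCTGA")      # toy sequence: starts with ATG, ends with TGA
looks_like_cds("GCCTTTAAACCC")   # toy sequence: no start codon
```

If a sequence you believe to be a 3' UTR passes this check, it is worth confirming which sequence type was actually exported.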

When I use web BioMart, I get the exact same result as when using R biomaRt (see screenshot below):

BioMart via the web should always give the same result as via R, since they connect to the same data source. If there are discrepancies, it's generally because the data you have is not what you thought it was.


OK, thanks! So biomaRt works great :)


FYI, this is the best browser page to check whether a transcript contains a UTR or not:

http://www.ensembl.org/Bos_taurus/Transcript/Exons?db=core;g=ENSBTAG00000010909;r=16:73764976-73768065;t=ENSBTAT00000014489

On this page the CDS is in black, UTRs in purple, introns in blue and flanking sequence in green. So, indeed, this transcript has no UTRs annotated.
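When fetching 3' UTRs for a whole list of genes, transcripts like this one come back with the placeholder text seen in the question rather than a sequence, so it can help to separate them out first. A minimal sketch, assuming the result is a data frame with a "3utr" column as in the output above; `has_annotated_utr` is a hypothetical helper, and the mock data frame (including its second row) is invented so the example runs without a network connection.

```r
# Placeholder text returned when no 3' UTR is annotated (copied from the question)
no_utr_message <- "No UTR is annotated for this transcript"

# Hypothetical helper: TRUE for rows that carry a real sequence
has_annotated_utr <- function(result, seq_col = "3utr") {
  result[[seq_col]] != no_utr_message
}

# Mock result shaped like the getSequence() output in the question.
# The second row (sequence and ID) is made up purely for illustration.
mock <- data.frame(
  "3utr" = c(no_utr_message, "ACGTACGTAA"),
  entrezgene = c(522265, 1),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

mock[has_annotated_utr(mock), ]    # rows with a sequence
mock[!has_annotated_utr(mock), ]   # rows to inspect in the Ensembl browser
```

Other placeholder strings may exist for other failure cases, so treat the single message matched here as an assumption.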

11.3 years ago
Andeyatz ▴ 70

Hi,

I think there may be a difference between explicitly annotated UTR regions and a region which is 3' of a transcript. If you look at this page http://www.ensembl.org/Bos_taurus/Transcript/Sequence_cDNA?_format=HTML;db=core;flank3_display=0;flank5_display=0;g=ENSBTAG00000010909;genomic=unmasked;output=fasta;param=utr3;r=16:73764976-73768065;strand=feature;t=ENSBTAT00000014489 then you can see there is no annotated UTR.

The following query on the cow 64 database will show the same result:

select t.seq_region_start, t.seq_region_end, e.seq_region_start as exon_start, e.seq_region_end as exon_end, et.rank
from transcript_stable_id
join transcript t using (transcript_id)
join exon_transcript et using (transcript_id)
join exon e using (exon_id)
where stable_id = 'ENSBTAT00000014489';


Hope this helps