Retrieve 5'UTR sequences for ensembl_transcript_id's with unique start/end positions
3.4 years ago
omit3333 • 0

Hello everyone,

I am working with biomaRt to access Ensembl annotation (see more info here: http://127.0.0.1:29459/library/biomaRt/doc/accessing_ensembl.html) and I am trying to retrieve 5'UTR sequences from "ensembl_transcript_id" (with filter) together with the "5_utr_start" and "5_utr_end" positions.

Example code (RStudio):

# 'ensembl' is a Mart object created beforehand (e.g. with useEnsembl())
query <- getBM(attributes = c("ensembl_gene_id", "ensembl_transcript_id", "5_utr_start", "5_utr_end", "5utr"),
               filters = c("transcript_biotype", "chromosome_name"),
               values = list("protein_coding", 1),
               mart = ensembl)

For some "ensembl_transcript_id" entries, this query returns multiple "5_utr_start" and "5_utr_end" positions (separated by semicolons), but only a single 5'UTR sequence ("5utr") per "ensembl_transcript_id". This means I don't know which "5_utr_start" and "5_utr_end" positions correspond to the displayed 5'UTR sequence ("5utr"). This is a problem because I need the exact start and end positions of the displayed UTR sequence for subsequent analysis.

Thank you for your help!

biomart ensembl R sequence • 1.6k views
3.4 years ago
Ben_Ensembl ★ 2.4k

Hi omit3333,

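This is caused by 5' UTRs that span several exons: each start/end pair is one exonic segment of the UTR, and the "5utr" sequence is the segments joined together. For example: https://www.ensembl.org/Homo_sapiens/Transcript/Exons?db=core;g=ENSG00000000938;r=1:27612064-27635185;t=ENST00000374005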

You'll want to take the most upstream and most downstream coordinates to get the overall start/end of the UTR.
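
As a rough illustration of that step, here is a minimal base-R sketch. It assumes the start and end values arrive as semicolon-separated strings, as described in the question. The coordinates below are made up for the example, and `parse_coords` is a hypothetical helper, not part of biomaRt.

```r
# Made-up example values in the semicolon-separated form described above.
utr_start <- "1000;1500"
utr_end   <- "1099;1529"

# Hypothetical helper: split a semicolon-separated string into integers.
parse_coords <- function(x) as.integer(strsplit(x, ";", fixed = TRUE)[[1]])

starts <- parse_coords(utr_start)
ends   <- parse_coords(utr_end)

# Overall genomic span of the 5' UTR (this span includes any introns).
overall_start <- min(starts)
overall_end   <- max(ends)

# Length of the spliced UTR, i.e. the sum of the individual segments.
spliced_length <- sum(ends - starts + 1L)
```

Note that the overall span can be longer than the returned sequence, because the introns between the segments are not part of the "5utr" sequence.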


Hi Ben,

Thank you for your reply.

Do you know of any workaround to "mutate" a specific location within this UTR sequence?

Let's say I would like to mutate the nucleotide at genomic coordinates X to G within the 5'UTR sequence of a given transcript.

Now I have the problem that I don't know the exact coordinates between the start and end of the UTR (due to splicing).

Cheers, omit


Hi omit,

I'm afraid I can't think of any obvious options. You may have to write a custom script that uses the genomic coordinates of the exons spanned by the 5' UTR to calculate the genomic coordinate of the position X bp from the start codon.

Cheers

Ben
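
One possible shape for such a custom script is sketched below in base R. This is only an illustration under several assumptions: the segment starts and ends have already been parsed into numeric vectors, `strand` is 1 or -1, and the UTR sequence is given in transcript orientation (so on the reverse strand the new base must be given on the transcript strand too). `genomic_to_utr_index` and `mutate_utr` are hypothetical names, not biomaRt functions.

```r
# Map a genomic coordinate to its 1-based index in the spliced 5' UTR.
# starts/ends: genomic start/end of each UTR segment (start <= end).
# strand: 1 for forward, -1 for reverse (assumed convention).
genomic_to_utr_index <- function(pos, starts, ends, strand = 1) {
  # Order the segments 5' -> 3' along the transcript.
  ord <- order(starts, decreasing = (strand == -1))
  starts <- starts[ord]
  ends <- ends[ord]
  offset <- 0  # bases in the segments already passed
  for (i in seq_along(starts)) {
    if (pos >= starts[i] && pos <= ends[i]) {
      within <- if (strand == 1) pos - starts[i] + 1 else ends[i] - pos + 1
      return(offset + within)
    }
    offset <- offset + (ends[i] - starts[i] + 1)
  }
  NA_integer_  # position is in an intron or outside the UTR
}

# Replace the base at a genomic position within the spliced UTR sequence.
mutate_utr <- function(utr_seq, pos, new_base, starts, ends, strand = 1) {
  idx <- genomic_to_utr_index(pos, starts, ends, strand)
  if (is.na(idx)) stop("position is not inside a 5' UTR segment")
  substr(utr_seq, idx, idx) <- new_base
  utr_seq
}

# Toy example (made-up coordinates): two segments, 100-104 and 200-203.
toy_starts <- c(100, 200)
toy_ends   <- c(104, 203)
toy_utr    <- "AAAAACCCC"
mutated    <- mutate_utr(toy_utr, 201, "G", toy_starts, toy_ends, strand = 1)
```

In the toy example, genomic position 201 is the second base of the second segment, so it maps to index 7 of the 9-base spliced sequence.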


Hi Ben,

I translated this problem into a "real world" problem by creating a so-called "bridge game".

See here: https://stackoverflow.com/questions/65003498/r-programming-row-wise-data-frame-calculation-with-custom-script-for-every-i

Let's see if the community can solve it.

Cheers omit
