genomic features 3'utr and 5'utr
1
0
Entering edit mode
9 months ago
ginlucks ▴ 10

Hello, I am having some troubles using genomicFeatures R's package. Basically I am trying to retrieve for a specific gene the longest isoform and, for that isoform, exon, 3'utr and 5'utr coordinates, anyway results are strange: it seems to return 3'utr on the first exon and 5'utr on the last exon when the opposite was expected.

Here the code to replicate the results:

library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb = TxDb.Hsapiens.UCSC.hg19.knownGene

# specify the gene ID
gid = '387700'

# select the longest transcript
cur_tx = data.frame(transcriptsBy(txdb, by="gene")[[as.character(gid)]])
max_tx_id = cur_tx$tx_name[which.max(cur_tx$width)]

# obtain exons for longest gene isoform
cur_exon = data.frame(exonsBy(txdb, by="tx", use.names=TRUE)[max_tx_id])

# obtain the longest 5'UTR for the selected transcript
fiveUTR = data.frame(fiveUTRsByTranscript(txdb,use.names=TRUE)[max_tx_id])
fiveUTR = fiveUTR[which.max(fiveUTR$end),]

# obtain 3'UTR for for the selected transcript
threeUTR = data.frame(threeUTRsByTranscript(txdb,use.names=TRUE)[max_tx_id])
threeUTR = threeUTR[which.max(threeUTR$end),]

Here the results:

These are the exon coordinates:

  group group_name seqnames    start      end width strand exon_id exon_name  exon_rank
1     1 uc001kgm.3    chr10 91295199 91295313   115      -  142243      <NA>        1
2     1 uc001kgm.3    chr10 91294258 91294397   140      -  142242      <NA>        2
3     1 uc001kgm.3    chr10 91222136 91222381   246      -  142241      <NA>        3
4     1 uc001kgm.3    chr10 91203513 91203616   104      -  142240      <NA>        4
5     1 uc001kgm.3    chr10 91200865 91201008   144      -  142239      <NA>        5
6     1 uc001kgm.3    chr10 91198361 91198940   580      -  142238      <NA>        6
7     1 uc001kgm.3    chr10 91195817 91196076   260      -  142236      <NA>        7
8     1 uc001kgm.3    chr10 91190051 91193083  3033      -  142235      <NA>        8

where the 8th is the first exon ( which.min(cur_exon$start) ), and the 1st is the last exon.

Anyway the 3'UTR which is expected to be found at the end of the gene is at the very beginning and vice versa for the 5'UTR (you can check easily by exon_id)

3'UTR coordinates:

  group group_name seqnames    start      end width strand exon_id exon_name  exon_rank
1     1 uc001kgm.3    chr10 91190051 91192820  2770      -  142235      <NA>        8

5'UTR coordinates:

  group group_name seqnames    start      end width strand exon_id exon_name  exon_rank
1     1 uc001kgm.3    chr10 91295199 91295313   115      -  142243      <NA>        1

Anyone know why this happen ? Am I missing something

genome sequencing • 334 views
ADD COMMENT
4
Entering edit mode
9 months ago
husensofteng ▴ 310

could it be because the transcript is on the negative strand!

ADD COMMENT
0
Entering edit mode

so fast so helpful. Thx !

ADD REPLY
0
Entering edit mode

If this solved the problem please accept the answer (the green checkmark below the bookmark sign).

ADD REPLY

Login before adding your answer.

Traffic: 2221 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6