Question: UTR sequence extraction
ezraamustafa30 wrote:

How can I extract the 5' and 3' UTR sequences for a list of gene IDs using the biomaRt R package?

sequence R • 140 views
ADD COMMENTlink modified 9 days ago by h.mon26k • written 9 days ago by ezraamustafa30
h.mon26k wrote:

The biomaRt user guide probably covers your question. For example, it shows how to retrieve the 5' UTR sequences of all genes located on chromosome 3 between positions 185,514,033 and 185,535,839.

ADD COMMENTlink written 9 days ago by h.mon26k

Thanks a lot. I tried that one, but it is limited to a specific chromosome and position. What I am interested in is getting the UTR sequences for a list of human genes that aren't located on the same chromosome, so that I can calculate the GC content of each UTR separately. If you can help me figure out how to do this in R, I'd appreciate it.

ADD REPLYlink written 9 days ago by ezraamustafa30

(following h.mon) Try:

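library("biomaRt")

# Connect to the Ensembl BioMart and select the human gene dataset
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

# Versioned Ensembl transcript ID to look up
ID <- "ENST00000429510.1"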
getSequence(
  id = ID,
  seqType = "5utr",
  type = "ensembl_transcript_id_version",
  mart = ensembl
)
getSequence(
  id = ID,
  seqType = "3utr",
  type = "ensembl_transcript_id_version",
  mart = ensembl
)

Which returns:

> getSequence(
+   id = ID,
+   seqType = "5utr",
+   type = "ensembl_transcript_id_version",
+   mart = ensembl
+ )
                                     5utr ensembl_transcript_id_version
1 ATTCTTGTGAATGTGACACACGATCTCTCCAGTTTCCAT             ENST00000429510.1
> getSequence(
+   id = ID,
+   seqType = "3utr",
+   type = "ensembl_transcript_id_version",
+   mart = ensembl
+ )
                                                                                                                                                                                                                                                                                                                                                                                   3utr
1 GACGCAGAAGAAACATGTCCTTCATTCACCAGGCTGAGCTTTCACAGTGCAGTGGTTGGTACGGGACTAAATGTGAGGCTGATGCTCTACACAAGGAAAAACCTGACCTGCGCACAAACCATCAACTCCTCAGCTTTTGGGAACTTGAATGTGACCAAGAAAACCACCTTCATTGTCCATGGATTCAGGCCAACAGGCTCCCCTCCTGTTTGGATGGATGACTTAGTAAAGGGTTTGCTCTCTGTTGAAGACATGAACGTAGTTGTTGTTGATTGGAATCGAGGAGCTACAACTTTAATATATACCCATGCCTCTAGTAAGACCAGAAAAGTAGCCATGGTCTTGAAGGAATTTATTGACCAGATGTTGGCAG
  ensembl_transcript_id_version
1             ENST00000429510.1

This is a combination of examples 5.7 and 5.8 from the user guide, with "ensembl_transcript_id_version" as the type. :-)

ADD REPLYlink modified 9 days ago • written 9 days ago by SMK1.8k
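
To cover the original request (a list of genes on different chromosomes, with GC content per UTR), the same getSequence() call can probably be given a vector of IDs. The following is only a sketch under some assumptions: the helper names gc_content() and utr_gc() are hypothetical and not part of biomaRt, the IDs are assumed to be Ensembl gene IDs (type = "ensembl_gene_id"), and transcripts without a UTR are assumed to come back with the placeholder text "Sequence unavailable", which is treated as missing here.

```r
# GC fraction of each sequence in a character vector.
# Empty strings and the "Sequence unavailable" placeholder give NA.
gc_content <- function(seqs) {
  seqs <- toupper(seqs)
  seqs[seqs == "SEQUENCE UNAVAILABLE"] <- NA_character_
  gc  <- nchar(gsub("[^GC]", "", seqs))    # number of G and C bases
  len <- nchar(gsub("[^ACGT]", "", seqs))  # number of unambiguous bases
  ifelse(is.na(seqs) | len == 0, NA_real_, gc / len)
}

# Hypothetical wrapper: fetch 5' and 3' UTRs for a vector of IDs and
# return one row per retrieved sequence with its GC fraction.
# Needs the biomaRt package and network access to Ensembl.
utr_gc <- function(ids, id_type = "ensembl_gene_id",
                   dataset = "hsapiens_gene_ensembl") {
  mart <- biomaRt::useMart("ensembl", dataset = dataset)
  out <- lapply(c("5utr", "3utr"), function(st) {
    res <- biomaRt::getSequence(id = ids, type = id_type,
                                seqType = st, mart = mart)
    data.frame(id  = res[[id_type]],
               utr = rep(st, nrow(res)),
               gc  = gc_content(res[[st]]),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, out)
}
```

For example, utr_gc(c("ENSG00000139618", "ENSG00000141510")) would query two genes on different chromosomes. A gene can have several transcripts, so expect more than one row per gene; passing transcript IDs with id_type = "ensembl_transcript_id" may make the rows easier to tell apart.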