Question: UTR sequence extraction
12 months ago, ezraamustafa30 wrote:

How can I extract the 5' and 3' UTR sequences for a list of gene IDs using the biomaRt R package?

sequence R • 420 views
12 months ago, h.mon (Brazil) wrote:

The biomaRt user guide probably covers your question. For example, it shows how to retrieve the 5' UTR sequences of all genes located on chromosome 3 between positions 185,514,033 and 185,535,839.


Thanks a lot. I tried that example, but it is limited to a specific chromosome and position range. What I need is the UTR sequences for a list of human genes that are not all on the same chromosome, so that I can compute the GC content of each UTR separately. I'd appreciate any help figuring out how to do this in R.

written 12 months ago by ezraamustafa30

(following h.mon) Try:

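library("biomaRt")

# Connect to the human gene dataset of the Ensembl BioMart
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

# A versioned Ensembl transcript ID. `id` also accepts a character vector,
# so transcripts on different chromosomes can be queried in a single call.
ID <- "ENST00000429510.1"

# 5' UTR sequence of the transcript
getSequence(
  id = ID,
  seqType = "5utr",
  type = "ensembl_transcript_id_version",
  mart = ensembl
)

# 3' UTR sequence of the transcript
getSequence(
  id = ID,
  seqType = "3utr",
  type = "ensembl_transcript_id_version",
  mart = ensembl
)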

Which returns:

> getSequence(
+   id = ID,
+   seqType = "5utr",
+   type = "ensembl_transcript_id_version",
+   mart = ensembl
+ )
                                     5utr ensembl_transcript_id_version
1 ATTCTTGTGAATGTGACACACGATCTCTCCAGTTTCCAT             ENST00000429510.1
> getSequence(
+   id = ID,
+   seqType = "3utr",
+   type = "ensembl_transcript_id_version",
+   mart = ensembl
+ )
                                                                                                                                                                                                                                                                                                                                                                                   3utr
1 GACGCAGAAGAAACATGTCCTTCATTCACCAGGCTGAGCTTTCACAGTGCAGTGGTTGGTACGGGACTAAATGTGAGGCTGATGCTCTACACAAGGAAAAACCTGACCTGCGCACAAACCATCAACTCCTCAGCTTTTGGGAACTTGAATGTGACCAAGAAAACCACCTTCATTGTCCATGGATTCAGGCCAACAGGCTCCCCTCCTGTTTGGATGGATGACTTAGTAAAGGGTTTGCTCTCTGTTGAAGACATGAACGTAGTTGTTGTTGATTGGAATCGAGGAGCTACAACTTTAATATATACCCATGCCTCTAGTAAGACCAGAAAAGTAGCCATGGTCTTGAAGGAATTTATTGACCAGATGTTGGCAG
  ensembl_transcript_id_version
1             ENST00000429510.1

This combines the user-guide examples in sections 5.7 and 5.8, using "ensembl_transcript_id_version" as the type. :-)
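To go from these sequences to the per-UTR GC content mentioned in the question, a small base-R helper is enough. This is a minimal sketch, not part of biomaRt: `gc_content()` is a hypothetical helper, the example data frame only mimics the shape of the `getSequence()` result shown above (a sequence column named `3utr` plus an ID column), and the transcript IDs in it are made up. It also assumes that transcripts without a UTR come back as the text "Sequence unavailable", which is scored as NA rather than as a sequence.

```r
# Hypothetical helper (not a biomaRt function): GC fraction of each sequence.
gc_content <- function(seqs) {
  seqs <- toupper(as.character(seqs))
  # Assumption: missing UTRs are reported as "Sequence unavailable";
  # treat those, empty strings and NA as missing.
  missing <- is.na(seqs) | seqs %in% c("", "SEQUENCE UNAVAILABLE")
  # Count G/C bases, and use only unambiguous A/C/G/T as the denominator
  n_gc   <- nchar(gsub("[^GC]", "", seqs))
  n_acgt <- nchar(gsub("[^ACGT]", "", seqs))
  out <- n_gc / n_acgt
  out[missing | n_acgt == 0] <- NA_real_
  out
}

# Made-up data frame with the same column layout as getSequence(seqType = "3utr")
utr3 <- data.frame(
  `3utr` = c("GGCCAATT", "Sequence unavailable", "ATATGC"),
  ensembl_transcript_id_version = c("TX1", "TX2", "TX3"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# The non-syntactic column name "3utr" must be accessed with [[ ]]
utr3$gc <- gc_content(utr3[["3utr"]])
print(utr3)
```

With real data, the same call would be applied to the data frame returned by `getSequence()` for a vector of IDs, e.g. `utr$gc <- gc_content(utr[["3utr"]])`. Excluding ambiguous bases such as N from the denominator is a design choice; dividing by `nchar()` of the full sequence would give slightly different values for sequences containing Ns.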

written 12 months ago by SMK
Powered by Biostar version 2.3.0