Question: UTR sequence extraction
ezraamustafa3 wrote, 7 months ago:

How can I extract the 5' and 3' UTR sequences for a list of gene IDs using the biomaRt R package?

sequence R • 315 views
h.mon (Brazil) wrote, 7 months ago:

The biomaRt user guide probably covers your question. For example, it shows how to retrieve the 5' UTR sequences of all genes located on chromosome 3 between positions 185,514,033 and 185,535,839.


Thanks a lot. I tried this, but that example is limited to a specific chromosome and position. What I am interested in is getting the UTR sequences for a list of human genes that are not located on the same chromosome, so that I can calculate the GC content of each UTR separately. If you can help me figure out how to do this in R, I'd appreciate it.

written 7 months ago by ezraamustafa3
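
For the GC content part of this follow-up, a minimal base-R sketch may help once the UTR sequences are in a data frame. The helper name gc_content and the toy data frame are assumptions made for illustration. The toy data only mimics the shape of a getSequence() result (one sequence column plus an ID column) and is not real Ensembl output.

```r
# Hypothetical helper: fraction of G/C among the A/C/G/T bases of each sequence.
# Returns NA for sequences that contain no A/C/G/T characters at all.
gc_content <- function(seqs) {
  seqs <- toupper(seqs)
  gc   <- nchar(gsub("[^GC]", "", seqs))    # count of G and C
  acgt <- nchar(gsub("[^ACGT]", "", seqs))  # count of unambiguous bases
  ifelse(acgt > 0, gc / acgt, NA_real_)
}

# Toy data shaped like a getSequence() result (made-up sequences and IDs)
utrs <- data.frame(
  `5utr` = c("ATGC", "GGCC", "ATAT"),
  ensembl_transcript_id = c("tx1", "tx2", "tx3"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# One GC value per UTR, stored next to the sequence it was computed from
utrs$gc <- gc_content(utrs[["5utr"]])
print(utrs)
```

Because the function is vectorised, it can be applied directly to the sequence column, and each UTR gets its own value rather than one pooled figure.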

(following h.mon) Try:

library("biomaRt")

ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

ID <- "ENST00000429510.1"
getSequence(
  id = ID,
  seqType = "5utr",
  type = "ensembl_transcript_id_version",
  mart = ensembl
)
getSequence(
  id = ID,
  seqType = "3utr",
  type = "ensembl_transcript_id_version",
  mart = ensembl
)

Which returns:

> getSequence(
+   id = ID,
+   seqType = "5utr",
+   type = "ensembl_transcript_id_version",
+   mart = ensembl
+ )
                                     5utr ensembl_transcript_id_version
1 ATTCTTGTGAATGTGACACACGATCTCTCCAGTTTCCAT             ENST00000429510.1
> getSequence(
+   id = ID,
+   seqType = "3utr",
+   type = "ensembl_transcript_id_version",
+   mart = ensembl
+ )
                                                                                                                                                                                                                                                                                                                                                                                   3utr
1 GACGCAGAAGAAACATGTCCTTCATTCACCAGGCTGAGCTTTCACAGTGCAGTGGTTGGTACGGGACTAAATGTGAGGCTGATGCTCTACACAAGGAAAAACCTGACCTGCGCACAAACCATCAACTCCTCAGCTTTTGGGAACTTGAATGTGACCAAGAAAACCACCTTCATTGTCCATGGATTCAGGCCAACAGGCTCCCCTCCTGTTTGGATGGATGACTTAGTAAAGGGTTTGCTCTCTGTTGAAGACATGAACGTAGTTGTTGTTGATTGGAATCGAGGAGCTACAACTTTAATATATACCCATGCCTCTAGTAAGACCAGAAAAGTAGCCATGGTCTTGAAGGAATTTATTGACCAGATGTTGGCAG
  ensembl_transcript_id_version
1             ENST00000429510.1

This is a combination of sections 5.7 and 5.8 of the user guide, with "ensembl_transcript_id_version" as the type. :-)
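
To cover a list of genes rather than a single transcript, the same getSequence() call can probably be given a vector of IDs. The sketch below assumes the IDs are Ensembl gene IDs (type "ensembl_gene_id"). The wrapper names get_utr_sequences and drop_unavailable are hypothetical, and the two gene IDs in the usage comment are only illustrative. With gene-level IDs, one row per transcript is returned, so a gene can appear several times.

```r
# Sketch: fetch 5' or 3' UTR sequences for several Ensembl gene IDs at once.
# Assumes the biomaRt package is installed and the Ensembl server is reachable.
get_utr_sequences <- function(gene_ids, seq_type = c("5utr", "3utr"), mart) {
  seq_type <- match.arg(seq_type)
  biomaRt::getSequence(
    id      = gene_ids,
    type    = "ensembl_gene_id",
    seqType = seq_type,
    mart    = mart
  )
}

# Transcripts without an annotated UTR come back as "Sequence unavailable";
# drop those rows before any downstream analysis.
drop_unavailable <- function(seqs, seq_col) {
  seqs[seqs[[seq_col]] != "Sequence unavailable", , drop = FALSE]
}

# Example usage (needs network access, so it is left commented out):
# ensembl <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# genes   <- c("ENSG00000139618", "ENSG00000141510")  # illustrative gene IDs
# utr5    <- drop_unavailable(get_utr_sequences(genes, "5utr", ensembl), "5utr")
# utr3    <- drop_unavailable(get_utr_sequences(genes, "3utr", ensembl), "3utr")
```

Keeping the 5' and 3' queries separate means each result has a single sequence column, which makes later per-UTR calculations straightforward.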

modified 7 months ago • written 7 months ago by SMK