Question: UTR sequence extraction
16 months ago, ezraamustafa3 wrote:

How can I extract the 5' and 3' UTR sequences for a list of gene IDs using the biomaRt R package?

16 months ago, h.mon (Brazil) wrote:

The biomaRt user guide probably covers your question. For example, it shows how to retrieve the 5' UTR sequences of all genes located on chromosome 3 between positions 185,514,033 and 185,535,839.


Thanks a lot. I tried that one, but it is limited to a specific chromosome and position. What I am interested in is getting the UTR sequences for a list of human genes that aren't located on the same chromosome, so that I can calculate the GC content of each UTR separately. If you can help me figure out how to do this in R, I'd appreciate it.

(written 16 months ago by ezraamustafa3)

(following h.mon) Try:

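library("biomaRt")

# Connect to the Ensembl mart and select the human gene dataset
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

# Versioned Ensembl transcript ID to look up
ID <- "ENST00000429510.1"

# 5' UTR sequence of the transcript
getSequence(
  id = ID,
  seqType = "5utr",
  type = "ensembl_transcript_id_version",
  mart = ensembl
)

# 3' UTR sequence of the same transcript
getSequence(
  id = ID,
  seqType = "3utr",
  type = "ensembl_transcript_id_version",
  mart = ensembl
)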

Which returns:

> getSequence(
+   id = ID,
+   seqType = "5utr",
+   type = "ensembl_transcript_id_version",
+   mart = ensembl
+ )
                                     5utr ensembl_transcript_id_version
1 ATTCTTGTGAATGTGACACACGATCTCTCCAGTTTCCAT             ENST00000429510.1
> getSequence(
+   id = ID,
+   seqType = "3utr",
+   type = "ensembl_transcript_id_version",
+   mart = ensembl
+ )
                                                                                                                                                                                                                                                                                                                                                                                   3utr
1 GACGCAGAAGAAACATGTCCTTCATTCACCAGGCTGAGCTTTCACAGTGCAGTGGTTGGTACGGGACTAAATGTGAGGCTGATGCTCTACACAAGGAAAAACCTGACCTGCGCACAAACCATCAACTCCTCAGCTTTTGGGAACTTGAATGTGACCAAGAAAACCACCTTCATTGTCCATGGATTCAGGCCAACAGGCTCCCCTCCTGTTTGGATGGATGACTTAGTAAAGGGTTTGCTCTCTGTTGAAGACATGAACGTAGTTGTTGTTGATTGGAATCGAGGAGCTACAACTTTAATATATACCCATGCCTCTAGTAAGACCAGAAAAGTAGCCATGGTCTTGAAGGAATTTATTGACCAGATGTTGGCAG
  ensembl_transcript_id_version
1             ENST00000429510.1

This combines sections 5.7 and 5.8 of the biomaRt user guide, with "ensembl_transcript_id_version" as the type. :-)
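
To extend this to a list of genes on different chromosomes and get the GC content of each UTR, something like the following might work. It is only a sketch: `gc_content` and `fetch_utrs` are hypothetical helper names, not part of biomaRt. It assumes that `getSequence` accepts a vector of IDs with `type = "ensembl_gene_id"`, and that transcripts without an annotated UTR come back as a placeholder string such as "Sequence unavailable" rather than a sequence.

```r
# GC fraction of each sequence in a character vector.
# Entries that are not plain nucleotide strings (for example a
# "Sequence unavailable" placeholder) give NA instead of a number.
gc_content <- function(seqs) {
  seqs <- toupper(seqs)
  valid <- grepl("^[ACGTN]+$", seqs)
  gc <- nchar(gsub("[^GC]", "", seqs))
  ifelse(valid, gc / nchar(seqs), NA_real_)
}

# Hypothetical helper: fetch one UTR type for many IDs in a single call
# and add a GC column. `id_type` is assumed to be a filter name that is
# valid for the chosen mart (e.g. "ensembl_gene_id").
fetch_utrs <- function(ids, mart, utr = c("5utr", "3utr"),
                       id_type = "ensembl_gene_id") {
  utr <- match.arg(utr)
  res <- biomaRt::getSequence(id = ids, type = id_type,
                              seqType = utr, mart = mart)
  res$gc <- gc_content(res[[utr]])
  res
}

# Offline check using the 5' UTR printed above
utr5 <- "ATTCTTGTGAATGTGACACACGATCTCTCCAGTTTCCAT"
gc_content(c(utr5, "Sequence unavailable"))
```

With a mart object such as `ensembl` from above, `fetch_utrs(my_gene_ids, ensembl, utr = "3utr")` would be the intended call, where `my_gene_ids` is your own character vector of Ensembl gene IDs. A gene with several transcripts may yield several rows, so the GC values are per UTR, not per gene.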

(modified 16 months ago • written 16 months ago by AK)