UTR sequence extraction
22 months ago

How can I extract the 5' and 3' UTR sequences for a list of gene IDs using the biomaRt R package?

sequence R • 756 views
22 months ago
h.mon 32k

The biomaRt user guide probably covers your question. For example, it shows how to retrieve the 5’ UTR sequences of all genes located on chromosome 3 between positions 185,514,033 and 185,535,839.


Thanks a lot. I tried that example, but it is limited to a specific chromosome and position range. What I am interested in is getting the UTR sequences for a list of human genes that are not located on the same chromosome, so that I can compute the GC content of each UTR separately. If you can help me figure out how to do this in R, I'd appreciate it.


(following h.mon) Try:

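library("biomaRt")

# Connect to the Ensembl BioMart and select the human gene dataset
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

# Versioned Ensembl transcript ID to look up
ID <- "ENST00000429510.1"

# 5' UTR sequence of the transcript
getSequence(
  id = ID,
  seqType = "5utr",
  type = "ensembl_transcript_id_version",
  mart = ensembl
)

# 3' UTR sequence of the same transcript
getSequence(
  id = ID,
  seqType = "3utr",
  type = "ensembl_transcript_id_version",
  mart = ensembl
)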

Which returns:

> getSequence(
+   id = ID,
+   seqType = "5utr",
+   type = "ensembl_transcript_id_version",
+   mart = ensembl
+ )
                                     5utr ensembl_transcript_id_version
1 ATTCTTGTGAATGTGACACACGATCTCTCCAGTTTCCAT             ENST00000429510.1
> getSequence(
+   id = ID,
+   seqType = "3utr",
+   type = "ensembl_transcript_id_version",
+   mart = ensembl
+ )
                                                                                                                                                                                                                                                                                                                                                                                   3utr
1 GACGCAGAAGAAACATGTCCTTCATTCACCAGGCTGAGCTTTCACAGTGCAGTGGTTGGTACGGGACTAAATGTGAGGCTGATGCTCTACACAAGGAAAAACCTGACCTGCGCACAAACCATCAACTCCTCAGCTTTTGGGAACTTGAATGTGACCAAGAAAACCACCTTCATTGTCCATGGATTCAGGCCAACAGGCTCCCCTCCTGTTTGGATGGATGACTTAGTAAAGGGTTTGCTCTCTGTTGAAGACATGAACGTAGTTGTTGTTGATTGGAATCGAGGAGCTACAACTTTAATATATACCCATGCCTCTAGTAAGACCAGAAAAGTAGCCATGGTCTTGAAGGAATTTATTGACCAGATGTTGGCAG
  ensembl_transcript_id_version
1             ENST00000429510.1
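
Since the goal stated in the question is the GC content of each UTR, here is a minimal base-R sketch of that step, applied to the 5' UTR string printed above. The helper name `gc_content` is made up for illustration and is not part of biomaRt; it simply counts G and C characters and divides by the sequence length.

```r
# Hypothetical helper (not a biomaRt function): fraction of G/C bases
# in one or more DNA strings. Case-insensitive and vectorised.
gc_content <- function(seq) {
  seq <- toupper(seq)
  # Delete every character that is not G or C, then count what is left
  gc <- nchar(gsub("[^GC]", "", seq))
  gc / nchar(seq)
}

# 5' UTR of ENST00000429510.1, copied from the output above
utr5 <- "ATTCTTGTGAATGTGACACACGATCTCTCCAGTTTCCAT"
gc_content(utr5)
```

Because `gsub()` and `nchar()` are vectorised, the same function can be applied directly to the sequence column of a `getSequence()` result. An empty string gives `NaN`, so empty or missing sequences should be filtered out first.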

This is a combination of 5.7 and 5.8 in the user guide, with "ensembl_transcript_id_version" as the type. :-)
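
For a list of genes rather than a single transcript, `getSequence()` also accepts a vector of IDs, so the genes do not need to share a chromosome. The sketch below makes several assumptions: the input is a vector of unversioned Ensembl gene IDs (hence `type = "ensembl_gene_id"`), the two example IDs are only placeholders for your own list, BioMart marks missing UTRs with the text "Sequence unavailable", and the helper names `fetch_utrs` and `drop_unavailable` are invented for illustration.

```r
# Hypothetical wrapper: fetch one UTR type for a vector of Ensembl gene IDs.
# `mart` is a Mart object such as the `ensembl` object created earlier.
fetch_utrs <- function(gene_ids, seq_type = c("5utr", "3utr"), mart) {
  seq_type <- match.arg(seq_type)
  res <- biomaRt::getSequence(
    id = gene_ids,
    type = "ensembl_gene_id",
    seqType = seq_type,
    mart = mart
  )
  drop_unavailable(res, seq_type)
}

# Hypothetical helper: drop rows without a usable sequence.
# Assumes missing UTRs appear as NA or as the text "Sequence unavailable".
drop_unavailable <- function(res, seq_col) {
  keep <- !is.na(res[[seq_col]]) & res[[seq_col]] != "Sequence unavailable"
  res[keep, , drop = FALSE]
}

# Usage (needs network access to Ensembl, so left commented out):
# ensembl <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# genes <- c("ENSG00000139618", "ENSG00000141510")  # placeholder gene IDs
# utr5 <- fetch_utrs(genes, "5utr", ensembl)
# utr3 <- fetch_utrs(genes, "3utr", ensembl)
```

A gene with several transcripts may come back as several rows, one per transcript UTR, so decide whether you want per-transcript values or one representative UTR per gene before computing GC content on the sequence column.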
