Finding UTRs Of Genes That Contain A Particular Sequence
10.3 years ago

Hi Everyone,

I'm trying to determine whether the motif UAGGUUAAG occurs in the 5' or 3' UTRs of specific genes. Unfortunately, many of the bioinformatics tools I have encountered either require a motif of 25 bases or longer or are otherwise unable to answer this question. Does anyone know how I can get this information?

Thanks!

10.3 years ago
JC 13k

An easy way is to download all UTR sequences with Biomart and then use a simple regular expression with grep/perl/python to identify and count your motif.
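
As a rough illustration of the regex route in Python, here is a minimal sketch. It assumes the UTR sequences were exported from Biomart as FASTA; the record names and sequences below are made up, and read_fasta and find_motif are hypothetical helper names rather than anything from Biomart itself. The motif in the question is written as RNA, so it is converted to the DNA alphabet (U to T) before searching.

```python
import re
from io import StringIO

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def find_motif(handle, rna_motif):
    """Return {header: [0-based start positions]} for records containing the motif."""
    # Exported sequences use the DNA alphabet, so translate U to T first.
    dna_motif = rna_motif.upper().replace("U", "T")
    # A lookahead pattern also reports overlapping occurrences.
    pattern = re.compile("(?=" + re.escape(dna_motif) + ")")
    hits = {}
    for header, seq in read_fasta(handle):
        positions = [m.start() for m in pattern.finditer(seq.upper())]
        if positions:
            hits[header] = positions
    return hits

# Made-up records standing in for a real FASTA export (assumption).
example = StringIO(
    ">GENE_A|5utr\nACGTTAGGTTAAGCC\nGGTAGGTTAAG\n"
    ">GENE_B|3utr\nSequence unavailable\n"
    ">GENE_C|3utr\nCCCCCCCC\n"
)
hits = find_motif(example, "UAGGUUAAG")
for header, positions in hits.items():
    print(header, len(positions), positions)
```

For a real export, replace the StringIO object with open("your_utrs.fasta") (a placeholder file name). Sequences that wrap over several lines are joined before searching, so a motif that straddles a line break is still found.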

10.3 years ago

Here's an example using the R interface to Biomart (just looking at 100 human genes and only the 5' UTR, to keep things simple):

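library(biomaRt)
# Connect to the Ensembl human gene dataset
ensembl <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
# Keep only the first 100 gene IDs
IDS <- getBM(attributes="ensembl_gene_id", mart=ensembl)[1:100,]
seqs5 <- getSequence(id=IDS, type="ensembl_gene_id", seqType="5utr", mart=ensembl)
# Column 1 holds the sequences, column 2 the gene IDs; the motif is searched in its DNA form (U -> T)
hits <- seqs5[[2]][grep("TAGGTTAAG", seqs5[[1]])]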

If you already have the genomic sequence and an annotation file for it, you can instead import the GTF into R, filter it down to the UTR features, call getSeq() on those, and run a similar grep(). This would likely be faster.
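
The same local approach can be sketched without R. The Python below is only an illustration under some assumptions: the annotation is a standard 9-column GTF, UTR features are labelled five_prime_utr, three_prime_utr, or UTR (the label depends on the annotation source), and the genome is small enough to hold in memory. read_genome and utr_hits are hypothetical helper names, and the toy chromosome and GTF lines are made up.

```python
from io import StringIO

# Feature labels assumed to mark UTRs; adjust to match your annotation.
UTR_FEATURES = {"five_prime_utr", "three_prime_utr", "UTR"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def read_genome(handle):
    """Load a FASTA handle into {sequence_name: uppercase sequence}."""
    genome, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(chunks).upper()
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks).upper()
    return genome

def utr_hits(gtf_handle, genome, motif):
    """Return (chrom, start, end, strand, feature) for UTR features containing the motif."""
    motif = motif.upper().replace("U", "T")
    hits = []
    for line in gtf_handle:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, feature, strand = fields[0], fields[2], fields[6]
        if feature not in UTR_FEATURES or chrom not in genome:
            continue
        start, end = int(fields[3]), int(fields[4])  # GTF coordinates are 1-based, inclusive
        seq = genome[chrom][start - 1:end]
        if strand == "-":
            # Minus-strand features are read as the reverse complement.
            seq = seq.translate(COMPLEMENT)[::-1]
        if motif in seq:
            hits.append((chrom, start, end, strand, feature))
    return hits

# Toy genome and annotation (made up for illustration).
genome = read_genome(StringIO(">chr1 toy\nCCTAGGTTAAGCC\nAACTTAACCTAGG\n"))
gtf = StringIO(
    'chr1\ttoy\tfive_prime_utr\t1\t13\t.\t+\t.\tgene_id "G1";\n'
    'chr1\ttoy\tthree_prime_utr\t14\t26\t.\t-\t.\tgene_id "G2";\n'
    'chr1\ttoy\tthree_prime_utr\t14\t26\t.\t+\t.\tgene_id "G3";\n'
    'chr1\ttoy\texon\t1\t13\t.\t+\t.\tgene_id "G1";\n'
)
hits = utr_hits(gtf, genome, "UAGGUUAAG")
```

One caveat with this sketch: each GTF feature is searched on its own, so a motif that spans an exon junction in a multi-exon UTR would be missed. Joining the UTR pieces per transcript before searching would handle that case.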
