Question: Finding UTRs Of Genes That Contain A Particular Sequence
5.2 years ago
robinrzhang wrote:

Hi Everyone,

I'm trying to determine whether the motif UAGGUUAAG is located in the 5' or 3' UTRs of specific genes. Unfortunately, many of the bioinformatics tools I have come across either require a motif of 25 bases or longer or are otherwise unable to answer this question. Does anyone know how I can get this information?


5.2 years ago
JC wrote:

An easy way is to download all UTR regions with BioMart and then use a simple regex with grep/perl/python to identify and count your motif.
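As a rough sketch of that idea in Python: the function names below (read_fasta, count_motif) are made up for illustration, and the code assumes the UTRs were exported as FASTA with DNA letters, so the RNA motif is converted from U to T before searching. Sequence lines are joined per record first, because a plain line-by-line grep would miss a motif that wraps across two FASTA lines.

```python
import re
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_motif(lines, motif="UAGGUUAAG"):
    """Count motif occurrences (overlaps included) per FASTA record."""
    # Assumption: the export uses DNA letters, so translate U -> T.
    dna_motif = motif.upper().replace("U", "T")
    # A zero-width lookahead lets findall() report overlapping matches.
    pattern = re.compile("(?=%s)" % re.escape(dna_motif))
    counts = Counter()
    for header, seq in read_fasta(lines):
        n = len(pattern.findall(seq.upper()))
        if n:
            counts[header] = n
    return counts
```

With a downloaded file (the name utrs.fa is hypothetical), this could be called as count_motif(open("utrs.fa")); records with no UTR sequence simply produce no match.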

5.2 years ago
Devon Ryan
Freiburg, Germany
Devon Ryan wrote:

Here's an example using the R interface to Biomart (just looking at 100 human genes and only the 5' UTR, to keep things simple):

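library(biomaRt)  # Bioconductor package providing useMart(), getBM() and getSequence()
ensembl <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
# Take the first 100 Ensembl gene IDs to keep the example small
IDS <- getBM(attributes="ensembl_gene_id", mart=ensembl)[c(1:100),]
seqs5 <- getSequence(id=IDS, type="ensembl_gene_id", seqType="5utr", mart=ensembl)
# Column 1 holds the sequences, column 2 the gene IDs; the motif is written as DNA (U -> T)
hits <- seqs5[[2]][grep("TAGGTTAAG", seqs5[[1]])]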

If you already have the genomic sequence sitting around and an annotation file for it, then you can just import the GTF into R, filter it so that only the UTR features remain, run getSeq() on those, and then use a similar grep(). This would likely be faster.
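For readers who would rather script this outside R, here is a minimal Python sketch of the same filter, extract, and search steps. It is not the getSeq() route described above: the function name utr_hits is hypothetical, the genome is assumed to be already loaded as a dict mapping chromosome names to sequence strings, and the UTR feature names ("five_prime_utr", "three_prime_utr", "UTR") are assumptions that depend on where the GTF came from.

```python
import re

# Translation table used to reverse-complement minus-strand features.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def utr_hits(genome, gtf_lines, motif="TAGGTTAAG",
             utr_features=("five_prime_utr", "three_prime_utr", "UTR")):
    """Return (gene_id, feature, chrom, start, end) for UTR rows containing motif."""
    hits = []
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        (chrom, _source, feature, start, end,
         _score, strand, _frame, attrs) = line.rstrip("\n").split("\t")
        if feature not in utr_features or chrom not in genome:
            continue
        # GTF coordinates are 1-based and inclusive.
        seq = genome[chrom][int(start) - 1:int(end)].upper()
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]
        if motif in seq:
            m = re.search(r'gene_id "([^"]+)"', attrs)
            gene_id = m.group(1) if m else None
            hits.append((gene_id, feature, chrom, int(start), int(end)))
    return hits
```

One caveat with this sketch: a UTR that spans several exons appears as several GTF rows, and each row is searched separately, so a motif that straddles an exon junction would be missed unless the rows are stitched together per transcript first.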

