Question: Finding UTRs of Genes That Contain a Particular Sequence
5.8 years ago, robinrzhang wrote:

Hi Everyone,

I'm trying to determine whether the motif UAGGUUAAG occurs in the 5' or 3' UTRs of specific genes. Unfortunately, most of the bioinformatics tools I have encountered either require a motif of 25 bases or longer or are otherwise unsuited to answering this. Does anyone know how I can get this information?


5.8 years ago, JC wrote:

An easy way is to download all UTR sequences with BioMart and then use a simple regex with grep/Perl/Python to identify and count your motif.
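As a rough illustration of the regex idea, here is a minimal Python sketch. It assumes the UTRs were exported as a FASTA file; the helper names (`read_fasta`, `count_motif`) and the toy record IDs are made up for the example and are not part of BioMart. It also assumes the export is cDNA, so the RNA motif is converted from U to T before searching.

```python
import re
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_motif(handle, motif):
    """Return {header: number of hits} for every record containing the motif."""
    # Assumption: sequences are DNA (cDNA), so write the motif with T, not U.
    dna_motif = motif.upper().replace("U", "T")
    # Wrapping the motif in a lookahead counts overlapping occurrences too.
    pattern = re.compile("(?=" + re.escape(dna_motif) + ")")
    hits = {}
    for header, seq in read_fasta(handle):
        n = len(pattern.findall(seq.upper()))
        if n:
            hits[header] = n
    return hits


# Toy input standing in for a real FASTA export (hypothetical record IDs).
example = StringIO(">geneA|5utr\nACGTTAGGTTAAGCC\n>geneB|5utr\nACGTACGT\n")
print(count_motif(example, "UAGGUUAAG"))
```

For a real export, pass an open file instead of the `StringIO` object, for example `with open("utrs.fa") as fh: count_motif(fh, "UAGGUUAAG")`, where `utrs.fa` is a placeholder file name.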

5.8 years ago, Devon Ryan (Freiburg, Germany) wrote:

Here's an example using the R interface to Biomart (just looking at 100 human genes and only the 5' UTR, to keep things simple):

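library(biomaRt)  # provides useMart(), getBM() and getSequence()
ensembl <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
# Take the first 100 Ensembl gene IDs to keep the example small
IDS <- getBM(attributes="ensembl_gene_id", mart=ensembl)[c(1:100),]
seqs5 <- getSequence(id=IDS, type="ensembl_gene_id", seqType="5utr", mart=ensembl)
# Column 1 is used as the sequence and column 2 as the gene ID;
# the motif is written as DNA (T instead of U)
hits <- seqs5[[2]][grep("TAGGTTAAG", seqs5[[1]])]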

If you already have the genomic sequence and an annotation file for it, you can instead import the GTF into R, filter it down to the UTR features, run getSeq() on those, and then apply a similar grep(). This would likely be faster.
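The local-annotation route can be sketched as follows. This is written in plain Python rather than R, purely as an illustration of the steps (filter UTR features, extract their sequence, search). It assumes the genome is already loaded as a dict mapping chromosome name to sequence string, and that UTR rows use one of the feature names `five_prime_utr`, `three_prime_utr` or `UTR`; check what your own GTF uses. The function name `utr_hits` and the toy data are hypothetical.

```python
import re

# Assumed UTR feature names; adjust to match your annotation file.
UTR_FEATURES = {"five_prime_utr", "three_prime_utr", "UTR"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def utr_hits(gtf_lines, genome, motif):
    """Return (gene_id, feature, chrom, start, end, strand) for UTR features containing the motif."""
    dna_motif = motif.upper().replace("U", "T")
    hits = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] not in UTR_FEATURES:
            continue
        chrom, feature, strand = fields[0], fields[2], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # GTF coordinates are 1-based and inclusive.
        seq = genome[chrom][start - 1:end].upper()
        if strand == "-":
            # Minus-strand features are read as the reverse complement.
            seq = seq.translate(COMPLEMENT)[::-1]
        if dna_motif in seq:
            m = re.search(r'gene_id "([^"]+)"', fields[8])
            hits.append((m.group(1) if m else None, feature, chrom, start, end, strand))
    return hits


# Toy genome and annotation (made-up gene IDs and coordinates).
genome = {"chr1": "CCC" + "TAGGTTAAG" + "CCCC" + "CTTAACCTA" + "GGG"}
gtf = [
    "\t".join(["chr1", "toy", "five_prime_utr", "1", "14", ".", "+", ".", 'gene_id "geneA";']),
    "\t".join(["chr1", "toy", "three_prime_utr", "15", "28", ".", "-", ".", 'gene_id "geneB";']),
    "\t".join(["chr1", "toy", "exon", "1", "28", ".", "+", ".", 'gene_id "geneA";']),
]
for hit in utr_hits(gtf, genome, "UAGGUUAAG"):
    print(hit)
```

Note that this scans each UTR feature separately, so a motif that spans a splice junction inside a multi-exon UTR would be missed; joining the pieces per transcript before searching would be needed to catch those.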
