Over- or under-represented motifs in coding regions
5.3 years ago
Gene_MMP8 ▴ 240

I have a set of di-, tri- and tetra-nucleotide motifs that are over-represented in the coding regions of a genome. Is there any way to establish the biological significance of this over-representation? Right now the motifs are only statistically significant with respect to a null model of random sequences. TRANSFAC catalogues the significance of short motifs in regulatory regions; is there an equivalent database for coding regions?

R snp • 851 views

What are you trying to show? It seems to me that all you have found so far is evidence of codon bias, which is already a well-known phenomenon.
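To make the codon-bias point concrete, here is a minimal sketch of a null model that keeps codon usage fixed, so that a motif only counts as over-represented if it exceeds what codon composition alone would give. It assumes a single in-frame coding sequence; the function names (`count_kmers`, `codon_shuffle_pvalue`) and the toy sequence are made up for illustration, and shuffling whole codons is just one possible null, not necessarily the one used in the original analysis.

```python
import random
from collections import Counter

def count_kmers(seq, k):
    """Count overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def codon_shuffle_pvalue(cds, motif, n_shuffles=1000, seed=0):
    """Empirical p-value for a motif against a codon-shuffled null.

    Shuffling whole codons preserves codon usage exactly, so only
    motif occurrences that depend on codon order can change.
    Note: this toy version also shuffles start and stop codons.
    """
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    rng = random.Random(seed)
    k = len(motif)
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    observed = count_kmers(cds, k)[motif]
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(codons)
        if count_kmers("".join(codons), k)[motif] >= observed:
            hits += 1
    # +1 pseudocount keeps the empirical p-value above zero
    return observed, (hits + 1) / (n_shuffles + 1)

# Hypothetical toy CDS, not real data
toy_cds = "ATGGCGCGCGCGTTAGCGCGCTAA"
observed, p_value = codon_shuffle_pvalue(toy_cds, "CGCG", n_shuffles=200)
```

If a motif is no longer significant under this kind of null, codon usage is the likely explanation for the original signal.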


I have a list of mutations, and the motifs are the bases flanking each mutation. I want to check whether over-representation of certain motifs influences the type of mutation.
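One way to frame that check is as a test of association between flanking motif and mutation type. The sketch below computes a Pearson chi-square statistic on the motif-by-type contingency table and gets a p-value by permuting the mutation labels. It assumes each mutation is an independent observation; the `(motif, mutation_type)` pairs are invented toy data, and the function names are illustrative only.

```python
import random
from collections import Counter

def chi_square(pairs):
    """Pearson chi-square statistic for (motif, mutation_type) pairs."""
    n = len(pairs)
    motif_totals = Counter(m for m, _ in pairs)
    type_totals = Counter(t for _, t in pairs)
    observed = Counter(pairs)
    stat = 0.0
    for m, m_n in motif_totals.items():
        for t, t_n in type_totals.items():
            expected = m_n * t_n / n
            stat += (observed[(m, t)] - expected) ** 2 / expected
    return stat

def permutation_pvalue(pairs, n_perm=2000, seed=0):
    """Shuffle mutation types across motifs to build the null distribution."""
    rng = random.Random(seed)
    motifs = [m for m, _ in pairs]
    types = [t for _, t in pairs]
    observed_stat = chi_square(pairs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(types)
        # small tolerance guards against floating-point ties
        if chi_square(list(zip(motifs, types))) >= observed_stat - 1e-12:
            hits += 1
    return observed_stat, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: flanking motif and mutation type per mutation
toy_pairs = (
    [("CG", "C>T")] * 30 + [("CG", "A>G")] * 5
    + [("AT", "C>T")] * 10 + [("AT", "A>G")] * 25
)
stat, p_value = permutation_pvalue(toy_pairs, n_perm=500)
```

A small p-value here only says that motif and mutation type are not independent in the sample; it does not by itself show that the motif causes the mutation type.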


You should really have added the information about the mutations to your post.

The first thing you could consider is running the motifs as sequence queries to see whether they map to insertion elements and/or inverted repeats. These are common mutation signatures, and you may want to remove them from your dataset first (or at least annotate them as such).
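For the inverted-repeat part, a rough first pass can be done without any external tool. The sketch below flags motifs that are their own reverse complement and scans a sequence for pairs of arms where the second is the reverse complement of the first. The arm length and gap limit are arbitrary assumptions and the example sequence is made up; matching against insertion elements would need a search against a suitable reference collection, which is not shown here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def is_palindromic(motif):
    """True if the motif equals its own reverse complement (e.g. ACGT)."""
    return motif.upper() == reverse_complement(motif)

def find_inverted_repeats(seq, arm=4, max_gap=10):
    """Return (left_start, right_start, left_arm) for each inverted repeat.

    The right arm must be the exact reverse complement of the left arm
    and start at most max_gap bases after the left arm ends.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * arm + 1):
        left = seq[i:i + arm]
        target = reverse_complement(left)
        for gap in range(max_gap + 1):
            j = i + arm + gap
            if j + arm > len(seq):
                break
            if seq[j:j + arm] == target:
                hits.append((i, j, left))
    return hits

# Hypothetical flanking sequence with one 6-base inverted repeat
toy_flank = "GAATGCTTTTGCATTC"
repeats = find_inverted_repeats(toy_flank, arm=6, max_gap=10)
```

Motifs or flanks flagged this way can then be annotated or set aside before testing the remaining ones.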
