Motif Seq In Enriched Genes
10.9 years ago
kanwarjag ★ 1.2k

This may seem like a very simple question, but I got lost. I ran and analyzed a ChIP-seq experiment and identified the binding motif (around 7-8 letters) for a protein of interest. I then scanned my list of enriched genes for the motif and got a list of coordinates (100 bp windows) that may, in theory, contain it. I want to see and confirm at which position in each enriched gene (it will be in the promoter) the motif sequence is present. Is there a good way of doing this for a set of, say, 100 or so genes? Thanks.

chipseq motif sequence • 2.2k views
10.9 years ago

You could try BEDOPS or BEDTools.

Put the coordinates of motif matches as a BED file:

chr1 1000 1008 motif1_match1
chr2 3000 3008 motif1_match2
chr3 4000 4008 motif2_match1
chr3 1200 1208 motif1_match3

Put the coordinates of your genes as a BED file, too:

chr1 1000 12008 gene1
chr2 1000 4021 gene2
chr3 8000 12008 gene3
chr3 1200 1208 gene4

If you want to add an upstream/downstream interval to these coordinates (to cover the promoter and regulatory regions), you can do it with gawk:

gawk 'BEGIN{OFS="\t"} {s=$2-1000; if (s<0) s=0; print $1, s, $3+1000, $4}' mygenes.bed > mygenes_1000flank.bed

Then you can use BEDOPS or BEDTools to get the intersection of the two files (BEDOPS expects sorted input, for example from sort-bed):

bedops --intersect motifs.bed mygenes_1000flank.bed
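
If you also want to know where in each gene the motif falls, the same overlap logic can be sketched in plain Python. This is only an illustration under some assumptions: the function names parse_bed and locate_motifs are made up for this example, the 1000 bp flank mirrors the gawk step above, and strand is ignored, so the offset is simply the motif start minus the gene start (negative means before the annotated start on the + strand).

```python
def parse_bed(text):
    """Parse whitespace-separated BED-like text into (chrom, start, end, name) tuples."""
    records = []
    for line in text.strip().splitlines():
        chrom, start, end, name = line.split()[:4]
        records.append((chrom, int(start), int(end), name))
    return records


def locate_motifs(motifs, genes, flank=1000):
    """Return (motif_name, gene_name, offset) for every motif overlapping a flanked gene.

    offset = motif start - gene start (strand is NOT taken into account here).
    """
    hits = []
    for m_chrom, m_start, m_end, m_name in motifs:
        for g_chrom, g_start, g_end, g_name in genes:
            lo = max(0, g_start - flank)  # flanked start, clamped at 0
            hi = g_end + flank            # flanked end
            # Half-open intervals overlap when each starts before the other ends.
            if m_chrom == g_chrom and m_start < hi and m_end > lo:
                hits.append((m_name, g_name, m_start - g_start))
    return hits


# The example records from the BED files above.
motifs = parse_bed("""
chr1 1000 1008 motif1_match1
chr2 3000 3008 motif1_match2
chr3 4000 4008 motif2_match1
chr3 1200 1208 motif1_match3
""")
genes = parse_bed("""
chr1 1000 12008 gene1
chr2 1000 4021 gene2
chr3 8000 12008 gene3
chr3 1200 1208 gene4
""")

for motif_name, gene_name, offset in locate_motifs(motifs, genes):
    print(motif_name, gene_name, offset, sep="\t")
```

The nested loop is fine for about 100 genes; for genome-wide lists the sorted-interval tools above will be much faster.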

One problem I run into is that the ChIP-seq motif is 7-8 letters long, and when we intersect BED files this way we apply some sort of overlap condition, such as 50% (0.5) or something like that. But if we have already found the motif and are looking for it in the list of genes, shouldn't the match be close to 100%? Please correct me if I am off track. Thanks.
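
To illustrate the distinction raised here: an overlap fraction in an interval intersection describes how much of two coordinate ranges overlap, not how well a sequence matches the motif. A 100% sequence match can be checked directly on the promoter sequence. The sketch below is a minimal example under assumptions: the motif GATAAG and the short promoter string are invented for illustration, and find_exact_matches is a hypothetical helper, not part of BEDOPS or BEDTools.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_exact_matches(sequence, motif):
    """Return sorted (position, strand) pairs for exact motif matches.

    Positions are 0-based offsets into `sequence`. Matches to the reverse
    complement of the motif are reported on the '-' strand.
    """
    sequence = sequence.upper()
    motif = motif.upper()
    patterns = [("+", motif)]
    rc = reverse_complement(motif)
    if rc != motif:  # avoid reporting palindromic motifs twice
        patterns.append(("-", rc))
    hits = []
    for strand, pattern in patterns:
        pos = sequence.find(pattern)
        while pos != -1:
            hits.append((pos, strand))
            pos = sequence.find(pattern, pos + 1)  # allow overlapping matches
    return sorted(hits)


# Hypothetical promoter sequence and motif, for illustration only.
promoter = "ACGTGATAAGCCCTTATCAGG"
for position, strand in find_exact_matches(promoter, "GATAAG"):
    print(position, strand)
```

Adding the reported position to the promoter's genomic start gives the motif coordinate within the gene's promoter.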
