Question: TF motif binding region searching script in human genome
2.7 years ago, Shicheng Guo wrote:

Hi All,

I have a TF of interest whose motif is known (both the logo and the frequency matrix), and I want to identify all matching regions in the human genome. Is there a Perl, R, or Python script that someone could share? Perl scripts have worked well for me, so a Perl solution would be preferred.

BLAT does not work here because the sequence is too short.

FYI:

Motif consensus sequence (14 bp): TGGCACCATGCCAA

Motif frequency matrix (counts; each column sums to 16):

    A [  0  0  0  0 14  2  2  7  4  0  0  0 16 14 ]
    C [  0  0  0 16  1  8  8  1  3  0 16 16  0  0 ]
    G [  0 16 16  0  0  5  4  5  2 16  0  0  0  1 ]
    T [ 16  0  0  0  1  1  2  3  7  0  0  0  0  1 ]
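As a quick sanity check on the matrix above, here is a minimal Python sketch that sums each column and takes the most frequent base per position. The helper names `column_totals` and `consensus` are made up for this illustration; the counts are copied from the matrix as posted.

```python
# Count matrix copied from the post (rows A, C, G, T; 14 positions).
counts = {
    "A": [0, 0, 0, 0, 14, 2, 2, 7, 4, 0, 0, 0, 16, 14],
    "C": [0, 0, 0, 16, 1, 8, 8, 1, 3, 0, 16, 16, 0, 0],
    "G": [0, 16, 16, 0, 0, 5, 4, 5, 2, 16, 0, 0, 0, 1],
    "T": [16, 0, 0, 0, 1, 1, 2, 3, 7, 0, 0, 0, 0, 1],
}


def column_totals(counts):
    """Return the number of observed sites at each motif position."""
    width = len(counts["A"])
    return [sum(counts[base][i] for base in "ACGT") for i in range(width)]


def consensus(counts):
    """Return the most frequent base at each position as a string."""
    width = len(counts["A"])
    return "".join(
        max("ACGT", key=lambda base: counts[base][i]) for i in range(width)
    )


print(column_totals(counts))  # every position sums to 16
print(consensus(counts))      # TGGCACCATGCCAA
```

The consensus agrees with the 14 bp sequence given above, so the matrix and the sequence are consistent with each other.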

Logo: [image not shown]

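    seqkit locate -i -d -p TGGCACCATGCCAA <sequence.fa or sequence.fa.gz>

Options: -i = ignore case, -d = pattern contains degenerate bases, -p = pattern.

To search only the positive strand, add -P. Note that from the 5th position onward the motif has degenerate bases (several bases occur at positions 5 to 9 in the matrix), so the exact consensus will not match every site.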

written 2.7 years ago (modified 2.7 years ago) by cpad0112
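For anyone who prefers a script, the idea of a degenerate-pattern search on both strands can be sketched in Python roughly as follows. This is only an illustration of the concept, not how seqkit implements it. The pattern `TGGCASSRWGCCAA` is an assumption: it was built by keeping, at each position, the bases with at least 4 of the 16 counts in the posted matrix and writing them as IUPAC codes. The function names are hypothetical.

```python
import re

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement table that also covers the degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(pattern):
    """Reverse-complement a sequence or IUPAC pattern."""
    return pattern.upper().translate(COMPLEMENT)[::-1]


def to_regex(pattern):
    """Compile an IUPAC pattern; the lookahead lets overlapping sites match."""
    body = "".join("[" + IUPAC[ch] + "]" for ch in pattern.upper())
    return re.compile("(?=(" + body + "))", re.IGNORECASE)


def locate(seq, pattern):
    """Return (start, end, strand, matched_text) with 1-based coordinates.

    matched_text is always the forward-strand text at that location.
    """
    hits = []
    searches = (("+", pattern), ("-", reverse_complement(pattern)))
    for strand, pat in searches:
        for m in to_regex(pat).finditer(seq):
            start = m.start() + 1
            end = m.start() + len(pattern)
            hits.append((start, end, strand, m.group(1).upper()))
    return sorted(hits)


# Hypothetical degenerate pattern derived from the matrix (see note above).
PATTERN = "TGGCASSRWGCCAA"
print(locate("ccTGGCACCATGCCAAgg", PATTERN))
```

For a whole genome, `locate` would be called once per chromosome sequence read from the FASTA file. The 4-of-16 cutoff is arbitrary; a stricter or looser cutoff gives a different pattern and a different number of hits.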
2.7 years ago, EagleEye (Sweden) wrote:

Perl solution:

http://homer.ucsd.edu/homer/motif/index.html

Web/application solution:

http://meme-suite.org/tools/meme


Yes. It is exactly what I need. Thanks.

written 2.7 years ago by Shicheng Guo
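For readers who want to see what matrix-based scanning involves, below is a minimal Python sketch that turns the posted count matrix into log-odds scores and slides it along a sequence on both strands. It is a generic position weight matrix scan, not the implementation used by HOMER or the MEME suite. The pseudocount of 0.5, the uniform background of 0.25, and the cutoff of 80% of the maximum score are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

# Count matrix copied from the question (rows A, C, G, T; 14 positions).
COUNTS = {
    "A": [0, 0, 0, 0, 14, 2, 2, 7, 4, 0, 0, 0, 16, 14],
    "C": [0, 0, 0, 16, 1, 8, 8, 1, 3, 0, 16, 16, 0, 0],
    "G": [0, 16, 16, 0, 0, 5, 4, 5, 2, 16, 0, 0, 0, 1],
    "T": [16, 0, 0, 0, 1, 1, 2, 3, 7, 0, 0, 0, 0, 1],
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert counts to per-position log2(probability / background)."""
    width = len(counts["A"])
    pwm = []
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2((counts[b][i] + pseudocount) / total / background)
            for b in "ACGT"
        })
    return pwm


def score(pwm, site):
    """Sum the log-odds of each base of a site with the motif's length."""
    return sum(column[base] for column, base in zip(pwm, site))


def max_score(pwm):
    """Best possible score, reached by the consensus sequence."""
    return sum(max(column.values()) for column in pwm)


def scan(seq, pwm, threshold):
    """Return (start, end, strand, score) for windows scoring >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if set(window) - set("ACGT"):
            continue  # skip windows containing N or other ambiguity codes
        minus_site = window.translate(COMPLEMENT)[::-1]
        for strand, site in (("+", window), ("-", minus_site)):
            s = score(pwm, site)
            if s >= threshold:
                hits.append((start + 1, start + width, strand, round(s, 2)))
    return hits


pwm = log_odds_matrix(COUNTS)
cutoff = 0.8 * max_score(pwm)  # arbitrary illustrative cutoff
for hit in scan("ccTGGCACCATGCCAAgg", pwm, cutoff):
    print(hit)
```

A pure Python loop like this is slow on a 3 Gb genome, so for real work the dedicated tools linked above are the practical choice; the sketch is mainly useful for checking a handful of regions or understanding how the scores arise.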