Question: Program For Motif Search
3
10.2 years ago by
Dhivya30
Dhivya30 wrote:

Hi,
I want to find a particular motif (GTGGTGGGCC) in the Arabidopsis thaliana whole genome. Is there any way to write a program in Perl/Python to do this?

programming • 5.5k views
written 10.2 years ago by Dhivya30
1

The reply is "of course" :)

written 10.2 years ago by Eric Normandeau10k
7
10.2 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

Since you ask "Is there a way to write a program..." - indeed there is: (1) download files with chromosome sequence, (2) learn about regular expressions in the language of your choice, (3) away you go!

Either Bioperl or Biopython will have methods to do this, but here's a quick Perl guide which assumes that $chrom is a string with the chromosome sequence:

my $motif = "GTGGTGGGCC";
# Walk through every match in $chrom; @- and @+ hold the offsets of the latest match
while ($chrom =~ /$motif/g) {
    print "Found a match from " . ($-[0] + 1) . " to " . $+[0] . "\n";
}

This uses the special Perl arrays @- and @+, which hold the start and end offsets of the last match: $-[0] is the zero-based offset of the first matched character and $+[0] is the offset just past the last one. You add one to $-[0] since offsets start from zero, whereas sequence numbering starts from one; $+[0] already equals the one-based end position. Also, you'd want to alter the print line to give you delimited output: chromosome, motif, start, end, strand would be appropriate.
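
For anyone who prefers Python, the same idea can be sketched roughly as follows. This is only a minimal illustration, not a replacement for Bioperl/Biopython: the helper names read_fasta and forward_hits are made up for this example, and it assumes a plain FASTA file whose sequence lines may be wrapped.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {record_name: sequence}, joining wrapped lines."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # first word of the header line
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def forward_hits(name: str, seq: str, motif: str) -> Iterator[Tuple[str, str, int, int, str]]:
    """Yield (chromosome, motif, start, end, strand) with 1-based inclusive coordinates."""
    for m in re.finditer(re.escape(motif), seq):
        # m.start() is zero-based, so add one; m.end() is already the 1-based end
        yield (name, motif, m.start() + 1, m.end(), "+")


# Tiny in-memory demo; with a real file, pass the open file handle to read_fasta
demo = [">Chr1 demo", "AAGTGGTG", "GGCCTT", ">Chr2", "ACGT"]
for chrom, seq in read_fasta(demo).items():
    for hit in forward_hits(chrom, seq, "GTGGTGGGCC"):
        print("\t".join(map(str, hit)))
```

Joining the wrapped lines before searching matters, because a motif can span a line break in the FASTA file.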

And then you'll want to consider the (-) strand. If you are not using Bioperl, the reverse complement is easy to create using:

# Reverse the string, then complement each base to get the (-) strand
my $chromrev = reverse($chrom);
$chromrev =~ tr/ACGTacgt/TGCAtgca/;

Finally, you'll need to figure out a coordinate system for the (-) strand, remembering the convention that start < end, regardless of strand. I'll leave that as an "exercise for the reader" ;-)
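
One possible way to handle the (-) strand coordinates, sketched in Python. It assumes the common GFF-style convention that hits on both strands are reported in forward-strand, 1-based inclusive coordinates with start <= end; the function names are invented for this example.

```python
import re

# Translation table for complementing bases, preserving case
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_both_strands(seq, motif):
    """Return sorted (start, end, strand) hits in forward-strand coordinates.

    Coordinates are 1-based and inclusive, with start <= end on both strands.
    """
    n = len(seq)
    pattern = re.compile(re.escape(motif))
    hits = [(m.start() + 1, m.end(), "+") for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        # Zero-based offset i on the reversed string corresponds to
        # 1-based forward position n - i, so the two ends swap roles.
        hits.append((n - m.end() + 1, n - m.start(), "-"))
    return sorted(hits)


seq = "AAGTGGTGGGCCTT" + "GGCCCACCAC" + "A"
for start, end, strand in find_both_strands(seq, "GTGGTGGGCC"):
    print(start, end, strand)
```

An equivalent approach that avoids copying a whole chromosome is to search the forward sequence for reverse_complement(motif) and label those hits "-".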

modified 22 months ago by RamRS27k • written 10.2 years ago by Neilfws48k
6
10.2 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum129k wrote:

http://www.arabidopsis.org/cgi-bin/patmatch/nph-patmatch.pl

modified 22 months ago by RamRS27k • written 10.2 years ago by Pierre Lindenbaum129k

Nice answer. This tool will always be better than any custom solution or self-made script, and it is already deployed.

written 10.2 years ago by Giovanni M Dall'Olio27k

And it provides results for download as text, unlike many similar web tools.

written 10.2 years ago by Neilfws48k

If anyone is looking to do this outside Arabidopsis on any sequence, you can use the stand-alone PatMatch described in the original article as well. I have set up a repo from which you can launch, via Binder, an active Jupyter notebook session where it works: https://github.com/fomightez/patmatch-binder .

written 2.2 years ago by Wayne360
3
10.2 years ago by
London, UK
Giovanni M Dall'Olio27k wrote:

If you are going to write a script to do it (though I recommend following Pierre's advice), a good Python module is TAMO.

It can scan for motifs represented by score matrices, where you can define multiple bases for each position. For example, you can say that you expect an A 20% of the time and a G 80% of the time. Moreover, it can print sequence logos and much more.
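
To illustrate what scanning with a score matrix means, here is a minimal pure-Python sketch. It does not use TAMO's API; the function name pwm_scan, the uniform 0.25 background, and the small pseudo-probability are all assumptions made for this example.

```python
import math


def pwm_scan(seq, pwm, threshold, background=0.25, pseudo=0.01):
    """Yield (start, score) for windows whose log-odds score >= threshold.

    pwm is a list of dicts, one per motif position, mapping base -> probability.
    start is 1-based. The pseudo-probability only keeps log2() finite for
    bases that a column does not list; columns are not renormalized.
    """
    width = len(pwm)
    log_odds = [
        {b: math.log2((col.get(b, 0.0) + pseudo) / background) for b in "ACGT"}
        for col in pwm
    ]
    seq = seq.upper()
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in "ACGT" for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(log_odds[j][b] for j, b in enumerate(window))
        if score >= threshold:
            yield i + 1, score


# First position: A 20% of the time, G 80% of the time; then a fixed T and G
pwm = [{"A": 0.2, "G": 0.8}, {"T": 1.0}, {"G": 1.0}]
for start, score in pwm_scan("CCGTGCCATGCC", pwm, threshold=3.0):
    print(start, round(score, 2))
```

Raising the threshold keeps only windows that use the more probable base at the first position.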

written 10.2 years ago by Giovanni M Dall'Olio27k

+1 for pointing to a Python lib

written 10.2 years ago by Haibao Tang3.0k
3
10.2 years ago by
brentp23k
Salt Lake City, UT
brentp23k wrote:

If you're looking for an exact match of that sequence, just using Python strings will be quite fast.
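
A minimal sketch of the plain-string approach, using only str.find. The function name find_all is made up for this example, and seq is assumed to be a chromosome already loaded as a single string.

```python
def find_all(seq, motif):
    """Return 1-based start positions of every exact match, overlapping ones included."""
    starts = []
    pos = seq.find(motif)
    while pos != -1:
        starts.append(pos + 1)  # convert the zero-based index to 1-based numbering
        pos = seq.find(motif, pos + 1)  # step by one so overlapping hits are kept
    return starts


print(find_all("AAGTGGTGGGCCTT", "GTGGTGGGCC"))
```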

For actual motifs, there's also motility, which is pretty fast and lets you specify IUPAC motifs or position weight matrices.
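
As a rough illustration of IUPAC matching (this is not motility's API, just a stand-in built on Python's re module), an IUPAC motif can be translated into a regular expression. The names IUPAC, iupac_to_pattern and iupac_hits are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_pattern(motif):
    """Translate an IUPAC motif such as 'GTGGTGGRCC' into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def iupac_hits(seq, motif):
    """Return 1-based inclusive (start, end) pairs; overlapping hits are kept."""
    # The lookahead makes each match zero-width, so finditer tries every position
    pattern = re.compile("(?=" + iupac_to_pattern(motif) + ")")
    width = len(motif)
    return [(m.start() + 1, m.start() + width) for m in pattern.finditer(seq.upper())]


print(iupac_hits("AAGTGGTGGGCCTT", "GTGGTGGRCC"))
```

Note that an N in the motif matches any of A, C, G, T in the sequence, but an N in the sequence itself is never matched by this sketch.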

written 10.2 years ago by brentp23k

Cool! Maybe it is better than TAMO; I don't know whether TAMO has been updated since I last used it (2-3 years ago).

written 10.2 years ago by Giovanni M Dall'Olio27k
2
10.2 years ago by
Stew1.4k
Cambridge
Stew1.4k wrote:

You could also look at the RSAT tools, particularly the "genome-scale dna-pattern" tools in the "Pattern Matching" section; they have Arabidopsis there too.

written 10.2 years ago by Stew1.4k
1

Or you could just grep it

written 10.2 years ago by Stew1.4k