BLAST a Motif With a Regular Expression?
5
3
14.0 years ago
Markus Krupp ▴ 80

Hi there =)

Today I was asked whether it is possible to BLAST (blastp) for a specific amino acid MOTIF spanning seven amino acids. However, this MOTIF is variable at three points: position 2 can be I, L or V; position 5 can be any amino acid (written as n in the MOTIF); and position 6 and/or 7 must be D.

MOTIF: Q[I,L,V]DGnDD

Is there a way to use regular expressions with BLAST, or is there a special option in BLASTP?

...uuh, I had no clue and couldn't give an answer...

Maybe you have an idea how to BLAST this MOTIF.

Thanks


PROBLEM SOLVED, see below


Khader you're my man :D

The answer to my question is - PROSITE (link)

Prosite offers the opportunity to handle special patterns (Prosite pattern: http://ca.expasy.org/tools/scanprosite/scanprosite-doc.html#pattern_syntax).

However, after a quick look at the pattern syntax, it does not seem to support the requirement for positions 6/7, where at least one of the two positions has to be a D. But solving this problem is trivial: I will just write three 'pattern' MOTIFs:

Q-[ILV]-D-G-x-D-D
Q-[ILV]-D-G-x-x-D
Q-[ILV]-D-G-x-D-x

(Keep in mind to use the input field 'MOTIF scan' on the left, not 'sequence scan' on the right.)
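For anyone who prefers to check the same three patterns locally, here is a minimal Python sketch. It assumes only the simple subset of PROSITE syntax used above (single letters, `x`, and `[...]` classes separated by hyphens); the helper names `prosite_to_regex` and `has_motif` are made up for this illustration. Note that the first pattern is already covered by the other two, since anything ending in D-D also ends in x-D.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern (letters, x, [..] only) to a regex."""
    parts = []
    for element in pattern.split("-"):
        if element == "x":
            parts.append(".")      # x means "any residue"
        else:
            parts.append(element)  # literal residue or [..] class, valid as-is
    return "".join(parts)

PATTERNS = [
    "Q-[ILV]-D-G-x-D-D",
    "Q-[ILV]-D-G-x-x-D",
    "Q-[ILV]-D-G-x-D-x",
]

# One combined regex: a sequence matches if any of the three patterns matches.
COMBINED = re.compile("|".join(prosite_to_regex(p) for p in PATTERNS))

def has_motif(sequence: str) -> bool:
    """Return True if the (single-line, upper-case) sequence contains the motif."""
    return COMBINED.search(sequence) is not None
```

For example, `has_motif("QLDGKAD")` is true (D at position 7 only), while `has_motif("QIDGKAA")` is false.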

BTW, this was my first post on this forum and I'm really surprised by the fast, high-quality answers :) Thanks a lot!

blast motif • 10k views
0

I believe that BLAST is not accurate enough for such a short motif.

0

:) Do we have a 'testimonials' page at BioStar? We should add Xeroxed_Yeti's comment to it.

6
14.0 years ago

Have you tried PHI-BLAST? I have used an earlier version (v 2.2.17) for incorporating patterns into BLAST searches.

If you are searching with a known pattern/motif, you can retrieve sequences containing this motif via PROSITE / ScanProsite.

To search in PROSITE, you need to rewrite your motif in PROSITE pattern syntax.

0

Oops, yeah, there was something about PROSITE... which I forgot. Thanks for that hint!

0

PHI-BLAST needs a sequence and a pattern as input, while I'm only interested in the motif. Why is the input sequence necessary?

4
14.0 years ago
Will 4.5k

I use motifs quite a bit in my research. BLAST does NOT have the capability to take regular expressions as inputs. I'm assuming you want to use BLAST because you want to find all instances of the regular expression in some large corpus of sequences, e.g., all human proteins.

There are two real options:

  1. Since you have a "simple motif" (you only have []s and .s), you can create every possible instance of your motif and then use an automated method to BLAST all of those. In your case you only have 3 x 20 = 60 possible matches, which is not that many if you have an automated BLAST system. If you had a motif like Qn{1,6}DIL, you would have WAY too many possibilities to BLAST.
  2. Your other option is to download all the sequences that you want to search, then use a standard programming language to read in the sequences and scan them with its regular-expression library. Personally I'm partial to Python for something like this, but it's really up to you. The only thing you have to watch out for is whether or not you want to permit overlapping matches. Some languages (like Python) do not return overlapping regular-expression matches, whereas others (like Matlab) return even overlapping matches.
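The enumeration in option 1 can be sketched in a few lines of Python. This is only an illustration: the helper name `expand_motif` is made up, and the second list encodes one reading of the question ("position 6 and/or 7 must be D") rather than the motif exactly as written. Under that looser reading the number of peptides grows from 60 to 3 x 20 x 39 = 2340, because 39 of the 400 residue pairs contain at least one D.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def expand_motif(positions):
    """Yield every concrete peptide for a list of per-position residue choices."""
    for combo in product(*positions):
        yield "".join(combo)

# Motif exactly as written in the question: Q[ILV]DGnDD
literal = list(expand_motif(["Q", "ILV", "D", "G", AMINO_ACIDS, "D", "D"]))

# Looser reading: positions 6 and 7 are free, but at least one must be D.
relaxed = [
    peptide
    for peptide in expand_motif(
        ["Q", "ILV", "D", "G", AMINO_ACIDS, AMINO_ACIDS, AMINO_ACIDS]
    )
    if "D" in peptide[5:7]  # 0-based slice covering positions 6 and 7
]

print(len(literal))  # 60
print(len(relaxed))  # 2340
```

Each peptide in either list could then be written out as its own query for an automated BLAST run.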

Hope that helps

Will

0

Hi Will,

thanks for your answer.

This was "just" a question of a biology colleague. For him I, a bioinformatician, was the last hope. However I do not have the time to download the whole genomic sequence and check for that MOTIF. Maybe I have to force some of my diploma students :D ...just joking

I was just curious ;)

4
14.0 years ago
Neilfws 49k

This is one of those cases where you should choose the right tool for the job, not the tool with which you happen to be most familiar. Sure, you can BLAST peptide queries, but BLAST is not really designed for motif searching; it's designed to quickly find high-scoring local alignments between similar sequences.

The first 2 answers cover it: the PROSITE suite is very good for this type of task, or else a quite simple script that employs regex searching will do the job. I'd encourage your questioner to move beyond what they know (BLAST in this case) and extend themselves, with some good, directed web searching, to find more appropriate tools.
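As a rough idea of what such a "quite simple script" could look like, here is a Python sketch that scans FASTA-formatted text for the motif. The regex is one possible translation of the motif in the question (I, L or V at position 2, anything at position 5, D at position 6 and/or 7); the function names and the two example records are invented for illustration. The pattern is wrapped in a lookahead so that overlapping hits are also reported, which plain `finditer` would otherwise skip.

```python
import re

# Lookahead wrapper so that overlapping occurrences are all reported.
MOTIF = re.compile(r"(?=(Q[ILV]DG.(?:D.|.D)))")

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def find_motif(lines):
    """Yield (header, 1-based start, matched peptide) for every hit."""
    for header, seq in read_fasta(lines):
        for match in MOTIF.finditer(seq.upper()):
            yield header, match.start() + 1, match.group(1)

# Made-up example records, not real proteins.
example = """>protA hypothetical
MKQIDGADDLL
QVDGKADQ
>protB hypothetical
MAAAQADGKDD
""".splitlines()

for hit in find_motif(example):
    print(hit)
# ('protA hypothetical', 3, 'QIDGADD')
# ('protA hypothetical', 12, 'QVDGKAD')
```

To run it on a real download, pass an open file handle instead of `example`, since `find_motif` only needs an iterable of lines.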

3
14.0 years ago
Paulo Nuin ★ 3.7k

Use PSI-BLAST; here is the explanation of how it works from NCBI's website:

Iterated profile search methods have led to biologically important observations but, for many years, were quite slow and generally did not provide precise means for evaluating the significance of their results. This limited their utility for systematic mining of the protein databases. The principal design goals in developing the Position-Specific Iterated BLAST (PSI-BLAST) program [10] were speed, simplicity and automatic operation. The procedure PSI-BLAST uses can be summarized in five steps:
(1) PSI-BLAST takes as an input a single protein sequence and compares it to a protein database, using the gapped BLAST program [10].
(2) The program constructs a multiple alignment, and then a profile, from any significant local alignments found. The original query sequence serves as a template for the multiple alignment and profile, whose lengths are identical to that of the query. Different numbers of sequences can be aligned in different template positions.
(3) The profile is compared to the protein database, again seeking local alignments. After a few minor modifications, the BLAST algorithm [10,12] can be used for this directly.
(4) PSI-BLAST estimates the statistical significance of the local alignments found. Because profile substitution scores are constructed to a fixed scale [13], and gap scores remain independent of position, the statistical theory and parameters for gapped BLAST alignments [14] remain applicable to profile alignments [10].
(5) Finally, PSI-BLAST iterates, by returning to step (2), an arbitrary number of times or until convergence.

0

However, PSI-BLAST doesn't actually let you input a regular expression and get a list of matching proteins. You'd have to guess which instance of the regular expression to input and then hope that you get all of the possible results back: if you searched with "QIDGIDD" you would get different results than if you searched with "QLDGDDD".

0

Yes and no, because any BLAST search will actually give back gapped alignments, so if you use parameters that allow for easy gap insertion, you might end up with a series of alignments that contain one of the sequences you are looking for. But I agree with you: it's not perfect, and most of the work would be in parsing the results.

1
14.0 years ago
Malcolm.Cook ★ 1.5k

If you are running command-line stand-alone BLAST against a local mirror of the portions of GenBank you want to search (i.e. not through a web page such as NCBI's), then you probably also have access to the seedtop command, which "is one of the programs found in the NCBI standalone blast package".

Seedtop will allow for such searches against your local mirror.

0

Hmm, let me think... yeah, I think there is a PC somewhere in my workgroup which has a local installation of BLAST. I will have a look for seedtop.

Thanks, Malcolm

