Question: Clarification for using PSI-BLAST to generate a PSSM (am I doing it right?)
7 weeks ago, DNAlias wrote:

If I run psiblast on the command line with the following command:

psiblast -query myfasta.fasta -db mydb -num_iterations 3 -out_pssm mypssm.smp

does this make a pssm based solely on the sequences in myfasta.fasta, or does it create it based on the blast hits?


When I run it I see this at the top:

PssmWithParameters ::= { pssm { isProtein TRUE, numRows 28, numColumns 2291, byRow FALSE, query seq { id { local str "Query_147" }, descr { title "419612_0:004b79" },

And "419612_0:004b79" is the name of my last sequence, out of 147 queries. What does this mean? It doesn't mean that the matrix is based only on the last sequence, does it?

7 weeks ago, Mensur Dlakic wrote:

From what I can tell, there are at least two things that you are doing wrong.

First, it seems that your query file contains multiple sequences. If that is indeed the case, psiblast will search with each sequence individually (and sequentially), and each search will overwrite the results of the previous one. That is extremely wasteful: it takes a lot of time, and in the end you get results only for your last sequence, which is why the PSSM header shows the title of your last query. If you have multiple sequences, split them into individual files and store the results of each search in a separate file. The searches will take the same total amount of time, but you will end up with results for all sequences rather than just the last one.
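The splitting step described above can be sketched in Python. This is only an illustration under stated assumptions: the function names, the `query_0001`-style file naming, and the default database name `mydb` are hypothetical, and the script only builds the psiblast argument lists rather than running them. The flags are the ones already used in this thread.

```python
from pathlib import Path


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def psiblast_command(query_path, db, pssm_path, iterations=2):
    """Build the argument list for one single-sequence psiblast run."""
    return [
        "psiblast",
        "-query", str(query_path),
        "-db", db,
        "-num_iterations", str(iterations),
        "-out_pssm", str(pssm_path),
        "-save_pssm_after_last_round",
    ]


def split_and_plan(fasta_text, out_dir, db="mydb"):
    """Write one FASTA file per record and return one command per file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    commands = []
    for index, (header, sequence) in enumerate(read_fasta(fasta_text), start=1):
        stem = f"query_{index:04d}"  # hypothetical naming scheme
        query_path = out_dir / f"{stem}.fasta"
        query_path.write_text(f">{header}\n{sequence}\n")
        # Each query gets its own PSSM file, so nothing is overwritten.
        commands.append(psiblast_command(query_path, db, out_dir / f"{stem}.smp"))
    return commands
```

On a machine with BLAST+ installed, each returned list could then be passed to `subprocess.run(cmd, check=True)`, one run per query.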

Second, the way you formulated the command will save the PSSM after the second iteration. To answer your question: yes, the PSSM file contains the converted multiple alignment of BLAST hits rather than your starting sequences. However, unless told otherwise, psiblast saves the PSSM from the penultimate iteration, which is again wasteful because the results of the last iteration will not count for anything. There is a -save_pssm_after_last_round switch that does exactly what it sounds like. In fact, the following command will produce exactly the same PSSM as yours while running one fewer iteration:

 psiblast -query myfasta.fasta -db mydb -num_iterations 2 -out_pssm mypssm.smp -save_pssm_after_last_round
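As a quick sanity check after a run, it may help to confirm which query a saved PSSM file belongs to. The sketch below assumes the file is the text format shown earlier in this thread, with `title "..."` and `numColumns N` fields in the header. The function name `pssm_header_info` is hypothetical, and this is a plain regular-expression lookup, not a real ASN.1 parser.

```python
import re


def pssm_header_info(smp_text):
    """Pull the query title and column count out of a text PSSM header."""
    title = re.search(r'title\s+"([^"]*)"', smp_text)
    columns = re.search(r'numColumns\s+(\d+)', smp_text)
    return {
        "title": title.group(1) if title else None,
        "num_columns": int(columns.group(1)) if columns else None,
    }
```

If the reported title matches the single sequence in the query file you searched with, the PSSM was written for the query you expected.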

By the way, regarding the warning you posted in the other thread:

Warning: [psiblast] Query_1: Composition-based score adjustment conditioned on sequence properties and unconditional composition-based score adjustment is not supported with PSSMs, resetting to default value of standard composition-based statistics

As it says, it is only a warning, and you can safely ignore it. When running more than one iteration, psiblast will use the newly created PSSM and therefore can't apply composition-based statistics, because those are pre-calculated only for single-iteration searches that use fixed substitution matrices. The warning will not appear if you run the same command with a single iteration.
