Run Multiple Sequences Through Psipred
Entering edit mode
8.4 years ago
arronslacey ▴ 280

Hi, I have psipred working (thanks to some posters on here) and can confirm it works for single sequences - however when I input a fasta file with multiple sequences it only gives the output for the last sequence in the file. does anyone know how to get round this?

here is the script:


# This is a simple script which will carry out all of the basic steps
# required to make a PSIPRED V3 prediction using BLAST+. Note that it
# assumes that the following programs are in the appropriate directories:
# psiblast - PSIBLAST executable (from NCBI BLAST+)
# psipred - PSIPRED V3 program
# psipass2 - PSIPRED V3 program
# chkparse - PSIPRED V3 program

# NOTE: Experimental PSIPRED script with BLAST+ compatability (DTJ May 2010)

# The name of the BLAST+ data bank
set dbname = uniref90filt

# Where the NCBI BLAST+ programs have been installed
set ncbidir = /usr/local/bin

# Where the PSIPRED V3 programs have been installed
set execdir = ../bin

# Where the PSIPRED V3 data files have been installed
set datadir = ../data

set basename = $1:r
set rootname = $basename:t

# Generate a "unique" temporary filename root
set hostid = `hostid`
set tmproot = psitmp$$$hostid

\cp -f $1 $tmproot.fasta

echo "Running PSI-BLAST with sequence" $1 "..."

$ncbidir/psiblast -db $dbname -query $tmproot.fasta -inclusion_ethresh 0.001 -out_pssm $tmproot.chk -num_iterations 3 -num_alignments 0 >& $tmproot.blast

if ($status != 0) then
    tail $tmproot.blast
    echo "FATAL: Error whilst running blastpgp - script terminated!"
    exit $status

echo "Predicting secondary structure..."

$execdir/chkparse $tmproot.chk > $tmproot.mtx

if ($status != 0) then
    echo "FATAL: Error whilst running chkparse - script terminated!"
    exit $status

echo Pass1 ...

$execdir/psipred $tmproot.mtx $datadir/weights.dat $datadir/weights.dat2 $datadir/weights.dat3 > $

if ($status != 0) then
    echo "FATAL: Error whilst running psipred - script terminated!"
    exit $status

echo Pass2 ...

$execdir/psipass2 $datadir/weights_p2.dat 1 1.0 1.0 $rootname.ss2 $ > $rootname.horiz

if ($status != 0) then
    echo "FATAL: Error whilst running psipass2 - script terminated!"
    exit $status

# Remove temporary files

echo Cleaning up ...
\rm -f $tmproot.* error.log

echo "Final output files:" $rootname.ss2 $rootname.horiz
echo "Finished."

can anyone see why it is not storing (or rather overwriting) n-1 outputs?

if anyone can point me to where i need to change this that would be great.

thanks, arron.

blast • 5.5k views
Entering edit mode

I am trying to run script, the result is : Finished 3 iterations PSI-BLAST usage : psipred weight-file seq-file usage : psipass2 weight-file seq-file DCA DCB PSIPRED failed ... exiting early (It is creating the file but its blank) so, I tried to run psipred command separately : /media/kakarot/ppi/psipred/bin$ psipred /media/kakarot/ppi/genthreader/sequence.iter3.mtx /media/kakarot/ppi/psipred/data/weights.dat /media/kakarot/ppi/psipred/data/weights.dat2 /media/kakarot/ppi/psipred/data/weights.dat3 > /media/kakarot/ppi/genthreader/ But i am getting : psipred: command not found

Entering edit mode
8.4 years ago
Hamish ★ 3.2k

The PSIPRED workflow implemented by the scripts supplied in the distribution, is designed to work with a single input sequence.

The simplest way to process multiple sequences is to:

  1. Split the input file of sequences into a set of files containing one sequence each. The EMBOSS program seqretsplit may be useful for this.
  2. Loop over the input sequence files and run each one through the PSIPRED workflow as implemented in the supplied scripts.

For example, using a sh/bash script (and assuming there are no extra '.fasta' files in the current directory):

seqretsplit -auto inputMultiSeqFile
for seqFile in `ls -1f *.fasta`; do
  runpsipred $seqFile

If necessary you could modify the scripts supplied with PSIPRED to do this for you, but given that wrapping the PSIPRED script is simple (and something you may want to do to ensure the environment is configured correctly) it may not be worth the effort.

For what it is worth... the PSI-BLAST step it the first point where things start to go wrong with multiple sequences as input. The PSI-BLAST checkpoint file can only describe a single search, so when you provide multiple input sequences the checkpoint describes the last search performed, and thus only the result for the last sequence can be extracted as input to the PSIPRED tools. These in turn assume a one-to-one mapping of input to output.

Entering edit mode

thanks hamish, you've been a great help again. i would perhaps move the new files to a sub directory and run psipred from there, otherwise the original fasta file gets selected for psipred as well.


Login before adding your answer.

Traffic: 1729 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6