Question: Run Multiple Sequences Through Psipred
2
5.6 years ago by
arronslacey230
United Kingdom
arronslacey230 wrote:

Hi, I have psipred working (thanks to some posters on here) and can confirm it works for single sequences - however when I input a fasta file with multiple sequences it only gives the output for the last sequence in the file. does anyone know how to get round this?

here is the runpsipred.pl script:

#!/bin/tcsh

# This is a simple script which will carry out all of the basic steps
# required to make a PSIPRED V3 prediction using BLAST+. Note that it
# assumes that the following programs are in the appropriate directories:
# psiblast - PSIBLAST executable (from NCBI BLAST+)
# psipred - PSIPRED V3 program
# psipass2 - PSIPRED V3 program
# chkparse - PSIPRED V3 program

# NOTE: Experimental PSIPRED script with BLAST+ compatability (DTJ May 2010)

# The name of the BLAST+ data bank
set dbname = uniref90filt

# Where the NCBI BLAST+ programs have been installed
set ncbidir = /usr/local/bin

# Where the PSIPRED V3 programs have been installed
set execdir = ../bin

# Where the PSIPRED V3 data files have been installed
set datadir = ../data

set basename = $1:r set rootname =$basename:t

# Generate a "unique" temporary filename root
set hostid = hostid
set tmproot = psitmp$hostid \cp -f$1 $tmproot.fasta echo "Running PSI-BLAST with sequence"$1 "..."

$ncbidir/psiblast -db$dbname -query $tmproot.fasta -inclusion_ethresh 0.001 -out_pssm$tmproot.chk -num_iterations 3 -num_alignments 0 >& $tmproot.blast if ($status != 0) then
tail $tmproot.blast echo "FATAL: Error whilst running blastpgp - script terminated!" exit$status
endif

echo "Predicting secondary structure..."

$execdir/chkparse$tmproot.chk > $tmproot.mtx if ($status != 0) then
echo "FATAL: Error whilst running chkparse - script terminated!"
exit $status endif echo Pass1 ...$execdir/psipred $tmproot.mtx$datadir/weights.dat $datadir/weights.dat2$datadir/weights.dat3 > $rootname.ss if ($status != 0) then
echo "FATAL: Error whilst running psipred - script terminated!"
exit $status endif echo Pass2 ...$execdir/psipass2 $datadir/weights_p2.dat 1 1.0 1.0$rootname.ss2 $rootname.ss >$rootname.horiz

if ($status != 0) then echo "FATAL: Error whilst running psipass2 - script terminated!" exit$status
endif

# Remove temporary files

echo Cleaning up ...
\rm -f $tmproot.* error.log echo "Final output files:"$rootname.ss2 $rootname.horiz echo "Finished."  can anyone see why it is not storing (or rather overwriting) n-1 outputs? if anyone can point me to where i need to change this that would be great. thanks, arron. blast • 3.8k views ADD COMMENTlink modified 5.6 years ago by Hamish3.0k • written 5.6 years ago by arronslacey230 7 5.6 years ago by Hamish3.0k UK Hamish3.0k wrote: The PSIPRED workflow implemented by the scripts supplied in the distribution, is designed to work with a single input sequence. The simplest way to process multiple sequences is to: 1. Split the input file of sequences into a set of files containing one sequence each. The EMBOSS program seqretsplit may be useful for this. 2. Loop over the input sequence files and run each one through the PSIPRED workflow as implemented in the supplied scripts. For example, using a sh/bash script (and assuming there are no extra '.fasta' files in the current directory): #!/bin/sh seqretsplit -auto inputMultiSeqFile for seqFile in ls -1f *.fasta; do runpsipred$seqFile
done


If necessary you could modify the scripts supplied with PSIPRED to do this for you, but given that wrapping the PSIPRED script is simple (and something you may want to do to ensure the environment is configured correctly) it may not be worth the effort.

For what it is worth... the PSI-BLAST step it the first point where things start to go wrong with multiple sequences as input. The PSI-BLAST checkpoint file can only describe a single search, so when you provide multiple input sequences the checkpoint describes the last search performed, and thus only the result for the last sequence can be extracted as input to the PSIPRED tools. These in turn assume a one-to-one mapping of input to output.