Question

Run Multiple Sequences Through Psipred

2

Entering edit mode

11.0 years ago

arronslacey ▴ 320

Hi, I have psipred working (thanks to some posters on here) and can confirm it works for single sequences - however when I input a fasta file with multiple sequences it only gives the output for the last sequence in the file. does anyone know how to get round this?

here is the runpsipred.pl script:

#!/bin/tcsh

# This is a simple script which will carry out all of the basic steps
# required to make a PSIPRED V3 prediction using BLAST+. Note that it
# assumes that the following programs are in the appropriate directories:
# psiblast - PSIBLAST executable (from NCBI BLAST+)
# psipred - PSIPRED V3 program
# psipass2 - PSIPRED V3 program
# chkparse - PSIPRED V3 program

# NOTE: Experimental PSIPRED script with BLAST+ compatability (DTJ May 2010)

# The name of the BLAST+ data bank
set dbname = uniref90filt

# Where the NCBI BLAST+ programs have been installed
set ncbidir = /usr/local/bin

# Where the PSIPRED V3 programs have been installed
set execdir = ../bin

# Where the PSIPRED V3 data files have been installed
set datadir = ../data

set basename = $1:r
set rootname = $basename:t

# Generate a "unique" temporary filename root
set hostid = `hostid`
set tmproot = psitmp$$$hostid

\cp -f $1 $tmproot.fasta

echo "Running PSI-BLAST with sequence" $1 "..."

$ncbidir/psiblast -db $dbname -query $tmproot.fasta -inclusion_ethresh 0.001 -out_pssm $tmproot.chk -num_iterations 3 -num_alignments 0 >& $tmproot.blast

if ($status != 0) then
    tail $tmproot.blast
    echo "FATAL: Error whilst running blastpgp - script terminated!"
    exit $status
endif

echo "Predicting secondary structure..."

$execdir/chkparse $tmproot.chk > $tmproot.mtx

if ($status != 0) then
    echo "FATAL: Error whilst running chkparse - script terminated!"
    exit $status
endif

echo Pass1 ...

$execdir/psipred $tmproot.mtx $datadir/weights.dat $datadir/weights.dat2 $datadir/weights.dat3 > $rootname.ss

if ($status != 0) then
    echo "FATAL: Error whilst running psipred - script terminated!"
    exit $status
endif

echo Pass2 ...

$execdir/psipass2 $datadir/weights_p2.dat 1 1.0 1.0 $rootname.ss2 $rootname.ss > $rootname.horiz

if ($status != 0) then
    echo "FATAL: Error whilst running psipass2 - script terminated!"
    exit $status
endif

# Remove temporary files

echo Cleaning up ...
\rm -f $tmproot.* error.log

echo "Final output files:" $rootname.ss2 $rootname.horiz
echo "Finished."

can anyone see why it is not storing (or rather overwriting) n-1 outputs?

if anyone can point me to where i need to change this that would be great.

thanks, arron.

blast • 6.7k views

ADD COMMENT • link updated 11.0 years ago by Hamish ★ 3.2k • written 11.0 years ago by arronslacey ▴ 320

0

Entering edit mode

I am trying to run pgenthreader.sh script, the result is : Finished 3 iterations PSI-BLAST usage : psipred weight-file seq-file usage : psipass2 weight-file seq-file DCA DCB PSIPRED failed ... exiting early (It is creating the sequence.pgen.ss file but its blank) so, I tried to run psipred command separately : /media/kakarot/ppi/psipred/bin$ psipred /media/kakarot/ppi/genthreader/sequence.iter3.mtx /media/kakarot/ppi/psipred/data/weights.dat /media/kakarot/ppi/psipred/data/weights.dat2 /media/kakarot/ppi/psipred/data/weights.dat3 > /media/kakarot/ppi/genthreader/sequence.pgen.ss But i am getting : psipred: command not found

ADD REPLY • link 4.3 years ago by shalinikaushik1293 ▴ 10

score 7 · Answer 1 · 2013-05-08

The PSIPRED workflow implemented by the scripts supplied in the distribution, is designed to work with a single input sequence.

The simplest way to process multiple sequences is to:

Split the input file of sequences into a set of files containing one sequence each. The EMBOSS program seqretsplit may be useful for this.
Loop over the input sequence files and run each one through the PSIPRED workflow as implemented in the supplied scripts.

For example, using a sh/bash script (and assuming there are no extra '.fasta' files in the current directory):

#!/bin/sh
seqretsplit -auto inputMultiSeqFile
for seqFile in `ls -1f *.fasta`; do
  runpsipred $seqFile
done

If necessary you could modify the scripts supplied with PSIPRED to do this for you, but given that wrapping the PSIPRED script is simple (and something you may want to do to ensure the environment is configured correctly) it may not be worth the effort.

For what it is worth... the PSI-BLAST step it the first point where things start to go wrong with multiple sequences as input. The PSI-BLAST checkpoint file can only describe a single search, so when you provide multiple input sequences the checkpoint describes the last search performed, and thus only the result for the last sequence can be extracted as input to the PSIPRED tools. These in turn assume a one-to-one mapping of input to output.