Question: Run Multiple Sequences Through Psipred
2
gravatar for arronslacey
5.6 years ago by
arronslacey230
United Kingdom
arronslacey230 wrote:

Hi, I have psipred working (thanks to some posters on here) and can confirm it works for single sequences - however when I input a fasta file with multiple sequences it only gives the output for the last sequence in the file. does anyone know how to get round this?

here is the runpsipred.pl script:

#!/bin/tcsh

# This is a simple script which will carry out all of the basic steps
# required to make a PSIPRED V3 prediction using BLAST+. Note that it
# assumes that the following programs are in the appropriate directories:
# psiblast - PSIBLAST executable (from NCBI BLAST+)
# psipred - PSIPRED V3 program
# psipass2 - PSIPRED V3 program
# chkparse - PSIPRED V3 program

# NOTE: Experimental PSIPRED script with BLAST+ compatability (DTJ May 2010)

# The name of the BLAST+ data bank
set dbname = uniref90filt

# Where the NCBI BLAST+ programs have been installed
set ncbidir = /usr/local/bin

# Where the PSIPRED V3 programs have been installed
set execdir = ../bin

# Where the PSIPRED V3 data files have been installed
set datadir = ../data

set basename = $1:r
set rootname = $basename:t

# Generate a "unique" temporary filename root
set hostid = `hostid`
set tmproot = psitmp$$$hostid

\cp -f $1 $tmproot.fasta

echo "Running PSI-BLAST with sequence" $1 "..."

$ncbidir/psiblast -db $dbname -query $tmproot.fasta -inclusion_ethresh 0.001 -out_pssm $tmproot.chk -num_iterations 3 -num_alignments 0 >& $tmproot.blast

if ($status != 0) then
    tail $tmproot.blast
    echo "FATAL: Error whilst running blastpgp - script terminated!"
    exit $status
endif

echo "Predicting secondary structure..."

$execdir/chkparse $tmproot.chk > $tmproot.mtx

if ($status != 0) then
    echo "FATAL: Error whilst running chkparse - script terminated!"
    exit $status
endif

echo Pass1 ...

$execdir/psipred $tmproot.mtx $datadir/weights.dat $datadir/weights.dat2 $datadir/weights.dat3 > $rootname.ss

if ($status != 0) then
    echo "FATAL: Error whilst running psipred - script terminated!"
    exit $status
endif

echo Pass2 ...

$execdir/psipass2 $datadir/weights_p2.dat 1 1.0 1.0 $rootname.ss2 $rootname.ss > $rootname.horiz

if ($status != 0) then
    echo "FATAL: Error whilst running psipass2 - script terminated!"
    exit $status
endif

# Remove temporary files

echo Cleaning up ...
\rm -f $tmproot.* error.log

echo "Final output files:" $rootname.ss2 $rootname.horiz
echo "Finished."

can anyone see why it is not storing (or rather overwriting) n-1 outputs?

if anyone can point me to where i need to change this that would be great.

thanks, arron.

blast • 3.8k views
ADD COMMENTlink modified 5.6 years ago by Hamish3.0k • written 5.6 years ago by arronslacey230
7
gravatar for Hamish
5.6 years ago by
Hamish3.0k
UK
Hamish3.0k wrote:

The PSIPRED workflow implemented by the scripts supplied in the distribution, is designed to work with a single input sequence.

The simplest way to process multiple sequences is to:

  1. Split the input file of sequences into a set of files containing one sequence each. The EMBOSS program seqretsplit may be useful for this.
  2. Loop over the input sequence files and run each one through the PSIPRED workflow as implemented in the supplied scripts.

For example, using a sh/bash script (and assuming there are no extra '.fasta' files in the current directory):

#!/bin/sh
seqretsplit -auto inputMultiSeqFile
for seqFile in `ls -1f *.fasta`; do
  runpsipred $seqFile
done

If necessary you could modify the scripts supplied with PSIPRED to do this for you, but given that wrapping the PSIPRED script is simple (and something you may want to do to ensure the environment is configured correctly) it may not be worth the effort.

For what it is worth... the PSI-BLAST step it the first point where things start to go wrong with multiple sequences as input. The PSI-BLAST checkpoint file can only describe a single search, so when you provide multiple input sequences the checkpoint describes the last search performed, and thus only the result for the last sequence can be extracted as input to the PSIPRED tools. These in turn assume a one-to-one mapping of input to output.

ADD COMMENTlink written 5.6 years ago by Hamish3.0k

thanks hamish, you've been a great help again. i would perhaps move the new files to a sub directory and run psipred from there, otherwise the original fasta file gets selected for psipred as well.

ADD REPLYlink written 5.6 years ago by arronslacey230
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1758 users visited in the last hour