Question

How to run PSSpred on multiple sequences

6.0 years ago

Tom ▴ 40

Hi all, I'm using PSSpred to predict secondary structures for a set of peptides.

I am running the command on Linux. When I run it on a FASTA file containing a single sequence, like this:

 perl PSSpred.pl test.fasta

it works (i.e., it correctly predicts the secondary structure for this one peptide and produces a file with the output).

However, I want the input FASTA file to contain multiple peptide sequences, and ideally I would like one predicted secondary structure per peptide in the output.

When I put three sequences in test.fasta, there is still only one output file, and the sequences have been concatenated into one long string instead of being handled per peptide.

For example, if the input file is like this:

>seq1
AAA
>seq2
BBB
>seq3
CCC

The output file shows a single secondary structure, as if the input were the one sequence AAABBBCCC.

I am trying to get a predicted secondary structure per peptide: either one separate output file per peptide, or everything in one file with a clear delimiter that I can parse to see where the secondary structure for one peptide ends and the next begins.

I tried:

 for i in 'cat input_file'; do perl PSSpred.pl $i >> $i.out ; done

but that obviously doesn't work. Does anyone know the specific command to run PSSpred on an input FASTA file containing multiple peptides? Thanks.

score 2 · Accepted Answer · 2018-04-09

6.0 years ago

Pierre Lindenbaum 161k

Linearize the FASTA file (one record per line, name and sequence separated by a tab), then loop over the records:

awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' jeter.fa |
  cut -c2- |
  while IFS=$'\t' read -r name seq; do
    # Write one FASTA file per record, then run PSSpred on it.
    printf '>%s\n%s\n' "$name" "$seq" > "$name.fa" && perl PSSpred.pl "$name.fa" > "$name.out"
  done
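To check the splitting step on its own before involving PSSpred, here is a minimal sketch that wraps the same awk/cut idea in a function. The name `split_fasta`, the output directory argument, and the demo paths are hypothetical, made up for this illustration; it assumes record names are usable as file names (no `/`).

```shell
# Hypothetical helper: split a multi-FASTA file into one FASTA file per record.
# Usage: split_fasta input.fa outdir
split_fasta() {
  mkdir -p "$2"
  tab=$(printf '\t')
  # Linearize to "name<TAB>sequence" (one record per line), then write
  # each record to outdir/name.fa.
  awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' "$1" |
    cut -c2- |
    while IFS="$tab" read -r name seq; do
      [ -n "$name" ] || continue   # skip the blank line an empty input produces
      printf '>%s\n%s\n' "$name" "$seq" > "$2/$name.fa"
    done
}

# Demo on the three-record example from the question, in a temporary directory.
demo_dir=$(mktemp -d)
printf '>seq1\nAAA\n>seq2\nBBB\n>seq3\nCCC\n' > "$demo_dir/test.fasta"
split_fasta "$demo_dir/test.fasta" "$demo_dir/split"
ls "$demo_dir/split"
rm -rf "$demo_dir"
```

Once the per-record files look right, something like `for f in outdir/*.fa; do perl PSSpred.pl "$f" > "${f%.fa}.out"; done` runs the prediction on each one. That redirect assumes the prediction is printed to standard output; if PSSpred.pl instead writes to a fixed file name (not verified here), each run would overwrite the previous one, and running each record in its own directory would avoid that.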
