hello
I am trying to calculate the PSSM but I keep getting error , where is the problem in this code ?
import os
import re
def command_pssm(content, output_file,pssm_file):
os.system('psiblast \
-in_msa 1ak4.fasta \
-db allseq.fasta \
-num_threads 10 \
-num_iterations 3 \
-evalue 0.001 \
-out output_file\
-out_ascii_pssm PSSM.txt' )
input_file = "1ak4.fasta"
output_file = "myoutput.txt"
out_ascii_pssm ="PSSM.txt"
command_pssm(input_file, output_file, out_ascii_pssm)
as an example I get the following error
BLAST query error: CAlnReader::GetSeqEntry(): Seq_entry is not available until after Read()
even when I run this command, I get the same error
psiblast -subject 1ak4.fasta -in_msa allseq.fasta -out_ascii_pssm pssm.txt
BLAST query error: CAlnReader::GetSeqEntry(): Seq_entry is not available until after Read()
My 1ak4.fasta looks like this
MVNPTVFFDIAVDGEPLGRVSFELFADKVPKTAENFRALSTGEKGFGYKGSCFHRIIPGFMCQGGDFTRHNGTGGKSIYGEKFEDENFILKHTGPGILSMANAGPNTNGSQFFICTAKTEWLDGKHVVFGKVKEGMNIVEAMERFGSRNGKTSKKITIADCGQLE
my allseq.fasta look like this (few lines)
>1ak4_A, dset72, 165 residues
MVNPTVFFDIAVDGEPLGRVSFELFADKVPKTAENFRALSTGEKGFGYKGSCFHRIIPGFMCQGGDFTRHNGTGGKSIYGEKFEDENFILKHTGPGILSMANAGPNTNGSQFFICTAKTEWLDGKHVVFGKVKEGMNIVEAMERFGSRNGKTSKKITIADCGQLE
>1ak4_D, dset72, 145 residues
PIVQNLQGQMVHQAISPRTLNAWVKVVEEKAFSPEVIPMFSALSEGATPQDLNTMLNTVGGHQAAMQMLKETINEEAAEWDRLHPVHAGPIAPGQMREPRGSDIAGTTSTLQEQIGWMTHNPPIPVGEIYKRWIILGLNKIVRMY
I also tried
import os
import sys
os.system('psiblast -query 1ak4.fasta -db allseq.fasta -num_iterations=6 -evalue=0.005 -out test_result -out_pssm=PSSMtest_results')
print ("Done !!")
but I got error again
admin$ python code.py
BLAST Database error: No alias or index file found for protein database [allseq.fasta] in search path [/Users/admin/Desktop/myexample::]
Done !!
@Mensur Dlakic I just amended the question, let me know if you need any other input info.
As the
-in_msaswitch implies, it expects a multiple sequence alignment as input, while you are giving it a single sequence. That won't work. If you adjust your single sequence so it is in fasta format (add the first line beginning with>) and use the-queryswitch instead of-in_msa, it should work. That is assuming that yourallseq.fastais already formatted usingmakeblastdb.Mensur Dlakic I am one step close, when I run the makeblastdb , it gives me many output, which one should I use ? I run it like this
makeblastdb -in allseq.fasta -input_type fasta -dbtype prot -out all_seq
All the files created by
makeblastdbare required. In your case they should look something likeallseq.fasta.p??where a question mark can be any letter. Simply give the name of your original file (allseq.fastalike you did before) in (PSI)BLAST search commands and it will automatically look up all the files.PS Just saw that you used the
-outswitch. In that case the file name to use would beall_seq.Mensur Dlakic I put -in_msa alseq.fasta and -db allseq it gives me error like
BLAST query error: CAlnReader::GetSeqEntry(): Seq_entry is not available until after Read()
Please read carefully what I wrote already. You CAN'T use
-in_msaand give a single sequence as input. It has to be the-queryswitch instead.Mensur Dlakic I just thought the similarity of the names could cause the issue, so, I downloaded human database and then made it as my db using the following . I use the above code and input became allseq.fasta , db became Human but still gives me error !!! I really dont know what is the matter with this
makeblastdb -in Human.fasta -input_type fasta -dbtype prot -out Human