[Blast] PSI-BLAST with multiple separate output PSSM files
4.3 years ago

Hello,
I've run into a problem generating PSSMs from a multiple-sequence query with PSI-BLAST. I want a separate PSSM output file for each query sequence, or at least one large file containing every sequence's PSSM.
With a single-sequence query this is easy.
With a multiple-sequence query, only the PSSM for the last sequence in the FASTA input is written to the PSSM file, even though the main PSI-BLAST output contains results for every sequence, with the number of iterations I specified.
Is there any way to solve this?

EDITED: I've thought about putting each sequence in a separate FASTA file. In my case that is hard, because I would need to submit about 15 thousand jobs to my college server, which would use a lot of resources and affect other users' jobs. (end of edit)

My command is something like this:

psiblast -query ./a_multiple_seq_test.fasta -db nr -num_iterations 2 -out_ascii_pssm pssm.chk


Thanks in advance. Any suggestions will be highly appreciated.

blast psi-blast pssm • 2.9k views

Have you considered running separate jobs for each of your query sequences?
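
One way to act on this suggestion without submitting a scheduler job per sequence is to loop over the sequences inside a single job, writing one PSSM file per query. A minimal Python sketch, assuming `psiblast` is on the PATH and reusing only the flags from the question; the helper names (`read_fasta`, `psiblast_command`, `run_all`) and the `seq_00001.pssm` naming scheme are hypothetical:

```python
import subprocess
import tempfile
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, parts = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(parts)
                header, parts = line[1:], []
            elif line and header is not None:
                parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def psiblast_command(query_path, pssm_path, db="nr", iterations=2):
    """Build the psiblast call from the question for a single query file."""
    return ["psiblast", "-query", str(query_path), "-db", db,
            "-num_iterations", str(iterations),
            "-out_ascii_pssm", str(pssm_path)]


def run_all(fasta_path, out_dir, runner=subprocess.run):
    """Run psiblast once per record and return the PSSM paths requested."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pssm_paths = []
    for index, (header, seq) in enumerate(read_fasta(fasta_path), start=1):
        pssm_path = out_dir / f"seq_{index:05d}.pssm"  # hypothetical naming
        # Write this record to its own temporary single-sequence FASTA file.
        with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
            tmp.write(f">{header}\n{seq}\n")
        try:
            runner(psiblast_command(tmp.name, pssm_path), check=True)
        finally:
            Path(tmp.name).unlink()
        pssm_paths.append(pssm_path)
    return pssm_paths
```

Calling `run_all("a_multiple_seq_test.fasta", "pssm_out")` would run the searches one after another, so the total compute time is unchanged; only the number of scheduler jobs drops.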


I've edited the OP. Thanks for your answer :D


If you have 15,000 sequences and need a PSSM for each one, then there is no other way but to run that many jobs. If some of the sequences are redundant, you may be able to remove the redundancy with a tool like CD-HIT and run only the remaining unique sequences.
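
As a rough illustration of the redundancy idea, the sketch below collapses only exactly identical sequences. This is not what CD-HIT does (CD-HIT clusters by a similarity threshold); it is a simpler stand-in, and the function name `collapse_identical` is hypothetical. Records are assumed to be `(header, sequence)` pairs already read from the FASTA file:

```python
def collapse_identical(records):
    """Keep the first record per distinct sequence (case-insensitive).

    Returns (unique_records, members), where members maps each kept header
    to every header that shares its sequence, so a PSSM computed for the
    representative can be attributed to the duplicates afterwards.
    """
    first_seen = {}  # uppercased sequence -> (header, sequence) of first record
    members = {}     # representative header -> all headers with that sequence
    for header, seq in records:
        key = seq.upper()
        if key not in first_seen:
            first_seen[key] = (header, seq)
        representative = first_seen[key][0]
        members.setdefault(representative, []).append(header)
    return list(first_seen.values()), members


records = [("a", "MKT"), ("b", "mkt"), ("c", "GGG")]
unique, members = collapse_identical(records)
print(len(unique), members)
```

With the same database and parameters, identical queries should yield the same PSSM, so only the representatives would need to be searched.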


That's sad to hear. Since the psiblast output includes results for each sequence, I thought I could somehow get the PSSM for each sequence in a single file. Maybe I'll have to divide my input into chunks and submit those. Thank you.
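
The chunking idea can be sketched as follows: split the records into fixed-size groups and write each group to its own FASTA file, so that each submitted job handles many sequences. The function names, the `chunk_0001.fasta` naming, and the default of 500 sequences per chunk are all assumptions for illustration. Given the behaviour described in the question (only the last query's PSSM is written for a multi-sequence query), each job would still need to run psiblast once per sequence within its chunk.

```python
from pathlib import Path


def chunk_records(records, chunk_size):
    """Yield lists of at most chunk_size (header, sequence) records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # final, possibly shorter, chunk
        yield chunk


def write_chunks(records, out_dir, chunk_size=500):
    """Write each chunk to out_dir/chunk_NNNN.fasta and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, chunk in enumerate(chunk_records(records, chunk_size), start=1):
        path = out_dir / f"chunk_{number:04d}.fasta"
        with open(path, "w") as handle:
            for header, seq in chunk:
                handle.write(f">{header}\n{seq}\n")
        paths.append(path)
    return paths
```

At 500 sequences per chunk, 15,000 sequences would become 30 chunk files, and therefore 30 jobs instead of 15,000.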


I am not exactly sure what you are trying to do (do you just need a PSSM for each sequence?). Perhaps there is a better, simpler way.


I don't have the answer, but I understand what you are trying to do. When I needed profiles (I forget whether they were HMMs, PSSMs, or both) for a large proteome, I submitted around 100 thousand jobs to Sun Grid Engine, and SGE got stuck (ouch!). After that, I wrote a script that checked the number of jobs in the queue every 5 minutes and, if that number was small enough, submitted part of the remaining jobs.
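
A throttled submission loop of the kind described here might look like the sketch below. It is not the original script: the function names are hypothetical, and `queued_job_count` assumes that `qstat -u USER` prints two header lines followed by one line per job, which should be checked against the local scheduler's output.

```python
import subprocess
import time


def queued_job_count(user):
    """Count a user's jobs from `qstat -u USER` (assumes two header lines)."""
    result = subprocess.run(["qstat", "-u", user],
                            capture_output=True, text=True, check=True)
    lines = [line for line in result.stdout.splitlines() if line.strip()]
    return max(0, len(lines) - 2)


def submit_throttled(scripts, max_queued, count_jobs, submit,
                     wait_seconds=300, sleep=time.sleep):
    """Submit scripts in batches, keeping at most max_queued jobs queued.

    count_jobs() returns the current number of queued jobs and
    submit(script) submits one job; both are passed in so the scheduler
    commands can be swapped out.
    """
    pending = list(scripts)
    while pending:
        free_slots = max_queued - count_jobs()
        if free_slots > 0:
            batch, pending = pending[:free_slots], pending[free_slots:]
            for script in batch:
                submit(script)
        if pending:
            sleep(wait_seconds)  # 300 s matches the 5-minute check above
```

A call could look like `submit_throttled(job_scripts, 200, lambda: queued_job_count("myuser"), lambda s: subprocess.run(["qsub", s], check=True))`, where the limit of 200 and the user name are placeholders.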


Depending on how stringent your search is and how close the BLAST matches are, it may not take as long as you think. There are also BLAST-like tools that run much faster than NCBI BLAST. PSI-BLAST may add a bit of extra compute time, but I've BLASTed hundreds of thousands of NGS reads in the past, generating 191 million BLAST hits, and it completed in under a week on our private server using only about half the available cores (~18).