[BLAST] PSI-BLAST with a multiple-sequence query: separate PSSM output file for each sequence
6.8 years ago

Hello,
I've run into a problem when generating PSSMs from a multiple-sequence query with PSI-BLAST. I want a separate PSSM output for each sequence in the query, or at least one big file containing the PSSM of every sequence.
With a single-sequence query this is easy.
With a multiple-sequence query, only the PSSM of the last sequence in the FASTA input is written to the PSSM file, even though the overall PSI-BLAST output contains results for all sequences, with the number of iterations I specified.
Is there any way to solve this?

EDITED: I've considered running each sequence from a separate FASTA file, but in my case I think that's hard: I would need to submit about 15 thousand jobs to my college's server, which would consume a lot of resources and affect other users' jobs. (end of edit)

My command is something like this:

psiblast -query ./a_multiple_seq_test.fasta -db nr -num_iterations 2 -out_ascii_pssm pssm.chk
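A possible workaround, sketched under assumptions: since only the last query's matrix ends up in the `-out_ascii_pssm` file, the multi-FASTA can be split into one file per record and psiblast called once per record inside a single job (a loop, not 15,000 scheduler submissions). The function names `split_fasta` and `run_psiblast_per_sequence` and the `seq_00001` file naming are hypothetical; the psiblast options are the ones from the command above plus the standard `-out` and `-num_threads`.

```shell
#!/usr/bin/env bash
# Sketch: one PSSM file per query sequence, all inside a single job.
# Function names and output file naming are hypothetical.

# Split a multi-FASTA into one file per record: <outdir>/seq_00001.fasta, ...
split_fasta() {
  local in=$1 outdir=$2
  mkdir -p "$outdir"
  awk -v dir="$outdir" '
    /^>/ { if (out) close(out); n++; out = sprintf("%s/seq_%05d.fasta", dir, n) }
    out  { print > out }   # header and sequence lines; text before the first header is skipped
  ' "$in"
}

# Run psiblast once per record, writing <outdir>/seq_NNNNN.pssm and seq_NNNNN.out.
run_psiblast_per_sequence() {
  local query=$1 db=$2 outdir=$3 threads=${4:-1}
  local tmpdir f name
  tmpdir=$(mktemp -d)
  split_fasta "$query" "$tmpdir"
  mkdir -p "$outdir"
  for f in "$tmpdir"/seq_*.fasta; do
    name=$(basename "$f" .fasta)
    psiblast -query "$f" -db "$db" -num_iterations 2 -num_threads "$threads" \
      -out_ascii_pssm "$outdir/$name.pssm" -out "$outdir/$name.out"
  done
  rm -rf "$tmpdir"
}
```

For example, `run_psiblast_per_sequence a_multiple_seq_test.fasta nr pssm_out 4` would write `pssm_out/seq_00001.pssm`, `pssm_out/seq_00002.pssm` and so on, numbered in input order. If one combined file is preferred, the per-sequence files can be concatenated afterwards with a separator line, e.g. `for p in pssm_out/*.pssm; do echo "### $p"; cat "$p"; done > all.pssm`. Some queries may end up without a PSSM (for instance when no hit passes the inclusion threshold), so it is worth checking for missing files.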

Thanks in advance. Any suggestion will be highly appreciated.

blast psi-blast pssm • 3.9k views

Have you considered running separate jobs for each of your query sequences?

I've edited the original post. Thanks for your answer :D

If you have 15,000 sequences and need a PSSM for each one, there is no way around running that many searches. If there are redundant sequences, you may be able to remove the redundancy with a tool like CD-HIT and run only the remaining unique sequences.
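Before trying CD-HIT, one cheap check is for exact duplicates, since identical query sequences searched against the same database with the same options should yield the same PSSM. The sketch below uses a hypothetical `dedup_fasta` helper written in plain awk (a swapped-in technique, not CD-HIT): it keeps the first record of each identical sequence and writes a two-column map `duplicate_id<TAB>kept_id` so the PSSMs can be copied back to the duplicates afterwards. CD-HIT (for example `cd-hit -i all.fasta -o nr100.fasta -c 1.0`) can additionally cluster near-identical sequences, but those would then share a profile that is only approximately right for each member.

```shell
# Sketch: collapse exact duplicate sequences (case-insensitive) before running PSI-BLAST.
# dedup_fasta is a hypothetical helper, not part of BLAST+ or CD-HIT.
# Usage: dedup_fasta in.fasta unique.fasta duplicates.tsv
dedup_fasta() {
  local in=$1 out=$2 map=$3
  : > "$out"; : > "$map"   # create/truncate both outputs even if nothing gets written
  awk -v out="$out" -v map="$map" '
    function flush(   key, id) {
      if (hdr == "") return
      key = toupper(seq)
      id = substr(hdr, 2); sub(/[ \t].*/, "", id)   # ID = first word of the header
      if (key in first) {
        print id "\t" first[key] > map              # duplicate -> kept record
      } else {
        first[key] = id
        print hdr > out
        print seq > out                             # sequence written on a single line
      }
    }
    /^>/ { flush(); hdr = $0; seq = ""; next }
    { gsub(/[ \t\r]/, ""); seq = seq $0 }
    END  { flush() }
  ' "$in"
}
```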

That's sad to hear, because the psiblast output contains results for each sequence, so I thought I could somehow get the PSSM for each sequence in a single file. Maybe I'll have to divide my input into chunks and submit those. Thank you.
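A middle ground between one job per sequence and one giant job is to split the input into a fixed number of chunks and submit one job (or one array task) per chunk, each looping over its own sequences. A minimal sketch, assuming a hypothetical `chunk_fasta` helper that deals records round-robin into N files (which tends to mix long and short sequences across chunks):

```shell
# Sketch: deal FASTA records round-robin into N chunk files
# (<outdir>/chunk_001.fasta ... chunk_N.fasta). chunk_fasta is a hypothetical helper.
chunk_fasta() {
  local in=$1 n=$2 outdir=$3
  mkdir -p "$outdir"
  rm -f "$outdir"/chunk_*.fasta             # start clean, since awk appends below
  awk -v n="$n" -v dir="$outdir" '
    /^>/ {
      if (out) close(out)                    # keep only one output file open at a time
      out = sprintf("%s/chunk_%03d.fasta", dir, (rec++ % n) + 1)
    }
    out { print >> out }
  ' "$in"
}
```

With 15,000 sequences and, say, 50 chunks, that is 50 scheduler tasks of about 300 sequences each. On Sun Grid Engine an array job (`qsub -t 1-50 job.sh`) exposes the task number as `$SGE_TASK_ID`, which the job script could use to pick `chunks/chunk_$(printf %03d "$SGE_TASK_ID").fasta`; Slurm offers the same via `sbatch --array=1-50` and `$SLURM_ARRAY_TASK_ID`. The job script itself is not shown and would need adapting to the local cluster.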

I am not exactly sure what you are trying to do (do you just need a PSSM for each sequence?). Perhaps there is a simpler way.

I don't have the answer, but I understand what you are trying to do. When I needed profiles (I forget whether they were HMMs, PSSMs or both) for a large proteome, I submitted about 100 thousand jobs to Sun Grid Engine, and SGE got stuck (ouch!). After that, I wrote a script that checked the number of jobs in the queue every 5 minutes and, if the number was small enough, submitted (part of) the remaining jobs.
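The polling approach described above might look roughly like this. It is a sketch assuming Sun Grid Engine, where `qstat -u "$USER"` prints two header lines followed by one line per job (and nothing when the queue is empty); `count_my_jobs` and `submit_throttled` are hypothetical names, and other schedulers would need a different count command.

```shell
# Sketch: keep at most MAX_QUEUED of your jobs in the SGE queue, polling every INTERVAL seconds.
# count_my_jobs and submit_throttled are hypothetical helpers.

# Number of jobs the current user has queued or running (assumes 2 header lines from qstat).
count_my_jobs() {
  qstat -u "$USER" | tail -n +3 | wc -l
}

# Usage: submit_throttled MAX_QUEUED INTERVAL job1.sh job2.sh ...
submit_throttled() {
  local max_queued=$1 interval=$2; shift 2
  local queued free
  while [ $# -gt 0 ]; do
    queued=$(count_my_jobs)
    free=$(( max_queued - queued ))
    if [ "$free" -le 0 ]; then
      sleep "$interval"                      # queue is full: wait, then re-check
      continue
    fi
    while [ "$free" -gt 0 ] && [ $# -gt 0 ]; do
      qsub "$1"                              # submit part of the remaining jobs
      shift
      free=$(( free - 1 ))
    done
  done
}
```

For example, `submit_throttled 500 300 jobs/*.sh` would keep at most about 500 of your jobs in the queue and re-check every 300 s (5 minutes).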

Depending on how stringent your search is and how close the BLAST matches are, it may not take as long as you think. There are also BLAST-like tools that run much faster than NCBI BLAST. PSI-BLAST adds some extra compute time, but in the past I have BLASTed hundreds of thousands of NGS reads, generating 191 million BLAST hits, and it completed in under a week on our private server using only about half the available cores (~18).
