Question: [Blast] PSI-BLAST with multiple separate output PSSM files
3.1 years ago by
phanhuykinh10 wrote:

Hello,
I've run into a problem generating PSSMs from a multiple-sequence query with PSI-BLAST. I want a separate PSSM output for each sequence in the query, or at least one big file containing every sequence's PSSM.
With a single-sequence query, this is easy.
With a multiple-sequence query, only the PSSM for the last sequence in the FASTA input is written to the PSSM file, even though the overall PSI-BLAST output contains results for every sequence, with the number of iterations I specified.
Is there any way to solve this?

EDITED: I've thought about running each sequence from a separate FASTA file. In my case that is hard, because I would need to submit about 15 thousand jobs to my college server, which would consume a lot of resources and affect other users' jobs. (end of edit)

My command is something like this:

psiblast -query ./a_multiple_seq_test.fasta -db nr -num_iterations 2 -out_ascii_pssm pssm.chk

Thanks in advance. Any suggestions will be highly appreciated.

blast psi-blast pssm • 2.2k views
modified 3.1 years ago • written 3.1 years ago by phanhuykinh10

Have you considered running separate jobs for each of your query sequences?

written 3.1 years ago by genomax87k

I've edited the OP. Thanks for your answer :D

EDITED: I've thought about running each sequence from a separate FASTA file. In my case that is hard, because I would need to submit about 15 thousand jobs to my college server, which would consume a lot of resources and affect other users' jobs.

modified 3.1 years ago • written 3.1 years ago by phanhuykinh10

If you have 15,000 sequences and need a PSSM for each one, then there is no other way but to run that many jobs. If there are redundant sequences, you may be able to remove the redundancy with a tool like CD-HIT and use only the remaining unique sequences.

written 3.1 years ago by genomax87k
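
As a cheap first pass before a clustering tool such as CD-HIT, exact duplicates can be collapsed so that each distinct sequence is searched only once. The sketch below is not CD-HIT and does no similarity clustering; it only groups identical sequences. The function name `drop_exact_duplicates` and the `(header, sequence)` record format are assumptions made for illustration.

```python
def drop_exact_duplicates(records):
    """Keep the first record for each distinct sequence.

    `records` is an iterable of (header, sequence) pairs (assumed format).
    Returns the kept records and a dict mapping each distinct sequence
    (upper-cased) to every header that shares it.
    """
    kept = []
    headers_by_sequence = {}
    for header, seq in records:
        key = seq.upper()  # treat lower-case and upper-case residues alike
        if key in headers_by_sequence:
            headers_by_sequence[key].append(header)
        else:
            headers_by_sequence[key] = [header]
            kept.append((header, seq))
    return kept, headers_by_sequence
```

With the same database and settings, the PSSM built for a kept sequence should also apply to the identical sequences that were dropped, so the mapping can be used to copy results back afterwards.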

That's sad to hear. Since the psiblast output includes results for each sequence, I thought I could somehow get the PSSM for each sequence in a single file. Maybe I will have to divide my input into chunks to submit. Thank you.

written 3.1 years ago by phanhuykinh10
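
The chunking idea can be sketched as one job that loops over a slice of the multi-FASTA file and calls psiblast once per sequence, so each sequence gets its own PSSM file without one cluster job per sequence. This is a minimal sketch, not a tested pipeline: the psiblast flags are the ones from the command in the question, while the helper names (`read_fasta`, `psiblast_command`, `run_per_sequence`) and the `seq_00000.pssm` naming scheme are assumptions.

```python
import subprocess
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def psiblast_command(query, pssm_out, db="nr", iterations=2):
    """Build the psiblast call from the question for one single-sequence query."""
    return ["psiblast", "-query", str(query), "-db", db,
            "-num_iterations", str(iterations),
            "-out_ascii_pssm", str(pssm_out)]


def run_per_sequence(fasta, workdir, start=0, stop=None, run=subprocess.run):
    """Run psiblast on records start..stop-1 of `fasta`, one PSSM file each."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for index, (header, seq) in enumerate(read_fasta(fasta)):
        if index < start or (stop is not None and index >= stop):
            continue
        # Write the record to its own single-sequence FASTA file.
        query = workdir / f"seq_{index:05d}.fasta"
        query.write_text(f">{header}\n{seq}\n")
        pssm = workdir / f"seq_{index:05d}.pssm"
        run(psiblast_command(query, pssm), check=True)
        outputs.append(pssm)
    return outputs
```

For example, 15 jobs could each call `run_per_sequence("a_multiple_seq_test.fasta", "pssm_out", start=k * 1000, stop=(k + 1) * 1000)` for a different `k`, which keeps the number of submitted jobs small while still producing one PSSM file per sequence.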

I am not exactly sure what you are trying to do (do you just need a PSSM for each sequence?). Perhaps there is a better, simpler way.

written 3.1 years ago by genomax87k

I don't have the answer, but I understand what you are trying to do. When I needed profiles (I forget whether they were HMMs, PSSMs, or both) for a large proteome, I submitted 100 thousand jobs to Sun Grid Engine, and SGE got stuck (ouch!). At that time, I wrote a script that checked the number of jobs in the queue every 5 minutes and, if that number was small enough, submitted part of the remaining jobs.

modified 3.1 years ago • written 3.1 years ago by fishgolden420
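
A throttled submitter along the lines described above might look like the sketch below. It is not the original script: the function names, the `max_queued` limit of 500, and the way `qstat` output is counted (two header lines, then one line per job) are assumptions that should be checked against the local scheduler.

```python
import subprocess
import time


def count_queued_jobs():
    """Count jobs listed by `qstat` (assumes two header lines, one line per job)."""
    out = subprocess.run(["qstat"], capture_output=True, text=True, check=True).stdout
    lines = [line for line in out.splitlines() if line.strip()]
    return max(0, len(lines) - 2)


def submit_with_qsub(script):
    """Submit one job script with `qsub`."""
    subprocess.run(["qsub", script], check=True)


def submit_throttled(scripts, max_queued=500, poll_seconds=300,
                     count_jobs=count_queued_jobs,
                     submit=submit_with_qsub, sleep=time.sleep):
    """Submit job scripts in batches, keeping at most `max_queued` jobs queued."""
    pending = list(scripts)
    while pending:
        free = max_queued - count_jobs()
        if free > 0:
            # Fill the free slots, then keep the rest for the next poll.
            batch, pending = pending[:free], pending[free:]
            for script in batch:
                submit(script)
        if pending:
            sleep(poll_seconds)  # 300 s matches the 5-minute check in the comment
```

The queue counter, submit function, and sleep function are passed in as parameters so the loop can be tried out with stand-ins before pointing it at a real cluster.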

Depending on how stringent your search is and how close the matches in BLAST are, it may not take as long as you think. There are also BLAST-like tools that run much faster than NCBI BLAST. PSI-BLAST may add a bit of extra compute time, but I've BLAST-ed hundreds of thousands of NGS reads in the past, generating 191 million BLAST hits, and it completed in under a week on our private server, using only about half the available cores (~18).

modified 3.1 years ago • written 3.1 years ago by Joe17k