How to create a PSSM from FASTA homologues with NCBI BLAST+ 2.2.23
11.6 years ago

I have a FASTA sequence file with about 10 homologous proteins. What I would like to do is create a PSSM from them and use it to search a transcriptome database.

But how do I create it? The NCBI legacy BLAST package has a makemat executable for exactly this task, but it does not seem to have an equivalent in BLAST+.

The new psiblast offers a variety of options (e.g. -in_msa, -out_pssm) with which it should be possible to create an initial profile, but these two options depend on a database or subject sequences being supplied (which does not make much sense to me).

What am I missing? Any help is appreciated.

ncbi blast pssm • 12k views
0

How can I generate alignment.fasta from the command line?

0

Maybe you should consider opening a separate question; your question will get lost as a reply to this one. However, your problem (generating an MSA, a multiple sequence alignment) is fairly trivial in bioinformatics, and there must be many threads on that topic. Programs that can do this include MUSCLE, T-Coffee, ClustalW, etc. There might be more modern tools, but the good old stuff will do as well.
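As a rough illustration of the MUSCLE route, here is a small Python sketch that only builds the command line (it does not run anything). The helper name `muscle_command` and the file name `homologues.fasta` are made up for this example. The `-in`/`-out` flags are the MUSCLE 3.x syntax; MUSCLE 5 uses `-align`/`-output` instead, so check which version you have installed.

```python
def muscle_command(unaligned_fasta, aligned_fasta, muscle_major=3):
    """Return the argv list for a MUSCLE alignment run (nothing is executed).

    muscle_major is an assumption the caller supplies: 3 for the classic
    3.x command line, 5 for the MUSCLE 5 command line.
    """
    if muscle_major >= 5:
        # MUSCLE 5 syntax
        return ["muscle", "-align", unaligned_fasta, "-output", aligned_fasta]
    # MUSCLE 3.x syntax; output is aligned FASTA by default
    return ["muscle", "-in", unaligned_fasta, "-out", aligned_fasta]


# Show the command that would produce alignment.fasta from the raw homologues.
print(" ".join(muscle_command("homologues.fasta", "alignment.fasta")))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)` on a machine where `muscle` is on the PATH.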

6
11.6 years ago

Solved.

The correct usage for 2.2.23+ is (-subject produces an error which is fixed in 2.2.24+):

psiblast -db blastdb -in_msa alignment.fasta -out_ascii_pssm pssm.txt


And for 2.2.24+, supplying a subject FASTA file works:

psiblast -subject oneseq.fasta -in_msa alignment.fasta -out_ascii_pssm pssm.txt


For both approaches, it does not matter whether db/subject contains a single sequence or any subset of the alignment sequences; the PSSM output is exactly the same. Note that the query must be supplied via -in_msa in order to generate a PSSM in one step.
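If you want to script the two invocations above, a minimal Python sketch could look like the following. The helper names (`psiblast_pssm_command`, `tblastn_with_pssm_command`) and the file names `pssm.chk`, `transcriptome_db` and `hits.txt` are invented for illustration. The flags -in_msa, -db, -subject and -out_ascii_pssm are the ones used above. Saving a binary checkpoint with -out_pssm and feeding it to tblastn via -in_pssm is an assumption about your BLAST+ build, included only because the original goal was to search a transcriptome; check `psiblast -help` and `tblastn -help` first.

```python
def psiblast_pssm_command(msa, ascii_pssm, db=None, subject=None, checkpoint=None):
    """Build (but do not run) a psiblast call that writes a PSSM from an MSA.

    Exactly one of db / subject must be given: -db for 2.2.23+,
    -subject for 2.2.24+ as described in the answer above.
    """
    if (db is None) == (subject is None):
        raise ValueError("give exactly one of db or subject")
    cmd = ["psiblast", "-in_msa", msa]
    cmd += ["-db", db] if db is not None else ["-subject", subject]
    cmd += ["-out_ascii_pssm", ascii_pssm]
    if checkpoint is not None:
        # Assumption: the build also writes a checkpoint file in this mode.
        cmd += ["-out_pssm", checkpoint]
    return cmd


def tblastn_with_pssm_command(checkpoint, nucleotide_db, out_file):
    """Build a tblastn call that searches a nucleotide DB with a saved PSSM.

    Assumption: your tblastn accepts a psiblast checkpoint via -in_pssm.
    """
    return ["tblastn", "-in_pssm", checkpoint, "-db", nucleotide_db, "-out", out_file]


# Print the commands instead of executing them.
print(" ".join(psiblast_pssm_command("alignment.fasta", "pssm.txt", db="blastdb")))
print(" ".join(psiblast_pssm_command("alignment.fasta", "pssm.txt",
                                     subject="oneseq.fasta", checkpoint="pssm.chk")))
print(" ".join(tblastn_with_pssm_command("pssm.chk", "transcriptome_db", "hits.txt")))
```

Building the argument list separately from running it makes it easy to eyeball the exact command before handing it to `subprocess.run`.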

0

The PSSM generated by psiblast bases the whole matrix on the first sequence in the alignment. Any idea why?

0

I think you should use the -ignore_msa_master option; see https://www.ncbi.nlm.nih.gov/books/NBK279694/
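A hedged sketch of how that option could be added to the call, again only building the command in Python. The helper name `psiblast_master_command` is hypothetical. -ignore_msa_master and -msa_master_idx are psiblast options in later BLAST+ releases and probably not in 2.2.23/2.2.24, so treat their availability as an assumption and confirm it with `psiblast -help`. The two options are mutually exclusive.

```python
def psiblast_master_command(msa, subject, ascii_pssm,
                            ignore_master=False, master_index=None):
    """Build a psiblast call that controls which MSA sequence is the master.

    ignore_master -> adds -ignore_msa_master
    master_index  -> adds -msa_master_idx N (1-based position in the MSA)
    With neither set, psiblast uses the first sequence of the MSA as master.
    """
    if ignore_master and master_index is not None:
        raise ValueError("ignore_master and master_index cannot be combined")
    cmd = ["psiblast", "-subject", subject, "-in_msa", msa]
    if ignore_master:
        cmd.append("-ignore_msa_master")
    elif master_index is not None:
        if master_index < 1:
            raise ValueError("master_index is 1-based")
        cmd += ["-msa_master_idx", str(master_index)]
    cmd += ["-out_ascii_pssm", ascii_pssm]
    return cmd


# Print the command rather than running it.
print(" ".join(psiblast_master_command("alignment.fasta", "oneseq.fasta",
                                       "pssm.txt", ignore_master=True)))
```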

2
11.6 years ago
Rm 8.1k

Workaround: create a database from those 10 sequences, either on their own or together with other homologous sequences, and then generate the profile (PSSM) while searching against that database.

0

Since a database and a subject file are equivalent (one being a BLAST DB and the other a multi-FASTA file, obviously), creating a DB does not make much sense. Even if I create a database first, psiblast does not create the PSSM file, for whatever reason.

0

You were indeed pointing me in the right direction. Thank you for that! I will accept my own answer though because it is more complete (sorry for that). But I modded you up ;-)