Question: How to create a PSSM from FASTA homologues with NCBI BLAST+ 2.2.23
10.2 years ago by
Cambridge, UK
Michael Schubert7.0k wrote:

I have a FASTA sequence file with about 10 homologous proteins. What I would like to do is create a PSSM from them and use it to search a transcriptome database.

But how do I create it? The NCBI legacy BLAST package has a makemat executable for exactly this task, but it does not seem to have an equivalent in BLAST+.

The new psiblast offers several options (e.g. -in_msa, -out_pssm) that should make it possible to create an initial profile, but both of them require a database or subject sequences, which does not make much sense to me.

What am I missing? Any help is appreciated.

ncbi pssm blast • 10k views
modified 2.3 years ago by mjavad20120 • written 10.2 years ago by Michael Schubert7.0k

How can I generate alignment.fasta from the command line?

modified 2.3 years ago • written 2.3 years ago by mjavad20120

Maybe you should consider opening a separate question; your question will be lost as a reply to this one. However, your problem (generating an MSA, a multiple sequence alignment) is fairly routine in bioinformatics and there must be many threads on that topic. Programs that can do this include MUSCLE, T-Coffee, ClustalW, etc. There may be more modern tools, but the good old ones will do as well.
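As a rough illustration of the MSA step, here is a minimal sketch using MUSCLE. It assumes the unaligned homologues are in a file called homologues.fasta (a hypothetical name) and uses the MUSCLE 3.x flags; MUSCLE 5 changed the syntax, so check the help of the installed version. The run_or_show helper is only there so the script prints the command instead of failing when the tool or input is missing.

```shell
#!/bin/sh
# Run a command only if its program is installed and its input file exists;
# otherwise just print what would have been run.
run_or_show() {
    input=$1; shift
    if command -v "$1" >/dev/null 2>&1 && [ -s "$input" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# homologues.fasta is a placeholder for the unaligned input sequences.
# MUSCLE 3.x syntax; MUSCLE 5 uses "-align" and "-output" instead.
run_or_show homologues.fasta muscle -in homologues.fasta -out alignment.fasta

# A Clustal Omega equivalent would be:
#   clustalo -i homologues.fasta -o alignment.fasta --outfmt=fasta
```

The resulting alignment.fasta is an aligned FASTA file of the kind psiblast reads via -in_msa.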

written 2.3 years ago by Carambakaracho2.3k
10.2 years ago by
Cambridge, UK
Michael Schubert7.0k wrote:

Solved.

The correct usage for 2.2.23+ is as follows (-subject produces an error in that release, which is fixed in 2.2.24+):

psiblast -db blastdb -in_msa alignment.fasta -out_ascii_pssm pssm.txt

For 2.2.24+, supplying a subject FASTA file also works:

psiblast -subject oneseq.fasta -in_msa alignment.fasta -out_ascii_pssm pssm.txt

For both approaches, it does not matter whether the db/subject contains a single sequence or any subset of the alignment sequences; the PSSM output is exactly the same. Note that the query needs to be supplied with -in_msa in order to generate a PSSM in one step.
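To connect this to the original goal of searching a transcriptome, here is a hedged sketch of one possible end-to-end run. It assumes a BLAST+ release in which tblastn accepts a PSSM checkpoint via -in_pssm; the names transcriptome.fasta, transcriptome_db, pssm.asn and hits.tsv are placeholders, and the E-value cutoff is arbitrary. Flag behaviour differs between BLAST+ releases, so check `psiblast -help` and `tblastn -help` first.

```shell
#!/bin/sh
# Run a command only if its program is installed and its input file exists;
# otherwise just print what would have been run.
run_or_show() {
    input=$1; shift
    if command -v "$1" >/dev/null 2>&1 && [ -s "$input" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# 1. Build the PSSM from the alignment. -out_pssm writes the checkpoint
#    file for later searches; -out_ascii_pssm writes the readable matrix.
run_or_show alignment.fasta psiblast -subject oneseq.fasta \
    -in_msa alignment.fasta -out_pssm pssm.asn -out_ascii_pssm pssm.txt

# 2. Index the transcriptome (placeholder name) as a nucleotide database.
run_or_show transcriptome.fasta makeblastdb -in transcriptome.fasta \
    -dbtype nucl -out transcriptome_db

# 3. Search the translated transcripts with the PSSM as the query.
#    The E-value cutoff is an arbitrary example value.
run_or_show pssm.asn tblastn -db transcriptome_db -in_pssm pssm.asn \
    -evalue 1e-5 -outfmt 6 -out hits.tsv
```

The ASCII matrix is handy for inspection, while the checkpoint written by -out_pssm is the file intended to be fed back into a search.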

modified 16 months ago by _r_am32k • written 10.2 years ago by Michael Schubert7.0k

The PSSM generated by psiblast bases the whole matrix on the first sequence in the alignment. Any idea why?

written 7.1 years ago by microbeatic80

I think you should use the -ignore_msa_master option: https://www.ncbi.nlm.nih.gov/books/NBK279694/
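A minimal sketch of how the master sequence could be controlled, assuming a BLAST+ release recent enough to have these options (they were not necessarily present in 2.2.23/2.2.24). By default the first sequence of the alignment acts as the master; -msa_master_idx selects another row (1-based), and -ignore_msa_master builds the PSSM without a master. The index 3 and the output file names are arbitrary examples.

```shell
#!/bin/sh
# Run a command only if its program is installed and its input file exists;
# otherwise just print what would have been run.
run_or_show() {
    input=$1; shift
    if command -v "$1" >/dev/null 2>&1 && [ -s "$input" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Default behaviour: the first sequence in the alignment is the master.
run_or_show alignment.fasta psiblast -subject oneseq.fasta \
    -in_msa alignment.fasta -out_ascii_pssm pssm_first.txt

# Use the third sequence of the alignment as the master instead.
run_or_show alignment.fasta psiblast -subject oneseq.fasta \
    -in_msa alignment.fasta -msa_master_idx 3 -out_ascii_pssm pssm_third.txt

# Ignore the master sequence altogether when building the PSSM.
run_or_show alignment.fasta psiblast -subject oneseq.fasta \
    -in_msa alignment.fasta -ignore_msa_master -out_ascii_pssm pssm_nomaster.txt
```

Comparing the three ASCII matrices should show whether the choice of master explains the behaviour described above.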

written 21 months ago by lagartija90
10.2 years ago by
Rm8.0k
Danville, PA
Rm8.0k wrote:

Workaround: create a database from those 10 sequences, either on their own or together with other homologous sequences, and then generate the profile (PSSM) while searching against that database.
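A minimal sketch of this workaround, assuming the 10 unaligned proteins are in homologues.fasta (a hypothetical name) and that an alignment of them is supplied to psiblast via -in_msa, as in the accepted answer. The database name blastdb matches the one used there.

```shell
#!/bin/sh
# Run a command only if its program is installed and its input file exists;
# otherwise just print what would have been run.
run_or_show() {
    input=$1; shift
    if command -v "$1" >/dev/null 2>&1 && [ -s "$input" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Build a protein BLAST database from the homologues (placeholder file name).
run_or_show homologues.fasta makeblastdb -in homologues.fasta \
    -dbtype prot -out blastdb

# Search that database with the alignment as the query and write the PSSM.
run_or_show alignment.fasta psiblast -db blastdb \
    -in_msa alignment.fasta -out_ascii_pssm pssm.txt
```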

written 10.2 years ago by Rm8.0k

Since a database and a subject file are equivalent (one being a BLAST DB and the other a multi-FASTA file, obviously), creating a db does not make much sense. Even if I create a database first, psiblast does not create the PSSM file, for whatever reason.

written 10.2 years ago by Michael Schubert7.0k

You were indeed pointing me in the right direction. Thank you for that! I will accept my own answer though because it is more complete (sorry for that). But I modded you up ;-)

written 10.2 years ago by Michael Schubert7.0k