Psiblast Fasta Formatted Output
10.4 years ago
Reyhaneh ▴ 530

Hi,

I have a FASTA-formatted protein sequence (stored in DsbA.fa) and I would like to use PSI-BLAST (not the web server, but the command-line version in the BLAST+ package) to generate hits. I am using the following command line:

./psiblast -query DsbA.fa -db Proteobacteria -num_iterations 6 -evalue 0.005 -out psiblastDsbAOut -out_pssm PSSMDsbA

My problem is that I would like FASTA-formatted output of my result (psiblastDsbAOut) so I can re-align the hits using Clustal.

The output formats supported by psiblast are:

*** Formatting options
 -outfmt <String>
   alignment view options:
     0 = pairwise,
     1 = query-anchored showing identities,
     2 = query-anchored no identities,
     3 = flat query-anchored, show identities,
     4 = flat query-anchored, no identities,
     5 = XML Blast output,
     6 = tabular,
     7 = tabular with comment lines,
     8 = Text ASN.1,
     9 = Binary ASN.1,
    10 = Comma-separated values,
    11 = BLAST archive format (ASN.1) 

   Options 6, 7, and 10 can be additionally configured to produce
   a custom format specified by space delimited format specifiers.
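The custom tabular format can get close to FASTA on its own. If the search is re-run with a format that includes the aligned subject sequence, for example `-outfmt "6 sseqid sstart send sseq"`, each data line can be turned into a FASTA record. Below is a minimal Python sketch that assumes exactly that four-column layout. The function names, the file names and the `>id/start-end` header style are hypothetical choices. Start and end go into the header because one subject can produce several local alignments. Like MView, this yields only the aligned segments, not full-length sequences.

```python
def tabular_to_fasta(lines):
    """Turn custom tabular lines (sseqid, sstart, send, sseq) into FASTA text."""
    seen = set()
    records = []
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines, '#' comment lines (-outfmt 7) and status lines such as
        # "Search has CONVERGED!" that do not have exactly four tab-separated fields.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            continue
        sseqid, sstart, send, sseq = fields
        # PSI-BLAST repeats hits in every iteration; keep each hit region once.
        key = (sseqid, sstart, send)
        if key in seen:
            continue
        seen.add(key)
        # Drop alignment gap characters so the record holds a plain sequence.
        records.append(f">{sseqid}/{sstart}-{send}\n{sseq.replace('-', '')}\n")
    return "".join(records)


def convert_file(tab_path, fasta_path):
    """Read a custom tabular report from tab_path and write FASTA to fasta_path."""
    with open(tab_path) as src, open(fasta_path, "w") as dst:
        dst.write(tabular_to_fasta(src))
```

For example, `convert_file("psiblastDsbA.tsv", "hits.fa")` would produce a file that Clustal can read, where `psiblastDsbA.tsv` is a hypothetical `-out` name for the re-run search.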

Do you know if I can get the output in FASTA format? If not, will Clustal accept any of these formats as input?

Thanks

10.4 years ago
Hamish ★ 3.2k

Depending on exactly what you are trying to do, you can:

  • Derive a multiple sequence alignment directly from the PSI-BLAST result. You can do this using MView, which takes the PSI-BLAST report as input and extracts the sequences from the alignments to construct a multiple sequence alignment. This gives a multiple sequence alignment which is consistent with the PSI-BLAST result, but which contains only the parts of each hit sequence covered by the PSI-BLAST local alignments.
  • Extract the hit identifiers from the result, and run a de novo alignment using your favourite multiple sequence alignment program. If using one of the tabular output formats, extracting the identifiers can be as simple as running a grep. Given the list of hit identifiers, the original sequences can be retrieved in FASTA format from the BLAST database searched using the 'blastdbcmd' program (see "BLAST Command Line Applications User Manual"). This gives a multiple sequence alignment which uses the complete hit sequences, but which can contradict the PSI-BLAST local alignments.
  • Use a specialist tool which uses information in the PSI-BLAST result to create anchors to use in the multiple sequence alignment process. An example of this is the DbClustal tool, which extracts anchors from a protein BLAST search and uses these in a ClustalW alignment, to produce a multiple alignment based on the full length sequences with preservation of local alignment regions from the BLAST result.
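For the second option, the hit identifiers can also be pulled out of tabular output (`-outfmt 6` or `7`) with a few lines of Python instead of grep. Below is a minimal sketch that assumes the default tabular column order, in which the subject ID is the second column. The function and file names are hypothetical. The list it writes has one ID per line and could then be passed to something like `blastdbcmd -db Proteobacteria -entry_batch hit_ids.txt -out hits.fa`, which writes FASTA by default. That lookup typically only works if the database was built with `makeblastdb -parse_seqids`.

```python
def unique_hit_ids(lines):
    """Return subject IDs (column 2 of the default tabular format) in first-seen order."""
    seen = set()
    ids = []
    for line in lines:
        line = line.strip()
        # Skip comment lines (-outfmt 7) and status lines such as "Search has CONVERGED!".
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        sseqid = fields[1]
        # PSI-BLAST reports hits again in each iteration; list each one once.
        if sseqid not in seen:
            seen.add(sseqid)
            ids.append(sseqid)
    return ids


def write_id_list(tab_path, ids_path):
    """Write one identifier per line, the layout read by blastdbcmd -entry_batch."""
    with open(tab_path) as src:
        ids = unique_hit_ids(src)
    with open(ids_path, "w") as dst:
        dst.writelines(f"{hit_id}\n" for hit_id in ids)
    return ids
```

Deduplicating here matters because each PSI-BLAST iteration repeats the hits it found before, so a naive grep would fetch the same sequence several times.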