How To Reduce Fasta Output?
12.1 years ago
Fabian ▴ 50

I'm using FASTA's ggsearch and I'm wondering whether I can get rid of all the verbose output, because with the all vs. all alignments I'm doing, the output files are many gigabytes in size.

Ideally, I would like an output like this:

some_id some_other_id e_value

i.e. one alignment per line. I already found the -H flag but this doesn't get me much closer.

12.1 years ago
Bill Pearson ★ 1.0k

For almost all of the searches we do (and we do a lot of them), we use -m 9 -d 0. -d 0 says no alignments. -m 9 usually gives the alignment information you need (percent identity, number of gaps). -m 9c (or -m 9C) gives the actual alignment encoding. Current versions of BioPerl can parse -m 9 output.

If you like BLAST tabular output better, try: -m 8 or -m 8C.
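To get from BLAST-style tabular output to the three-column format asked for in the question, a small filter is enough. The following is a minimal sketch, not part of the FASTA package: it assumes the standard 12-column BLAST tabular layout (query id, subject id, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score) and assumes that any comment lines (as the commented -m 8C form might produce) start with '#'. The function name reduce_tabular and the sample line are hypothetical.

```python
def reduce_tabular(lines):
    """Yield (query_id, subject_id, evalue) from BLAST-style tabular lines.

    Assumes 12 tab-separated columns with the E-value in column 11.
    The E-value is kept as a string so its original formatting survives.
    """
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skip anything that does not look like a full tabular record.
        if len(fields) < 12:
            continue
        yield fields[0], fields[1], fields[10]


# Made-up sample input, for illustration only.
sample = [
    "# a comment line",
    "seqA\tseqB\t98.50\t200\t3\t0\t1\t200\t1\t200\t1.2e-50\t410.0",
]

for query_id, subject_id, evalue in reduce_tabular(sample):
    print(query_id, subject_id, evalue, sep="\t")
```

In practice the function could be fed an open file or sys.stdin instead of the sample list, so the search output is streamed rather than stored in full.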

But mostly, use -d 0 (no alignments).

It also makes sense to set a stricter E()-value threshold, particularly for all vs all. At least -E 0.001.

The information in the previous answer is mistaken. -m 9i does not prevent the alignments from being displayed (you need -d 0 for that); the 'i' simply adds percent identity to the normal description line, whereas -m 9 gives much more information, such as alignment boundaries. -m 9i also doesn't show only the best alignment. -b 1 would show only the best description line; without it, you get all the descriptions with E()-values better than the threshold. -b 1 would be inappropriate for all-vs-all anyway, since you would just get back the identical match.
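Since every sequence in an all-vs-all search also matches itself, it can be convenient to drop those self-hits afterwards and to re-apply an E()-value cutoff. A minimal sketch, assuming the hits have already been reduced to (query id, subject id, E-value) triples like the output asked for in the question; the function name filter_hits and the sample data are hypothetical, and the default cutoff simply mirrors the -E 0.001 suggestion:

```python
def filter_hits(rows, max_evalue=0.001):
    """Yield hits that are not self-matches and pass the E-value cutoff.

    rows: iterable of (query_id, subject_id, evalue) with evalue as a string.
    """
    for query_id, subject_id, evalue in rows:
        # In all-vs-all, each sequence finds itself; skip that trivial hit.
        if query_id == subject_id:
            continue
        # Keep only hits at or below the chosen threshold.
        if float(evalue) > max_evalue:
            continue
        yield query_id, subject_id, evalue


# Made-up hits, for illustration only.
hits = [
    ("seqA", "seqA", "0"),        # self-match, dropped
    ("seqA", "seqB", "1.2e-50"),  # kept
    ("seqA", "seqC", "0.5"),      # above the cutoff, dropped
]

for query_id, subject_id, evalue in filter_hits(hits):
    print(query_id, subject_id, evalue, sep="\t")
```

Setting the threshold in the search itself with -E is still preferable, because it keeps the unwanted lines from being written at all.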

Using -m 9 -d 0 is much more compact than normal FASTA/GGSEARCH output, but still much more verbose than -m 8C (it gives information about the statistics, library size, etc. that BLAST tabular output lacks).

12.1 years ago
Fabian ▴ 50

There is a BioPerl script that turns FASTA output into tab-separated values:

fasta34 -H -E 1e-5 -m 9 -d 0 QueryFile SearchDatabase | fastam9_to_table >



