3.9 years ago
Morgan S. ▴ 80

Hey guys,

I have googled this and cannot find any advice or answer; hopefully I have not overlooked it. I need phmmer to write only the top hit for each query I provide. Currently it reports over 100 hits for almost every query, and I have 12,000 queries. It would take me far too much time to sort through all this information. Would it be better to just use BLAST instead, where I can set this threshold? Not sure if it matters, but I set the E-value to 1e-3.

Thanks!

Are you just looking for the longest substring in the sequence on each iteration?

I don't think I understand your question. I used a protein FASTA file made up of all the predicted genes in my genome. When I searched it against the MEROPS database, it gave me over 100 matches for each gene. I only want phmmer to report the top hit from the MEROPS database, based on the E-value, for each gene. Is there a way to do this? I thought --domZ would do the trick, but it didn't. The manual says: --domZ <x> : Assert that the total number of targets in your searches is <x>, for the purposes of per-domain conditional E-value calculations, rather than the number of targets that passed the reporting thresholds. Here is my script.

phmmer --tblout 1368D_merops2.txt --cpu 20 --domZ 1 -E 1e-3 /query/ /database/

Has anybody figured out an answer to this yet? I am looking for the phmmer equivalent of -max_target_seqs from BLAST.

max_target_seqs
Number of aligned sequences to keep.
Use with report formats that do not have separate definition line and alignment sections such as tabular (all outfmt > 4).
Not compatible with num_descriptions or num_alignments. Ties are broken by order of sequences in the database.
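
I am not aware of a phmmer option that caps the number of reported hits the way -max_target_seqs does, so one workaround is to filter the --tblout file after the search. Below is a minimal Python sketch of that idea. It assumes the usual --tblout layout (comment lines start with #, whitespace-separated columns, target name in column 1, query name in column 3, full-sequence E-value in column 5, score in column 6). The function name top_hits and the sample rows are made up for illustration; they are not part of HMMER.

```python
# Sketch: keep only the best hit per query from a phmmer --tblout table.
# Assumed column layout: target name, accession, query name, accession,
# full-sequence E-value, full-sequence score, ...

def top_hits(lines):
    """Return {query: (target, evalue, score)} with the lowest E-value per query."""
    best = {}
    for line in lines:
        # Skip blank lines and the '#' header/footer lines of the table.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 6:
            continue  # not a complete hit row
        target, query = fields[0], fields[2]
        evalue, score = float(fields[4]), float(fields[5])
        prev = best.get(query)
        # Lower E-value wins; a higher score breaks ties (e.g. E-values of 0).
        if prev is None or (evalue, -score) < (prev[1], -prev[2]):
            best[query] = (target, evalue, score)
    return best


# Hypothetical rows, deliberately not sorted, just to exercise the function.
SAMPLE = """\
# target name  accession  query name  accession  E-value  score  bias
targetB        -          gene1       -          3.4e-10   40.2   0.0
targetA        -          gene1       -          1.2e-50  170.3   0.1
targetC        -          gene2       -          5.0e-20   70.0   0.2
"""

if __name__ == "__main__":
    for query, (target, evalue, score) in top_hits(SAMPLE.splitlines()).items():
        print(query, target, evalue, score, sep="\t")
```

To run it on a real table, open the file and pass the handle in, for example `with open("1368D_merops2.txt") as fh: best = top_hits(fh)`. Comparing E-values explicitly means the result does not depend on the order in which phmmer wrote the rows.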