Running an Average Amino Acid Identity (AAI) analysis manually
8 weeks ago
fec2

Hi all,

I need to perform an Average Amino Acid Identity (AAI) analysis for 422 genomes on a SLURM system that only allows jobs to run for 3 days. Tools like compareM can't finish the job in time, so I would like to run the analysis myself using parallel, awk or sed.
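As a rough budget, assuming each ordered genome pair is searched once (so every pair is BLASTed in both directions), 32 parallel single-threaded jobs as in the command further down, and everything fitting into one 3-day allocation, the numbers work out as:

```latex
\begin{aligned}
N &= n(n-1) = 422 \times 421 = 177\,662 \ \text{BLAST searches} \\
t_{\max} &= \frac{32 \times 3 \times 86\,400\ \text{s}}{177\,662} = \frac{8\,294\,400\ \text{s}}{177\,662} \approx 46.7\ \text{s per search}
\end{aligned}
```

So under these assumptions, any pairwise search that takes more than about 47 s on one core cannot fit into a single 3-day job.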

However, I don't really understand how this analysis works. As I understand it, the proteins of the query genome are BLASTed against the reference genome with cut-offs of at least 30% identity and at least 70% coverage. The top match of each protein is then searched back against the query genome (the reverse search) with BLAST, using the same cut-offs.
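Written as formulas, a sketch of the usual definition described above, assuming the "top match" is the hit with the highest bitscore among the hits that pass both cut-offs, and that AAI is the plain mean identity over the reciprocal pairs (some tools weight or average differently). Here $a$ ranges over the proteins of genome $A$ and $b$ over the proteins of genome $B$:

```latex
\begin{aligned}
\mathrm{best}_{A \to B}(a) &= \operatorname*{arg\,max}_{\substack{b \in B \\ \mathrm{id}(a,b) \ge 30\%,\ \mathrm{cov}(a,b) \ge 70\%}} \mathrm{bitscore}(a,b) \\
R_{AB} &= \{(a,b) : \mathrm{best}_{A \to B}(a) = b \ \text{and}\ \mathrm{best}_{B \to A}(b) = a\} \\
\mathrm{AAI}(A,B) &= \frac{1}{|R_{AB}|} \sum_{(a,b) \in R_{AB}} \mathrm{id}(a,b)
\end{aligned}
```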

I previously ran a similar analysis, the percentage of conserved proteins (POCP), using a script like the one below:

cat allpairs.txt | parallel --colsep ' ' -j 32 \
    blastp -query {1} -subject {2} -evalue 0.00001 -qcov_hsp_perc 50 \
    -outfmt 6 -max_target_seqs 1 -out {1}_{2}.tsv


Here I first save a file (allpairs.txt) containing all the pairs of genomes I want to BLAST, one space-separated pair per line, and then run BLAST on every pair with parallel.
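One way to get the reverse searches with the same cut-offs, without any extra tooling, is to list every ordered pair: for genomes A and B, both `A B` (forward) and `B A` (reverse) go into allpairs.txt, and the parallel command above runs both with identical settings. A minimal bash sketch, assuming one protein FASTA file per genome named `*.faa` (a hypothetical naming scheme) in the directory you run it from; `make_pairs` is a hypothetical helper name:

```shell
# Sketch: print every ordered pair of *.faa files in the current directory.
# Both "A B" and "B A" are printed, so the reverse search of each pair is
# just another line of allpairs.txt for parallel to run.
make_pairs() {
    local a b
    for a in *.faa; do
        for b in *.faa; do
            # Skip self-comparisons.
            if [ "$a" != "$b" ]; then
                printf '%s %s\n' "$a" "$b"
            fi
        done
    done
}
```

Run it as `make_pairs > allpairs.txt`. For 422 genomes this writes 422 × 421 = 177,662 lines. Because parallel splits each line on a single space (`--colsep ' '`), the file names must not contain spaces.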

But I don't understand how to perform the reverse search with BLAST using the same cut-offs. Is it possible to do it with parallel, awk or sed?
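Once both directions of a pair have been BLASTed (the `A_B.tsv` and `B_A.tsv` outputs of the command above), the reverse check is essentially a join of the two tables, which awk can do. The function below, `rbh_aai`, is a hypothetical helper and not part of any tool. It assumes the default 12-column `-outfmt 6` (column 3 = pident, column 12 = bitscore) and that the 70% coverage cut-off was applied during the search, for example by changing `-qcov_hsp_perc 50` to `-qcov_hsp_perc 70` (this filters on per-HSP query coverage, so it only approximates an alignment-length criterion). It applies the identity cut-off, picks the top match by bitscore itself, and reports the mean forward-direction identity of the reciprocal best hits:

```shell
# Sketch: reciprocal best hits (RBH) and AAI for one genome pair.
# $1 = forward BLAST table (A vs B), $2 = reverse table (B vs A),
# $3 = optional identity cut-off in percent (default 30).
# Coverage is assumed to be filtered already at search time.
# Prints: <number of RBH pairs> <TAB> <mean identity of those pairs>
rbh_aai() {
    awk -v fwd="$1" -v rev="$2" -v min_id="${3:-30}" '
        # Drop hits below the identity cut-off before picking the top match.
        $3 + 0 < min_id + 0 { next }
        # Forward table: keep the best-scoring hit of each query protein.
        FILENAME == fwd {
            if (!($1 in fscore) || $12 + 0 > fscore[$1]) {
                fscore[$1] = $12 + 0; fhit[$1] = $2; fid[$1] = $3 + 0
            }
            next
        }
        # Reverse table: the same, for the reverse search.
        {
            if (!($1 in rscore) || $12 + 0 > rscore[$1]) {
                rscore[$1] = $12 + 0; rhit[$1] = $2
            }
        }
        END {
            n = 0; sum = 0
            for (q in fhit) {
                s = fhit[q]
                # Reciprocal: the top match of s in the reverse search must be q.
                if ((s in rhit) && rhit[s] == q) { n++; sum += fid[q] }
            }
            if (n > 0) printf "%d\t%.2f\n", n, sum / n
            else printf "0\tNA\n"
        }
    ' "$1" "$2"
}
```

For example, `rbh_aai g1.faa_g2.faa.tsv g2.faa_g1.faa.tsv` prints the number of reciprocal best hits and the AAI for that pair; call it once per unordered pair. Because the top match is chosen in awk, the sketch does not depend on `-max_target_seqs 1`.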

Thank you very much.

Best regards,

Felix

Tags: awk, parallel, sed, AAI, BLAST
8 weeks ago
Mensur Dlakic

You may want to try some recently developed programs that can do this in a couple of hours on an ordinary computer.

For nucleotides:

Thanks for your suggestion. I have tried a few tools, but they can't finish 422 genomes in 3 days, even when I switched from BLAST to Prodigal. The online tools all limit the number of genomes. I specifically need AAI rather than other analyses because this is a genus-level taxonomy study. But thanks anyway.

With all due respect, it is fairly trivial to compare 422 genomes in much less than 3 days, though not necessarily with BLAST. It doesn't seem like you looked at the links I provided earlier, so these may also be for nothing, but here they are just in case:

In fact, I have tried all the AAI tools before, I swear. As for other comparison tools, the reason I need AAI is that it is more or less the standard analysis for delineating bacterial genera, so the reviewers want it. For the analysis I mentioned above, we removed the pairs that had already been BLASTed from the text file (allpairs.txt) and continued for another 3 days. The reason I posted here is that I don't understand how to run the reverse BLAST search for the top match with the same cut-offs. Thank you.
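For the resume step described above, a small helper can rebuild the list of outstanding pairs instead of editing allpairs.txt by hand. A minimal sketch, assuming outputs are named `{1}_{2}.tsv` as in the parallel command in the question; `remaining_pairs` is a hypothetical helper name:

```shell
# Sketch: print the pairs from allpairs.txt (or the file given as $1)
# whose BLAST output A_B.tsv does not exist yet. An empty output file
# counts as done, since a pair with no hits can legitimately be empty.
remaining_pairs() {
    local a b
    while read -r a b; do
        if [ ! -e "${a}_${b}.tsv" ]; then
            printf '%s %s\n' "$a" "$b"
        fi
    done < "${1:-allpairs.txt}"
}
```

Usage: `remaining_pairs allpairs.txt > todo.txt`, then feed todo.txt to the same parallel command. Jobs killed at the 3-day limit may leave truncated .tsv files, which this check would count as finished, so those should be deleted first. GNU parallel can also do this bookkeeping itself with `--joblog FILE --resume`, as long as the input file and the command stay unchanged between runs.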