Dear all,
I am working with metagenomic gene abundance tables derived from the MetaHIT 3.9 million catalog and want to work with genes annotated to KOs in KEGG. The data are provided here and the KEGG annotation is provided as the file "KEGG annotation of the gene catalog."
The annotation file provides gene name, KEGG hit, bit score, fraction of KEGG hit covered, and then the KOs/Pathways/Modules annotated to that gene.
I cannot find the specifics of the annotation in a publication, but it appears that the authors used BLASTP and considered a gene annotated to a KO if the hit had a bit score greater than 60 (the minimum in the file is 60.1).
However, the fraction of the KEGG hit covered ranges from near 0 to 1, i.e. from about 0% to 100%.
Is it common practice to filter further by % coverage? Is there a precedent, or a biological rationale, for choosing a specific threshold such as 30%, 50%, or 90%?
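To make the question concrete, here is a minimal Python sketch of the filtering I have in mind. The column order follows the description of the annotation file above. Everything else is my own assumption for illustration: the names `SAMPLE_ROWS`, `count_retained`, and `retention_table`, the use of `>=` for the coverage cutoff, and the row values, which are made up rather than taken from the real file.

```python
# Hypothetical rows in the column order described above:
# gene name, KEGG hit, bit score, fraction of KEGG hit covered, KO.
# The values are invented for illustration, not read from the real file.
SAMPLE_ROWS = [
    ("gene_1", "hit_a", 60.1, 0.05, "K00001"),
    ("gene_2", "hit_b", 150.0, 0.45, "K00002"),
    ("gene_3", "hit_c", 320.5, 0.92, "K00003"),
    ("gene_4", "hit_d", 75.3, 0.30, "K00001"),
]


def count_retained(rows, min_coverage, min_bitscore=60.0):
    """Count genes with bit score > min_bitscore and coverage >= min_coverage."""
    return sum(
        1
        for _gene, _hit, bitscore, coverage, _ko in rows
        if bitscore > min_bitscore and coverage >= min_coverage
    )


def retention_table(rows, thresholds=(0.0, 0.3, 0.5, 0.9)):
    """Map each candidate coverage threshold to the number of genes kept."""
    return {t: count_retained(rows, t) for t in thresholds}


if __name__ == "__main__":
    total = len(SAMPLE_ROWS)
    for threshold, kept in retention_table(SAMPLE_ROWS).items():
        print(f"coverage >= {threshold:.0%}: {kept} of {total} genes retained")
```

Running something like this on the real table would show how many annotated genes are lost at each candidate cutoff, which is part of what I am trying to weigh.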
Many thanks in advance.