I want to construct a phylogenetic tree for 100 Pseudomonas aeruginosa genomes. Before building the tree, I want to cluster the genomes by homology, and I am using ProteinOrtho5 for this. After running the software with the synteny option, I want to extract protein sequences only for those groups in the output that contain all test species with no duplication in any genome. I understand I need to run grab_protein.pl on myproject.poff to do this, but how can I customize/filter the output before running grab_protein.pl? I followed the suggestion in this post:
ProteinOrtho output help required
Running grep '^4\t4' output.proteinortho didn't print anything for me. Since I tested the software on 3 species, I modified the number from 3 to 4. Can anyone help with the filtering?
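For reference, the filter being attempted can be sketched in Python. This is a minimal sketch, not ProteinOrtho's own tooling. It assumes the output table is tab-separated, that its first two columns hold the number of species and the number of genes in each group, and that header lines start with "#". The function name single_copy_core_rows and the sample rows are made up for illustration. One possible reason the grep printed nothing is that plain grep does not treat \t as a tab character. Splitting on tabs explicitly avoids that issue.

```python
import io


def single_copy_core_rows(lines, n_species):
    """Yield rows where species count and gene count both equal n_species.

    Assumes tab-separated input whose first two columns are the number of
    species and the number of genes in the group (an assumption about the
    ProteinOrtho table layout; check it against your own file).
    """
    wanted = str(n_species)
    for line in lines:
        line = line.rstrip("\n")
        # Skip empty lines and header/comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        # Present in every species, and exactly one gene per species.
        if fields[0] == wanted and fields[1] == wanted:
            yield fields


# Made-up sample with 3 species; real files will have more columns.
sample = (
    "# Species\tGenes\tAlg.-Conn.\tA.faa\tB.faa\tC.faa\n"
    "3\t3\t1\ta1\tb1\tc1\n"
    "3\t4\t0.8\ta2,a3\tb2\tc2\n"
    "2\t2\t1\ta4\t*\tc4\n"
)
kept = list(single_copy_core_rows(io.StringIO(sample), 3))
for row in kept:
    print("\t".join(row))
```

To use it on a real file, pass an open file handle instead of the sample, with n_species set to the number of genomes in the run. The kept rows can then be written to a new file and given to grab_protein.pl in place of the unfiltered table.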