I'm using OMA standalone on my own data as well as on data downloaded from NCBI, and I'm working with CDS nucleotide (DNA) files. However, these CDS files contain gaps represented as Ns. I believe these occur where annotations cross contig boundaries or span gaps in the genome assembly.
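To see how much of each CDS is affected before running OMA, one option is to list the runs of N per sequence and the overall N fraction. The sketch below is a minimal example that assumes the gaps are encoded as runs of N/n in a plain FASTA file. The helper names (`read_fasta`, `n_runs`, `n_fraction`) are hypothetical and are not part of OMA.

```python
import re

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def n_runs(seq):
    """Return (0-based start, length) for each run of N/n in a sequence."""
    return [(m.start(), m.end() - m.start()) for m in re.finditer(r"[Nn]+", seq)]

def n_fraction(seq):
    """Fraction of positions that are N/n (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(1 for c in seq if c in "Nn") / len(seq)

cds = "ATGAAANNNNNNCCCGGGNNNTAA"
print(n_runs(cds))               # [(6, 6), (18, 3)]
print(round(n_fraction(cds), 3)) # 0.375
```

To report every record in a file (the file name is a placeholder), iterate with `with open("my_cds.fa") as fh:` and call `n_runs`/`n_fraction` on each sequence that `read_fasta(fh)` yields.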
When running OMA standalone, I get many warnings like the ones below. While I know they are probably caused by these gaps, I'm worried that the resulting X's will be misaligned and lead to erroneous results.
WARNING: IUPAC ambiguity characters for DNA/RNA not supported. Will replace them with 'X'
Pat index with 18353224 entries sorted, from "A</seq></e>\n" to "XXXXXXXXXXXXXXXXXXX"
Pat index with 41395238 entries sorted, from "A</seq></e>\n" to "XXXXXXXXXAAAATATATC"
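The warning says ambiguity characters are replaced with 'X', and the last "Pat index" entry above is a long run of X's, which is consistent with stretches of N being indexed as X. The sketch below only approximates that substitution so you can count how many positions per sequence would be affected. It assumes the standard IUPAC ambiguity codes (R, Y, S, W, K, M, B, D, H, V, N) are the ones replaced. This is an illustration, not OMA's actual implementation.

```python
# Assumption: the standard IUPAC nucleotide ambiguity codes; OMA's exact set may differ.
IUPAC_AMBIGUITY = frozenset("RYSWKMBDHVN")

def mask_ambiguity(seq):
    """Replace IUPAC ambiguity codes with 'X' and return (masked sequence, count replaced).

    The input is upper-cased first, so lowercase (soft-masked) bases are
    treated the same as uppercase ones.
    """
    out = []
    replaced = 0
    for base in seq.upper():
        if base in IUPAC_AMBIGUITY:
            out.append("X")
            replaced += 1
        else:
            out.append(base)
    return "".join(out), replaced

masked, n = mask_ambiguity("ATGNNNRCCTAA")
print(masked, n)  # ATGXXXXCCTAA 4
```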
So my main questions are: does OMA standalone account for these gaps, and should I just leave the Ns in, or is there a better way to handle them? Also, is it better to use the CDS than the full genome?
For context, I'm trying to get orthologous groups to help build a species tree and I'm working with insect genomes.
Thank you, Emma