I was annotating a bacterial genome with Prokka. At the end it gave me results that I don't fully understand. Maybe somebody more familiar with this program can help?
I have multiple contigs assigned to the same annotation. I ran this command:
./prokka --outdir contigs_prokka --kingdom Bacteria --genus X --proteins uniprot_bacteria.fasta --usegenus --evalue 0.01 --rfam --cpu 8 --norrna contigs.fasta &
As a result I have a TSV file with the annotation, listing the contigs and their annotations. For some results, multiple contigs are assigned to the same annotation. For example:
contig1 CDS 1965 Zinc-transporting ATPase OX=224308 GN=zosA PE=1 SV=1
contig2 CDS 918 Zinc-transporting ATPase OX=224308 GN=zosA PE=1 SV=1
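To see how often this happens across the whole file, one option is to group the CDS rows by product and keep only the products that occur on more than one sequence. The sketch below is a minimal illustration, not part of Prokka. It assumes the TSV has a header row with the columns `locus_tag`, `ftype`, `length_bp` and `product` (as in recent Prokka releases; check the header of your own file). In the example above the first column shows contig names, so whatever is in that column is what gets reported. The function name `products_on_multiple_contigs` is hypothetical.

```python
import csv
import io
from collections import defaultdict


def products_on_multiple_contigs(tsv_text):
    """Group CDS rows by product and return only products seen more than once.

    Assumes a tab-separated text with a header containing the columns
    locus_tag, ftype, length_bp and product (hypothetical layout; adjust
    the column names to match your own file).
    """
    groups = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if row["ftype"] != "CDS":
            continue  # ignore tRNA, rRNA and other feature types
        groups[row["product"]].append((row["locus_tag"], int(row["length_bp"])))
    # Keep only products annotated on more than one sequence
    return {product: hits for product, hits in groups.items() if len(hits) > 1}


if __name__ == "__main__":
    example = (
        "locus_tag\tftype\tlength_bp\tproduct\n"
        "contig1\tCDS\t1965\tZinc-transporting ATPase OX=224308 GN=zosA PE=1 SV=1\n"
        "contig2\tCDS\t918\tZinc-transporting ATPase OX=224308 GN=zosA PE=1 SV=1\n"
    )
    for product, hits in products_on_multiple_contigs(example).items():
        print(product, hits)
```

For a real file, read it with `open(path).read()` and pass the text in instead of the inline example.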
I am not sure how to interpret this:
- Are these unconnected contigs?
- Does one sequence represent the gene and the rest pseudogenes?
- Can I take one (the longest) for the final annotation and ignore the rest, or should I annotate the others as potential pseudogenes?
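If keeping the longest copy is the route taken, a small helper can pick it and report the other copies with their length as a fraction of the longest, so the shorter ones can be reviewed rather than silently dropped. This is a hypothetical sketch (the name `split_longest` is made up), assuming the hits are `(sequence_id, length_bp)` pairs such as those in the example rows. A low ratio by itself does not tell you whether a copy is a pseudogene, a gene truncated at a contig end, or a genuine second copy; it only flags which copies are worth inspecting.

```python
def split_longest(hits):
    """Return the longest hit and the remaining hits with a length ratio.

    hits: list of (sequence_id, length_bp) tuples for one product.
    Returns (best, rest), where rest holds (sequence_id, length_bp, ratio)
    and ratio is length_bp divided by the longest length, rounded to 2 places.
    """
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    best = ranked[0]
    rest = [
        (seq_id, length, round(length / best[1], 2))
        for seq_id, length in ranked[1:]
    ]
    return best, rest


if __name__ == "__main__":
    best, rest = split_longest([("contig1", 1965), ("contig2", 918)])
    print("keep:", best)
    print("review:", rest)
```

With the two zosA rows above, contig1 is kept and contig2 is reported with a ratio of 0.47 (918 / 1965).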
Many thanks for any suggestions. Agata