PROKKA annotation problem - wrong reference?
14 months ago
blur ▴ 220

Hi,

I am running PROKKA for the very first time and ran a test on a bacterium (not E. coli).

I ran this command, expecting each gene to be reported with its gene name:

prokka --outdir assembly_test/mydir_genes --prefix mygenome --addgenes  assembly_test/assembly.fasta


What I get is this [no gene name, mostly]:

>BFMIPPFK_00026 Lipid A biosynthesis myristoyltransferase


I thought I might need to set up a specific database for the bacterium I am using, but I couldn't find how to do that in the manual. I saw this option, but it was not clear to me which files it needs in order to run:

--setupdb         Index all installed databases


I also saw the --species option, but it didn't change anything when I ran it (I might not have used it correctly; I just wrote the name of my bacterium).

Any help would be appreciated,

prokka annotation • 857 views
14 months ago
h.mon 34k

You can improve the annotation by using the --proteins flag to provide a GenBank (or FASTA) file containing the annotation of a closely related species. Be sure to read the Prokka documentation; it is very detailed and well written.

The --species flag you used is just one of the flags that add taxonomic information to your genome:

Organism details:
--genus [X]       Genus name (default 'Genus')
--species [X]     Species name (default 'species')
--strain [X]      Strain name (default 'strain')
--plasmid [X]     Plasmid name or identifier (default '')


Something like --genus Escherichia --species coli --strain POO247


I have also downloaded the GenBank file of a reference genome and ran Prokka with it using --proteins. It gives similar results...

prokka   --outdir /assembly_test/mydir_genes_ref --prefix mygenome --proteins /ref.gb   /assembly_test/assembly.fasta


The Genbank looks like this:

     gene            162628..163530
                     /gene="lpxC"
                     /locus_tag="BAL062_00145"
     CDS             162628..163530
                     /gene="lpxC"
                     /locus_tag="BAL062_00145"
                     /codon_start=1
                     /transl_table=11
                     /product="UDP-3-O-[3-hydroxymyristoyl] N-acetylglucosamine
                     deacetylase,UDP-3-O-[3-hydroxymyristoyl]


The resulting names are the product names, not the gene names...

I also tried using the --addgenes flag with the exact same results:

prokka   --outdir assembly_test/mydir_genes_ref_genes --prefix mygenome --proteins ref.gb --addgenes assembly_test/assembly.fasta


I have checked the CSV file, and it contains the right gene name. I assumed the --addgenes flag didn't work, so I tried again with --compliant and --rawproduct.

Could it be a problem with the order of flags? I went over the manual several times - if the answer is there I was not able to find it...


Did you check the GenBank output (.gbff, if I am not mistaken), or the .gff output? What names are added to these files?
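
A quick way to answer that for the GFF output is to count how many CDS features carry a gene= attribute. The following is only a sketch: count_gene_names is a hypothetical helper (not a Prokka command), and it assumes a tab-separated GFF3 file whose ninth column holds semicolon-separated attributes such as gene=lpxC.

```shell
# count_gene_names: hypothetical helper, not part of Prokka.
# Reports how many CDS features in a GFF3 file have a gene= attribute.
# Assumes tab-separated columns with attributes in column 9.
count_gene_names() {
    awk -F'\t' '
        # Skip comment/directive lines and anything that is not a 9-column feature line
        $0 !~ /^#/ && NF >= 9 && $3 == "CDS" {
            total++
            if ($9 ~ /(^|;)gene=/) named++
        }
        END { printf "%d of %d CDS features have a gene name\n", named, total }
    ' "$1"
}
```

It would be called as count_gene_names followed by the path to the .gff file in the Prokka output directory. If most CDS features are counted as unnamed, the gene names are missing from the annotation itself rather than from one particular output file.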


I can see the gene names I was expecting in the generated GFF file, as well as in the CSV, just not in the final .ffn/.faa files :(

prokka  gene    209965  210882  .   +   .   ID=BFMIPPFK_01470_gene;Name=lpxC;gene=lpxC;locus_tag=BFMIPPFK_01470
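
Judging from the header shown at the top of the thread, the FASTA outputs seem to carry only the locus tag and the product, while the gene name is available in the GFF. One possible workaround is to copy the gene names into the FASTA headers afterwards. The sketch below assumes a tab-separated GFF3 file with gene= and locus_tag= attributes in column 9 and FASTA headers of the form ">locus_tag product"; add_gene_names is a hypothetical helper, not part of Prokka.

```shell
# add_gene_names: hypothetical helper, not part of Prokka.
# Usage: add_gene_names annotation.gff sequences.faa > renamed.faa
# Inserts the gene name (when the GFF has one) after the locus tag in each header.
add_gene_names() {
    awk -F'\t' '
        # First file (GFF): remember the gene name for each locus_tag
        NR == FNR {
            if ($0 !~ /^#/ && NF >= 9) {
                tag = ""; gene = ""
                n = split($9, attrs, ";")
                for (i = 1; i <= n; i++) {
                    if (attrs[i] ~ /^locus_tag=/) tag = substr(attrs[i], 11)
                    if (attrs[i] ~ /^gene=/) gene = substr(attrs[i], 6)
                }
                if (tag != "" && gene != "") genes[tag] = gene
            }
            next
        }
        # Second file (FASTA): rewrite headers whose locus tag has a known gene name
        /^>/ {
            split($0, parts, " ")
            id = substr(parts[1], 2)
            if (id in genes) {
                rest = substr($0, length(parts[1]) + 1)
                print ">" id " " genes[id] rest
                next
            }
        }
        # Sequence lines and headers without a gene name pass through unchanged
        { print }
    ' "$1" "$2"
}
```

Headers for features without a gene name are left untouched, so the locus tags in the result still match those in the GFF.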

