Question: Prokka bacteria genome annotation
agata88 (Poland) wrote, 21 months ago:

Hi all!

I was annotating a bacterial genome with Prokka. At the end it gave me results that I do not fully understand. Maybe somebody more familiar with this program can help?

I have multiple contigs assigned to the same annotation. I ran this command:

./prokka --outdir contigs_prokka --kingdom Bacteria --genus X --proteins uniprot_bacteria.fasta --usegenus --evalue 0.01 --rfam --cpu 8 --norrna contigs.fasta &

As a result I have a TSV file that lists the contigs and their annotations. For some of the results, multiple contigs are assigned to the same annotation. For example:

contig1   CDS   1965   Zinc-transporting ATPase OX=224308 GN=zosA PE=1 SV=1
contig2   CDS    918   Zinc-transporting ATPase OX=224308 GN=zosA PE=1 SV=1

I am not sure how to interpret this:

  • Are these unconnected contigs?
  • Does one sequence represent the gene while the rest are pseudogenes?
  • Can I take one (the longest) for the final annotation and ignore the rest, or should I annotate the rest as potential pseudogenes?

Many thanks for any suggestions. Agata


Both could be real and just happen to be zinc-transporting ATPases. Did you check for sequence redundancy in your contigs before running Prokka? For example, contig2 could be identical to part of contig1 (and entirely contained within it).

Reply written 21 months ago by genomax
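
The containment check suggested above can be sketched in a few lines of Python. This is only a minimal illustration, not something posted in the thread: the function names are made up for the example, it detects exact containment only (on either strand), and it compares contigs pairwise, so it will be slow on many thousands of contigs. Near-identical contigs would need an aligner or a clustering tool instead.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def parse_fasta(text):
    """Parse FASTA text into {contig_name: sequence} (name = first word of the header)."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def contained_contigs(contigs):
    """List (shorter, longer) pairs where the shorter contig occurs exactly
    inside a longer (or equally long, earlier) one on either strand."""
    by_length = sorted(contigs.items(), key=lambda kv: len(kv[1]), reverse=True)
    pairs = []
    for i, (short_name, short_seq) in enumerate(by_length):
        if not short_seq:
            continue  # skip empty records; an empty string matches everything
        rc = reverse_complement(short_seq)
        for long_name, long_seq in by_length[:i]:
            if short_seq in long_seq or rc in long_seq:
                pairs.append((short_name, long_name))
                break  # one containing contig is enough to flag it
    return pairs
```

Calling `contained_contigs(parse_fasta(open("contigs.fasta").read()))` would list the contigs that are exact duplicates or fragments of another contig; `contigs.fasta` is the input name from the command in the question.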

Yes, I used CD-HIT; it produced 10905 clusters from 10942 contigs.

This is not an isolated case; most records are duplicated.

Reply written 21 months ago (modified 21 months ago) by agata88
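
To see how widespread the duplication is, the annotation table can be grouped by product description. The sketch below is an illustration under stated assumptions, not part of the original thread: it assumes tab-separated rows laid out like the two example lines in the question (identifier, feature type, length, then the product in the last column), and the function names are invented for the example.

```python
import csv
import io
from collections import defaultdict


def contigs_per_product(tsv_text):
    """Map each CDS product description to the (contig, length) rows carrying it.

    Assumed layout: column 1 = contig/identifier, column 2 = feature type,
    column 3 = length in bp, last column = product description.
    """
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 4 or row[1] != "CDS":
            continue  # skips blank lines, header lines and non-CDS features
        groups[row[-1].strip()].append((row[0], int(row[2])))
    return groups


def shared_products(groups):
    """Keep only the products that are annotated on more than one contig."""
    return {
        product: hits
        for product, hits in groups.items()
        if len({contig for contig, _ in hits}) > 1
    }
```

Sorting the result of `shared_products` by the number of contigs per product gives a quick picture of which annotations repeat most often, which can hint at fragmented assemblies or, as turned out to be the case later in this thread, at contamination.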
agata88 (Poland) wrote, 21 months ago:

Hi all!

I have a solution to my question. It turned out that my sample was contaminated, which is why I had such a huge number of contigs. After filtering, the annotation went well. I hope this helps with similar dilemmas in the future.

Btw, I filtered the contigs with blastn against a species-specific nt database.

Best,

Agata

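
The exact commands used for this filtering were not posted in the thread. The following is only a sketch of one way such a filter could look: run blastn with tabular output (`-outfmt 6`), then keep the contigs that have a sufficiently good hit in the species-specific database. The file names, the database name and the identity and alignment-length thresholds are all assumptions chosen for illustration and would need tuning for real data.

```python
# Assumed BLAST+ commands that would produce the tabular hits file
# (file and database names are placeholders):
#   makeblastdb -in species_genomes.fasta -dbtype nucl -out species_db
#   blastn -query contigs.fasta -db species_db -outfmt 6 -evalue 1e-10 -out hits.tsv
#
# Default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
#                            qstart qend sstart send evalue bitscore


def contigs_with_hits(blast_tab_text, min_identity=90.0, min_aln_len=200):
    """Return the query IDs having at least one hit above both thresholds."""
    keep = set()
    for line in blast_tab_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        qseqid, pident, aln_len = fields[0], float(fields[2]), int(fields[3])
        if pident >= min_identity and aln_len >= min_aln_len:
            keep.add(qseqid)
    return keep


def filter_fasta(fasta_text, keep):
    """Return FASTA text containing only the records whose ID is in `keep`."""
    out, writing = [], False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].split()
            writing = bool(header) and header[0] in keep
        if writing:
            out.append(line)
    return "\n".join(out) + ("\n" if out else "")
```

Contigs without a qualifying hit are dropped here, so it is worth checking them separately (for example against the full nt database) before concluding that they are contamination rather than novel sequence.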

You may look at this as a solution, but having contaminated data going into an assembly is not a good thing. If you choose to submit this assembly to NCBI, you may throw someone else off if they use this data for genome comparisons.

Reply written 21 months ago by genomax

That is true, I am aware of that. I am going to submit only true data. Thanks.

Reply written 21 months ago by agata88

Hi Agata,

Could you kindly send me the command line you used for filtering the contigs with blastn?

Reply written 10 months ago by hjafar