Question: Extracting toxin genes annotated by Prokka
Asked 5 months ago by Bioinfosenhaji:

Hello,

Briefly: I am working on the pangenome of a bacterium that secretes two toxins, Toxin A and Toxin B. I plan to analyse 145 strains in order to identify the mutations in each of them and to find the most severe strains based on SNPs in the genes encoding the two toxins.

I started from raw data (reads). First I assembled each strain with SPAdes, then annotated all 145 assemblies with Prokka. After extracting the gene sequences from the .ffn files, I noticed that in some genomes the toxin sequences are incomplete or fragmented (judging by comparing their length with the same gene on NCBI or in other strains). I don't know whether this comes from a problem in my raw data, from the assembly step, or from poor annotation.

If anyone has ideas on how to improve each step, or other steps I could take to reach my objective, it would be a great help. Thank you.
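A minimal Python sketch of the length comparison described above, assuming Prokka's usual .ffn header layout (`>LOCUS_TAG product description`). The keyword, the reference length and the 95 % threshold are hypothetical values to adapt; if a toxin is annotated under a different product name (or as `hypothetical protein`), a keyword match will miss it, and a BLAST search against the reference gene is more reliable.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def flag_short_genes(ffn_lines, keyword, ref_length, min_fraction=0.95):
    """Return (locus_tag, product, length, is_short) for records whose product contains keyword."""
    hits = []
    for header, seq in read_fasta(ffn_lines):
        # Prokka .ffn headers: locus tag, a space, then the product description
        locus_tag, _, product = header.partition(" ")
        if keyword.lower() in product.lower():
            is_short = len(seq) < min_fraction * ref_length
            hits.append((locus_tag, product, len(seq), is_short))
    return hits


# Toy input; with a real file use: with open("strain01.ffn") as fh: flag_short_genes(fh, ...)
example = """>STRAIN1_00010 Toxin A
ATGAAACCC
GGGTTTTAA
>STRAIN1_00011 hypothetical protein
ATGTAA
>STRAIN1_00012 Toxin A
ATGAAATAA""".splitlines()

hits = flag_short_genes(example, "toxin a", ref_length=18)
for hit in hits:
    print(hit)
```

Running this over every strain's .ffn file gives a quick list of which genomes carry a shortened toxin CDS, so only those need closer inspection.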

Tags: genomics, prokka, annotation

You need to validate your assemblies, at least for the region you are interested in. You cannot trust that the assembler produced the full genome in one shot without errors.
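One quick check, sketched below in Python under the assumption that you use the .gff file Prokka writes (with its `##sequence-region` lines and a trailing `##FASTA` section): a toxin CDS that starts or ends within a few bases of a contig end is more likely to be cut by an assembly break than by a real mutation. The keyword and the 100 bp margin are hypothetical choices.

```python
def contig_lengths(gff_lines):
    """Map contig name -> length from ##sequence-region pragma lines."""
    lengths = {}
    for line in gff_lines:
        if line.startswith("##sequence-region"):
            _, seqid, _start, end = line.split()
            lengths[seqid] = int(end)
    return lengths


def genes_near_contig_edge(gff_lines, keyword, margin=100):
    """Return (contig, start, end, product, near_edge) for CDS whose product contains keyword."""
    gff_lines = list(gff_lines)
    lengths = contig_lengths(gff_lines)
    results = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # Prokka appends the contig sequences after this line
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        seqid, start, end = cols[0], int(cols[3]), int(cols[4])
        product = ""
        for field in cols[8].split(";"):
            if field.startswith("product="):
                product = field[len("product="):]
        if keyword.lower() not in product.lower() or seqid not in lengths:
            continue
        # Flag genes touching the first or last `margin` bases of their contig
        near_edge = start <= margin or end > lengths[seqid] - margin
        results.append((seqid, start, end, product, near_edge))
    return results


example_gff = [
    "##gff-version 3",
    "##sequence-region contig_1 1 5000",
    "##sequence-region contig_2 1 20000",
    "\t".join(["contig_1", "Prodigal:002006", "CDS", "4200", "4990", ".", "+", "0",
               "ID=S1_00001;product=Toxin A"]),
    "\t".join(["contig_2", "Prodigal:002006", "CDS", "8000", "9500", ".", "-", "0",
               "ID=S1_00002;product=Toxin B"]),
    "##FASTA",
    ">contig_1",
    "ACGT",
]
for row in genes_near_contig_edge(example_gff, "toxin"):
    print(row)
```

With a real file, pass the open file handle (for example `open("strain01.gff")`) instead of the toy list.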

Reply written 5 months ago by JC


You may need to experiment with your data and alter the assembly parameters (see Shovill by Torsten Seemann, the Prokka author). You can also provide a database of 'trusted proteins' to Prokka: if Prodigal (or another part of the Prokka pipeline) fails to call the CDS correctly even though the assembly is fine, starting from a set of proteins from a reference genome may improve the result.
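A minimal sketch of trying different assemblers on the same reads through Shovill, assuming paired-end reads and Shovill's `--R1`, `--R2`, `--outdir`, `--assembler` and `--cpus` options (check `shovill --help` for the assemblers your installed version supports). The file names are hypothetical, and commands are only printed unless `dry_run=False`.

```python
import shlex
import subprocess


def shovill_command(r1, r2, outdir, assembler="spades", cpus=4):
    """Build a Shovill command line for one paired-end sample."""
    return ["shovill", "--R1", r1, "--R2", r2, "--outdir", outdir,
            "--assembler", assembler, "--cpus", str(cpus)]


def run(cmd, dry_run=True):
    """Print the command (default) or execute it, raising on a non-zero exit."""
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)


# Hypothetical read files for one strain; compare the toxin region across assemblers
for assembler in ("spades", "skesa"):
    run(shovill_command("strain01_R1.fastq.gz", "strain01_R2.fastq.gz",
                        f"strain01_{assembler}", assembler=assembler))
```

If the toxin gene is complete in one assembly but broken in another built from the same reads, the fragmentation is more likely an assembly artefact than a property of the strain.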

Lastly, be open to the possibility that these are legitimate CDS breaks (introduced frameshifts, stop codons, etc.). To really scrutinise how much you trust the bases called in the toxin region, visualise the BAM/pileup files with something like Tablet.
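Before opening the reads in a viewer, a simple scan can show whether a truncated gene looks like a frameshift or a premature stop. The sketch below assumes you have already extracted the contig region that aligns to the full-length reference toxin gene, in coding orientation (that extraction step is not shown), and uses the stop codons of the bacterial genetic code (NCBI table 11: TAA, TAG, TGA). On its own it cannot tell a real mutation from an assembly error; that is what the read pileup is for.

```python
# Stop codons of the bacterial genetic code (NCBI table 11), same as the standard code
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(seq):
    """Report length, frame consistency and in-frame stop codons of a nucleotide sequence."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    internal_stops = [
        i * 3 + 1  # 1-based position of the stop codon's first base
        for i, codon in enumerate(codons[:-1])
        if codon in STOP_CODONS
    ]
    return {
        "length": len(seq),
        "multiple_of_3": len(seq) % 3 == 0,  # False suggests an indel / frameshift
        "internal_stops": internal_stops,     # non-empty suggests a premature stop
        "ends_with_stop": bool(codons) and codons[-1] in STOP_CODONS,
    }


print(check_cds("ATGAAATAGCCCTAA"))
print(check_cds("ATGAAACCTAA"))
```

A flagged position gives you a specific coordinate to inspect in Tablet: if the reads there are deep and agree, the break is probably real.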

Reply written 5 months ago by Joe (edited 5 months ago)

Thanks for the replies. How can I provide a database of 'trusted proteins' to Prokka?

Reply written 5 months ago by Bioinfosenhaji

It's a command-line option. Take a look at the documentation (hint: --proteins).
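A minimal sketch of running Prokka over all strains with a trusted-protein file, assuming `--proteins` points to a FASTA (or GenBank) file of reference proteins, for example full-length Toxin A and Toxin B sequences. The strain names, paths and `reference_toxins.faa` are hypothetical; commands are printed rather than executed unless `dry_run=False`.

```python
import shlex
import subprocess


def prokka_command(contigs, outdir, prefix, trusted_proteins, cpus=4):
    """Build a Prokka command that uses trusted_proteins as a first-priority annotation source."""
    return ["prokka", "--outdir", outdir, "--prefix", prefix,
            "--proteins", trusted_proteins, "--cpus", str(cpus), contigs]


def run(cmd, dry_run=True):
    """Print the command (default) or execute it, raising on a non-zero exit."""
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)


# Hypothetical strain names and paths; reference_toxins.faa holds the trusted proteins
for strain in ("strain01", "strain02"):
    run(prokka_command(f"{strain}/contigs.fa", f"{strain}_prokka", strain,
                       "reference_toxins.faa"))
```

Keeping the command construction separate from execution makes it easy to check the generated command lines for all 145 strains before launching the runs.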

Reply written 5 months ago by Joe