Question

Extracting toxin genes with Prokka

0

3.9 years ago

AbdelAbdel ▴ 30

Hello,

Briefly: I work on the pangenome of a bacterium that secretes two toxins, Toxin A and Toxin B. I plan to run a bioinformatics analysis of 145 strains to identify the mutations in each of them and to determine which strains are the most severe, based on the SNPs in the genes coding for the two toxins.

I started from raw reads. I first assembled them with SPAdes and then annotated all 145 strains with Prokka. After extracting the toxin gene sequences from the .ffn files and analysing them, I noticed that in some genomes the toxin sequences are incomplete or fragmented (judging by comparing the sequence length with the same gene on NCBI or in other strains). I don't know whether this is caused by a problem in my raw data, by the assembly step, or by bad annotation.

If anyone has an idea for improving any of these steps, or for other steps I could take to achieve my objective, it would be a great help.

Thank you, Idea Committee.

annotation Prokka genomics • 1.1k views

1

You need to validate your assemblies, at least for the region you are interested in. You cannot trust that the assembler produced the full genome in one shot without errors.
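One cheap check, before going back to the reads, is whether a truncated toxin gene sits right at the end of a contig, which would point to a fragmented assembly rather than a real mutation. The sketch below is only an illustration: it assumes the annotation is available as GFF3 text with `##sequence-region` lines giving each contig length (as Prokka's .gff output typically has), and the function name and the `margin` parameter are hypothetical.

```python
def cds_near_contig_ends(gff_text, margin=10):
    """Return (contig, start, end, attributes) for every CDS feature that
    starts or ends within `margin` bp of a contig end."""
    contig_len = {}
    flagged = []
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more feature lines
        if line.startswith("##sequence-region"):
            parts = line.split()
            if len(parts) >= 4:
                contig_len[parts[1]] = int(parts[3])  # contig id -> length
            continue
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        seqid, start, end = cols[0], int(cols[3]), int(cols[4])
        length = contig_len.get(seqid)
        if length is None:
            continue  # contig length unknown, cannot judge
        if start <= margin or end > length - margin:
            flagged.append((seqid, start, end, cols[8]))
    return flagged


if __name__ == "__main__":
    # Tiny made-up annotation: the second CDS runs into the contig end.
    example = "\n".join([
        "##gff-version 3",
        "##sequence-region contig_1 1 5000",
        "contig_1\tProdigal:2.6\tCDS\t100\t900\t.\t+\t0\tID=A_00001",
        "contig_1\tProdigal:2.6\tCDS\t4200\t4999\t.\t+\t0\tID=A_00002",
    ])
    for hit in cds_near_contig_ends(example):
        print(hit)
```

A toxin gene flagged this way is a candidate for closer inspection of the read mapping in that region; a gene that is short but well inside a contig is more likely a genuine or annotation-related break.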

3.9 years ago by JC 13k

0

Agreed^.

You may need to play with your data and alter the assembly parameters (see shovill by Torsten Seemann, the Prokka author). You can also provide a database of 'trusted proteins' to Prokka: if Prodigal (or another part of the Prokka pipeline) is failing to call the CDS correctly even though the assembly is OK, a starting set of proteins from a reference genome might improve the result.

Lastly, be open-minded that these might also be legitimate CDS breaks (introduction of frameshifts, stop codons, etc.). To really scrutinise how much you trust the bases called in the toxin region, visualise the BAM/pileup files with something like Tablet.
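To make the "legitimate break" case concrete, here is a minimal sketch that scans a nucleotide sequence for in-frame stop codons before its last codon. It assumes the sequence starts at the first base of the start codon and uses the three standard stop codons (TAA, TAG, TGA); the function names are hypothetical, and a hit only says where to look in the pileup, not whether the base call is right.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stops(cds):
    """Return 0-based codon indices of in-frame stop codons that occur
    before the final codon of the sequence."""
    seq = "".join(cds.split()).upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def describe_cds(cds):
    """Summarise the properties that hint at a frameshift or truncation."""
    seq = "".join(cds.split()).upper()
    return {
        "length": len(seq),
        "multiple_of_three": len(seq) % 3 == 0,
        "ends_with_stop": seq[-3:] in STOP_CODONS,
        "internal_stops": internal_stops(seq),
    }


if __name__ == "__main__":
    # Made-up 15 bp sequence with a premature TAA at the third codon.
    print(describe_cds("ATGAAATAAGGGTGA"))
```

Running this on the toxin region taken at the reference gene's coordinates, rather than on the shortened CDS Prokka reported, shows whether the annotation stopped early because of a premature stop codon.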

3.9 years ago by Joe 21k

0

Thanks for the reply, but how can I provide a database of 'trusted proteins' to Prokka?

3.9 years ago by AbdelAbdel ▴ 30

0

It's a command-line option. Take a look at the documentation (hint: --proteins).
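For running this over many strains, a small wrapper can build the command per genome. This is only a sketch: `--proteins`, `--outdir` and `--prefix` are Prokka options, but the helper name and every file and directory name below are hypothetical placeholders, and the actual call is left commented out so the snippet runs without Prokka installed.

```python
import shlex
import subprocess  # only needed if the run() line below is uncommented


def build_prokka_cmd(contigs, trusted_proteins, outdir, prefix):
    """Build a Prokka command line that uses a trusted protein FASTA
    as the first-priority annotation source."""
    return [
        "prokka",
        "--proteins", trusted_proteins,  # reference toxin proteins (FASTA)
        "--outdir", outdir,
        "--prefix", prefix,
        contigs,  # assembled contigs go last, as the positional argument
    ]


if __name__ == "__main__":
    cmd = build_prokka_cmd(
        contigs="strain001.contigs.fasta",          # hypothetical file name
        trusted_proteins="toxin_reference.faa",     # hypothetical file name
        outdir="annotation/strain001",              # hypothetical directory
        prefix="strain001",
    )
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment where Prokka is installed
```

Looping this over the 145 assemblies with the same trusted protein file keeps the annotation settings identical across strains, which makes the resulting toxin gene calls easier to compare.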

3.9 years ago by Joe 21k