Question: How to determine if a bacterial de-novo assembly is a new species
5.6 years ago by
United States
cmhansen30 wrote:

Good afternoon!

I am new to Biostars and to NGS, so bear with me. I have MiSeq data from a potentially novel species of bacteria (in the genus Neisseria), and I'd like to do a de novo assembly and annotation. My question is how best to use this data to help us determine whether or not this is a new species. Whole genomes are available for several species within the genus, but not for any of its closest relatives (based on 16S rDNA and cpn60 sequences).

Thank you

5.6 years ago by
planet earth
piet1.7k wrote:

To distinguish eubacterial species within a genus, it is common to compare the sequences of the protein-coding genes tuf, rpoB, dnaJ and groEL (cpn60), which belong to the eubacterial core genome. These genes are present in most bacterial genomes, and they usually differ more between species than the 16S or 23S rRNA genes do, so differences are easier to recognize even if your data or the sequences deposited in GenBank contain some errors. With very few exceptions, there is only one copy per genome.

For a very quick answer, I would take these four sequences from a closely related species and map the reads onto them with a fast aligner like bwa. If you get decent coverage, run variant calling and determine the consensus of your reads. If coverage is not so good, map the reads again onto the consensus obtained from the first mapping. You should get much better coverage in the second round.
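The mapping and consensus steps could be sketched roughly as below. This is only one possible toolchain, not necessarily the one meant above: the use of samtools and bcftools for sorting, variant calling and consensus building is an assumption, as are the file names and the helper name `mapping_commands`. The function only builds the shell commands so they can be inspected before anything is run.

```python
import shlex


def mapping_commands(ref, reads1, reads2, prefix, threads=4):
    """Build shell commands for one round of read mapping and consensus calling.

    ref            FASTA with the marker genes (e.g. tuf, rpoB, dnaJ, groEL)
    reads1/reads2  paired-end FASTQ files
    prefix         prefix for all output files (hypothetical naming scheme)
    """
    q = shlex.quote
    bam = f"{prefix}.sorted.bam"
    depth = f"{prefix}.depth.txt"
    vcf = f"{prefix}.calls.vcf.gz"
    cons = f"{prefix}.consensus.fa"
    return [
        # Index the reference genes for bwa.
        f"bwa index {q(ref)}",
        # Map the reads and write a coordinate-sorted BAM.
        f"bwa mem -t {threads} {q(ref)} {q(reads1)} {q(reads2)}"
        f" | samtools sort -o {q(bam)} -",
        f"samtools index {q(bam)}",
        # Per-position depth, to judge whether coverage is decent.
        f"samtools depth -a {q(bam)} > {q(depth)}",
        # Call variants, treating the genome as haploid.
        f"bcftools mpileup -f {q(ref)} {q(bam)}"
        f" | bcftools call -mv --ploidy 1 -Oz -o {q(vcf)}",
        f"bcftools index {q(vcf)}",
        # Apply the called variants to the reference to get a consensus.
        f"bcftools consensus -f {q(ref)} -o {q(cons)} {q(vcf)}",
    ]


if __name__ == "__main__":
    for cmd in mapping_commands("genes.fa", "R1.fastq.gz", "R2.fastq.gz", "round1"):
        print(cmd)
```

For the second round, the same function can be called with `round1.consensus.fa` as `ref`. Note that `bcftools consensus` keeps the reference base wherever no variant was called, including positions without any coverage, so the depth file is worth checking before trusting the consensus.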

Finally, create a multiple sequence alignment of your consensus and all available sequences from other species in the same genus. The alignment will give you some hints as to whether it is a new species or only a variant (subspecies) of a species already known.
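Once the alignment exists, pairwise percent identity is a simple way to read those hints off it. The following is a minimal sketch, assuming the sequences have already been aligned with a tool of your choice and loaded into a dict of name to aligned sequence; the function names and the rule of skipping gap columns are illustrative choices, not a standard.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity of two aligned sequences, ignoring columns with a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_table(aligned):
    """Percent identity for every pair in a dict of name -> aligned sequence."""
    return {
        (n1, n2): percent_identity(aligned[n1], aligned[n2])
        for n1, n2 in combinations(aligned, 2)
    }


if __name__ == "__main__":
    # Toy sequences, made up purely for illustration.
    toy = {"isolate": "ATGC-ATT", "species_A": "ATGCTATT", "species_B": "ATGCTGTA"}
    for pair, ident in identity_table(toy).items():
        print(pair, round(ident, 1))
```

If the isolate is clearly less similar to every known species than those species are to each other for the same gene, that supports treating it as something new; a proper phylogenetic tree built from the same alignment would be the more rigorous follow-up.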

5.5 years ago by
HG1.1k wrote:

Please have a look

5.5 years ago by
dago2.6k wrote:

One common and robust way to assess whether you are dealing with one species or several is to calculate the Average Nucleotide Identity (ANI) and to consider tetranucleotide usage. You can learn more about that in this paper. To calculate both you could use JSpecies, which is good and easy to use.
