How to determine whether a bacterial de novo assembly represents a new species
9.5 years ago
cmhansen ▴ 30

Good afternoon!

I am new to Biostars and to NGS, so please bear with me. I have MiSeq data from what may be a novel species of bacteria (genus Neisseria), and I'd like to do a de novo assembly and annotation. My question is how best to use these data to determine whether or not this is a new species. Whole genomes are available for several species within the genus, but not for any of its closest relatives (based on 16S rDNA and cpn60 sequences).

Thank you

9.5 years ago
piet ★ 1.8k

To distinguish eubacterial species within a genus, it is common to compare the sequences of the protein-coding genes tuf, rpoB, dnaJ and groEL (cpn60), which belong to the core genome of eubacteria. These genes are present in most bacterial genomes, and they usually differ more between species than the 16S or 23S rRNA genes do, so differences are easier to recognize even if your data or the sequences deposited in GenBank contain some errors. With very few exceptions, there is only one copy of each gene per genome.
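
One simple way to put a number on "how much two sequences differ" is percent identity over an alignment. A minimal sketch, assuming the two sequences are already aligned (by any aligner) with '-' as the gap character; gap columns are simply skipped, which is only one of several conventions. The toy fragments are hypothetical, not real Neisseria sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gap columns
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy fragments (8 of 9 ungapped columns match)
print(round(percent_identity("ATGC-TTACG", "ATGCATTGCG"), 1))  # 88.9
```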

For a very quick answer, I would take these four gene sequences from a closely related species and map your reads onto them with a fast aligner such as bwa. If you get decent coverage, run variant calling and determine the consensus sequence of your reads. If coverage is poor (reads from a divergent strain may fail to map to the reference), map the reads again onto the consensus obtained from the first round. You should get much better coverage in the second round.
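
The two-round map/call/consensus procedure might look like this with bwa, samtools and bcftools. This is a sketch, not a tested pipeline: the file names (`markers.fa` holding the four marker genes of a close relative, the FASTQ names, the `round1`/`round2` prefixes) are hypothetical, and `--ploidy 1` is an assumed choice for a haploid bacterium. The functions only build the shell commands; `run()` executes them once the tools are on PATH.

```python
import shlex
import subprocess


def mapping_round(ref: str, reads: list[str], prefix: str) -> list[str]:
    """Shell commands for one map -> call -> consensus round (nothing is run here)."""
    q = shlex.quote
    bam, vcf, cons = f"{prefix}.bam", f"{prefix}.vcf.gz", f"{prefix}.consensus.fa"
    reads_arg = " ".join(q(r) for r in reads)
    return [
        f"bwa index {q(ref)}",
        f"samtools faidx {q(ref)}",  # .fai index of the reference for bcftools
        # bwa mem writes SAM to stdout; samtools sort reads it from stdin ("-")
        f"bwa mem {q(ref)} {reads_arg} | samtools sort -o {q(bam)} -",
        f"samtools index {q(bam)}",
        # --ploidy 1: haploid calling; -m: multiallelic caller; -v: variant sites only
        f"bcftools mpileup -f {q(ref)} {q(bam)} | bcftools call --ploidy 1 -mv -Oz -o {q(vcf)}",
        f"bcftools index {q(vcf)}",  # bcftools consensus needs an indexed VCF
        # positions without reads keep the reference base, so check coverage as well
        f"bcftools consensus -f {q(ref)} {q(vcf)} > {q(cons)}",
    ]


def two_round_plan(ref: str, reads: list[str]) -> list[str]:
    """Round 2 re-maps the same reads onto the consensus produced in round 1."""
    plan = mapping_round(ref, reads, "round1")
    plan += mapping_round("round1.consensus.fa", reads, "round2")
    return plan


def run(plan: list[str]) -> None:
    """Execute commands in order; check=True only sees a pipeline's last exit status."""
    for cmd in plan:
        subprocess.run(cmd, shell=True, check=True)


if __name__ == "__main__":
    # Hypothetical inputs; call run(...) on the plan once the tools are installed
    for cmd in two_round_plan("markers.fa", ["R1.fastq.gz", "R2.fastq.gz"]):
        print(cmd)
```

Keeping command construction separate from execution makes it easy to print and review the plan before running anything.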

Finally, create a multiple sequence alignment of your consensus sequences and all available sequences from other species in the same genus. The alignment will give you an indication of whether this is a new species or only a variant (subspecies) of an already known species.
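
A quick first reading of such an alignment is the uncorrected p-distance from your consensus to every other sequence. The sketch below assumes the alignment has already been made with an external aligner and loaded into a dict; `concatenate` joins per-gene alignments taxon by taxon, in the style of multilocus sequence analysis, which is an added option rather than part of the answer above. Taxon names are hypothetical and no distance cut-off for "new species" is implied.

```python
def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance over columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no shared ungapped columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def rank_neighbours(alignment: dict[str, str], query: str) -> list[tuple[str, float]]:
    """All other taxa, sorted from closest to most distant from `query`."""
    q = alignment[query]
    dists = [(name, p_distance(q, seq)) for name, seq in alignment.items() if name != query]
    return sorted(dists, key=lambda item: item[1])


def concatenate(per_gene: list[dict[str, str]]) -> dict[str, str]:
    """Join per-gene alignments taxon by taxon; taxa missing any gene are dropped."""
    shared = set.intersection(*(set(gene) for gene in per_gene))
    return {taxon: "".join(gene[taxon] for gene in per_gene) for taxon in sorted(shared)}


# Toy alignment with hypothetical names, not real Neisseria data
toy = {
    "my_isolate": "ATGCATTACG",
    "ref_A": "ATGCATTGCG",
    "ref_B": "ATGAATCGCA",
}
for name, dist in rank_neighbours(toy, "my_isolate"):
    print(name, round(dist, 2))
# prints "ref_A 0.1" then "ref_B 0.4"
```

A phylogenetic tree built from the same alignment is the more usual way to present the result; the distance ranking is just a fast sanity check.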

9.4 years ago
HG ★ 1.2k

Please have a look

http://www.biomedcentral.com/1471-2180/12/302

9.4 years ago
dago ★ 2.8k

One common and robust way to assess whether you are dealing with one species or several is to calculate the Average Nucleotide Identity (ANI) and to compare tetranucleotide usage patterns. You can learn more about that in this paper. To calculate both, you could use JSpecies, which is good and easy to use.
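
JSpecies computes both values for you; the sketch below only illustrates what the two numbers mean and is not JSpecies' own code. Assumptions: the query genome is cut into 1020 nt fragments (the fragment size used for BLAST-based ANI, ANIb), the fragments were searched against the other genome with BLASTN using `-outfmt 6` (default 12 columns), and the hit filter is a simplified form of the published ANIb rule (best hit per fragment, identity above 30%, alignment covering at least 70% of the fragment). The tetranucleotide part computes z-scores under a maximal-order Markov model on both strands and correlates them between genomes, in the manner of the TETRA method.

```python
import itertools
import math
from statistics import mean

FRAGMENT_LEN = 1020  # fragment size used for ANIb
VALID = set("ACGT")


def fragments(genome: str, size: int = FRAGMENT_LEN) -> list[str]:
    """Consecutive fragments of the query genome (the last one may be shorter)."""
    return [genome[i:i + size] for i in range(0, len(genome), size)]


def ani_from_blast(tab_lines: list[str], frag_len: dict[str, int]) -> float:
    """Mean identity of the best BLASTN hit per fragment (-outfmt 6, 12 columns).

    Simplified ANIb-style filter: identity > 30% and alignment length
    >= 70% of the fragment length.
    """
    best: dict[str, tuple[float, float]] = {}
    for line in tab_lines:
        cols = line.rstrip("\n").split("\t")
        frag, pident, aln_len, bitscore = cols[0], float(cols[2]), int(cols[3]), float(cols[11])
        if pident <= 30 or aln_len < 0.7 * frag_len[frag]:
            continue
        if frag not in best or bitscore > best[frag][1]:
            best[frag] = (pident, bitscore)  # keep the highest-scoring hit
    if not best:
        raise ValueError("no fragment passed the filters")
    return mean(p for p, _ in best.values())


def _kmer_counts(seq: str, k: int) -> dict[str, int]:
    counts: dict[str, int] = {}
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= VALID:  # skip words containing N or other symbols
            counts[word] = counts.get(word, 0) + 1
    return counts


def tetra_zscores(seq: str) -> list[float]:
    """Z-scores for all 256 tetranucleotides, both strands, maximal-order Markov model."""
    seq = seq.upper()
    # the 'N' spacer stops words from spanning the strand junction
    both = seq + "N" + seq[::-1].translate(str.maketrans("ACGT", "TGCA"))
    n2, n3, n4 = (_kmer_counts(both, k) for k in (2, 3, 4))
    zs = []
    for word in map("".join, itertools.product("ACGT", repeat=4)):
        mid = n2.get(word[1:3], 0)
        left, right = n3.get(word[:3], 0), n3.get(word[1:], 0)
        # expected count from the flanking trinucleotides and the central dinucleotide
        exp = left * right / mid if mid else 0.0
        var = exp * (mid - left) * (mid - right) / mid ** 2 if mid else 0.0
        zs.append((n4.get(word, 0) - exp) / math.sqrt(var) if var > 0 else 0.0)
    return zs


def tetra_correlation(genome_a: str, genome_b: str) -> float:
    """Pearson correlation between the two genomes' tetranucleotide z-scores."""
    za, zb = tetra_zscores(genome_a), tetra_zscores(genome_b)
    ma, mb = mean(za), mean(zb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(za, zb))
    sa = math.sqrt(sum((x - ma) ** 2 for x in za))
    sb = math.sqrt(sum((y - mb) ** 2 for y in zb))
    return cov / (sa * sb)
```

ANI values of roughly 95-96% are the commonly used species boundary, so an isolate whose ANI to every sequenced relative falls clearly below that range would support the new-species hypothesis, ideally together with the marker-gene evidence.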
