Question

How do I decide between building a kmer-based or SNP-based tree?

4.5 years ago

gatheringdusthere ▴ 20

I was asked to construct a phylogenetic tree to show whether some isolates from a bacterial infection are closely related. There seem to be two types of phylogenetic trees that one can build.

How does one decide whether a phylogenetic tree should be built from kmers or from SNP calls? Are SNP-based trees preferred over kmer-based trees?

snp Kmer-based Phylogenetic trees • 1.5k views

ADD COMMENT • link 4.5 years ago by gatheringdusthere ▴ 20

Hi, this can't be answered with a simple "do it this way". Even after you've decided on one method, you'll have tons of options. Take the simpler case of the kmers: your tree may look fundamentally different depending on the kmer length. It's even worse for SNP calling: you'll have many algorithms available, and even more settings and filter options. I like kmers because they're fast, and I was able to build a tree on mammal genomes in "no time", but this was not for a publication that had to stand up to expert criticism. My colleague built a very compelling tree of 500 yeast strains based on SNPs, but she spent lots of time on curation and validation of the data.

ADD REPLY • link 4.5 years ago by Carambakaracho ★ 3.2k
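
To make the point about kmer length concrete, here is a minimal sketch that computes a Jaccard distance between the kmer sets of two sequences for several values of k. The sequences and the function names (`kmers`, `jaccard_distance`) are made up for illustration, and Jaccard distance is only one of several kmer-based distances; real tools usually add sketching or other approximations on top.

```python
def kmers(seq, k):
    """Return the set of all overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(seq_a, seq_b, k):
    """Return 1 - |A & B| / |A | B| for the kmer sets A and B."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = a | b
    if not union:
        raise ValueError("k is longer than both sequences")
    return 1.0 - len(a & b) / len(union)


# Two made-up sequences that differ at a single position (toy data).
ref = "ACGTTGCAAGCTTAGGCTAACG"
alt = "ACGTTGCAAGGTTAGGCTAACG"

# The same pair of sequences gets a different distance for each k.
for k in (3, 5, 7):
    print(k, round(jaccard_distance(ref, alt, k), 3))
```

On this toy pair the distance grows as k grows, because a single substitution disrupts every kmer that overlaps it. That is why the chosen kmer length needs to be justified when the resulting distances feed into a tree.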

I did think that SNP trees are more complicated and kmer trees easier to conjure, but one probably has to justify why a certain kmer length is used. I'm so new at this that I truly appreciate being told when the question I ask does not have a simple answer, so thank you!

ADD REPLY • link 4.5 years ago by gatheringdusthere ▴ 20

Carambakaracho makes some really good points, but I would like to add something that's important IMO: who's your audience? Chances are the kmer and SNP trees will lead to similar conclusions (assuming you have good resolution and sequencing data). The important aspect is your target audience. If they're clinicians, they may struggle with what a kmer is and how to interpret the results. SNPs, on the other hand, provide a nice concrete way of thinking about differences: these two isolates were 5 SNPs apart, but these other ones were 1000. If your organism is clonal, then SNPs are the perfect way to go and you don't have to look further into other variants (like indels, for example).

In bacterial genomics, SNP-based trees are standard and fairly common. I don't agree that kmers are easier than SNPs. They both have their challenges, but there are plenty of microbial pipelines that will produce a high-quality SNP and indel dataset ready for phylogenomic reconstruction. I would stick with maximum parsimony algorithm for.

Let me know if you have any more questions. We use the SPANDx pipeline in our lab.

ADD REPLY • link 4.5 years ago by Mark ★ 1.5k
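
The "5 SNPs apart versus 1000" framing in the reply above can be sketched as a pairwise SNP distance table. This is a minimal illustration, assuming the isolates are already aligned to the same length (for example, concatenated variant sites against a common reference). The isolate names, sequences, and helper names are made up, and it is not how SPANDx or any other specific pipeline computes distances.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count aligned positions where the two sequences carry different bases.

    Positions where either sequence has a character in `ignore`
    (ambiguous base or gap) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


def pairwise_snp_distances(alignment):
    """Return {(name_a, name_b): SNP count} for every pair of isolates."""
    return {
        (a, b): snp_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


# Toy alignment with made-up isolate names and sequences.
alignment = {
    "isolate_A": "ACGTACGTAC",
    "isolate_B": "ACGTACGTAT",
    "isolate_C": "TCGAACNTGT",
}

for pair, dist in pairwise_snp_distances(alignment).items():
    print(pair, dist)
```

A table like this is easy to show to a non-specialist audience: small counts suggest closely related isolates, large counts suggest distant ones. What counts as "close" depends on the organism and on how the SNPs were called and filtered.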

Thank you for your reply and suggestion, Amar. I still have time to explore, so I'll see if I can find a suitable pipeline for our purpose.

ADD REPLY • link 4.5 years ago by gatheringdusthere ▴ 20