4.7 years ago by
United States
Hello all,

I am looking to test an alignment-free phylogenetic tree-building algorithm I wrote. It can build both gene trees and species trees. I have already tested it on a single-gene primate tree, but I need more data to characterize the algorithm further. I know there is a lot of data on TreeBASE, but I am having a hard time downloading it. I also do not know which trees are considered well resolved.

Any info would help greatly.

4.7 years ago by
European Union
You might want to use data sets that have already been used in other papers on alignment-free comparisons. Here you can download the data from andi (shameless self-plug). I also have the Roseobacter data set from the spaced words paper. Send me an email if you are interested.

I have a follow-up question about the 109 E. coli ST131 strains. In the assemblies (ordered or not), there are multiple nodes per FASTA file. Am I right in assuming that this means there is more than one contig per file (that is, the genome was not closed)?

Yes, those are contigs. Many genome projects stay in this draft state and are never closed.
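If you want to check how many contigs each assembly contains, here is a minimal Python sketch. It assumes plain, uncompressed FASTA files in which every record starts with a `>` header line. The function name `count_fasta_records` is just something I made up for illustration; it is not part of andi or any other tool.

```python
def count_fasta_records(path):
    """Count FASTA records, i.e. lines that start with '>'.

    Assumption: the file is plain-text FASTA (not gzipped), so each
    contig contributes exactly one header line.
    """
    count = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
    return count
```

Calling `count_fasta_records("some_strain.fasta")` (a hypothetical file name) returns the number of records. A result greater than 1 means the file holds several contigs. A result of 1 suggests a single sequence, but that alone does not prove the genome was closed.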

