Hi all!

I'm attempting a phylogenetic analysis by first sketching my samples with Mash. I'm sure I'm doing something (probably many things) wrong, because I get some odd results. My approach is as follows:

1) I concatenate my forward and reverse reads:

for i in A B C D; do cat "${i}"_R1.fastq.gz "${i}"_R2.fastq.gz > "${i}"_cat.fastq.gz; done

2) Then I sketch every concatenated pair of reads:

for i in A B C D; do mash sketch -m 2 "${i}"_cat.fastq.gz; done

3) Finally, I calculate the distances:

mash dist *_cat.fastq.gz.msh

This is where my first doubt comes in, since my output looks like this:

A B 0.0593303 0 168/1000
A C 0.0621044 0 157/1000
A D 0.0677629 0 137/1000

I see no A/A comparison (which is pretty obvious, I know), but I also don't see a comparison between B/C or C/D.

4) I intended to use this matrix to generate a dendrogram. Will running hclust in R do the trick?

Thanks a lot!

EDIT: I figured out how to get what I was looking for in step 3. I kept steps 1 and 2 and then merged the individual sketches into one file:

mash paste merged_sketches *_cat.fastq.gz.msh

My next step was to compute the distances, using the merged sketches as both the reference and the query:

mash dist -t merged_sketches.msh merged_sketches.msh > distances.txt
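
For anyone following along, here is a minimal Python sketch of how the table in distances.txt could be read back and sanity-checked. It assumes the layout I get from `mash dist -t`: a header line starting with `#query` followed by the reference names, then one tab-separated row per query. The function names (`read_mash_table`, `is_symmetric`) and the toy values are made up for illustration; they are not my real distances.

```python
# Toy stand-in for the contents of distances.txt (values are invented).
TOY_TABLE = (
    "#query\tA\tB\tC\n"
    "A\t0\t0.02\t0.10\n"
    "B\t0.02\t0\t0.10\n"
    "C\t0.10\t0.10\t0\n"
)


def read_mash_table(text):
    """Parse a tab-separated distance table into (names, nested dict)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    refs = lines[0].split("\t")[1:]          # drop the leading "#query" cell
    matrix = {}
    for ln in lines[1:]:
        fields = ln.split("\t")
        query = fields[0]
        matrix[query] = dict(zip(refs, (float(x) for x in fields[1:])))
    return refs, matrix


def is_symmetric(matrix, tol=1e-9):
    """Check d(a, b) == d(b, a); assumes an all-vs-all (square) table."""
    return all(
        abs(matrix[a][b] - matrix[b][a]) <= tol
        for a in matrix
        for b in matrix[a]
    )


refs, matrix = read_mash_table(TOY_TABLE)
print(refs)
print(matrix["A"]["B"], is_symmetric(matrix))
```

With a real file, the text would come from something like `open("distances.txt").read()` instead of the toy string.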

I'm still struggling to find the correct way to generate a dendrogram. Do you have any suggestions?
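
To make the question more concrete: the kind of clustering I have in mind is plain agglomerative clustering on the distance matrix, similar in spirit to `hclust(..., method = "average")` in R. Below is a rough Python sketch of average linkage (UPGMA) that only returns the tree topology as a Newick string, without branch lengths. The function name `upgma_newick` and the toy distances are hypothetical, and I am not claiming this is the right method for Mash distances; it is just an illustration.

```python
from itertools import combinations

# Invented symmetric distances for four samples (not real Mash output).
NAMES = ["A", "B", "C", "D"]
TOY = {
    "A": {"A": 0.0, "B": 0.02, "C": 0.10, "D": 0.10},
    "B": {"A": 0.02, "B": 0.0, "C": 0.10, "D": 0.10},
    "C": {"A": 0.10, "B": 0.10, "C": 0.0, "D": 0.03},
    "D": {"A": 0.10, "B": 0.10, "C": 0.03, "D": 0.0},
}


def upgma_newick(names, dist):
    """Average-linkage clustering; returns the topology as a Newick string."""
    sizes = {n: 1 for n in names}            # cluster label -> number of leaves
    d = {frozenset((a, b)): dist[a][b] for a, b in combinations(names, 2)}
    while len(sizes) > 1:
        pair = min(d, key=d.get)             # closest pair of clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        del d[pair]
        for c in sizes:                      # remaining clusters
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            # size-weighted mean keeps this equal to the mean over all leaf pairs
            d[frozenset((merged, c))] = (size_a * dac + size_b * dbc) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"


print(upgma_newick(NAMES, TOY))
```

Sample names containing commas or parentheses would break the Newick string, so they would need cleaning first.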

