Deleted: Phylogeny using Mash
9 weeks ago
SushiRoll ▴ 30

Hi all!

I'm trying to run a phylogenetic analysis by first sketching my samples with Mash. I'm sure I'm doing something (probably many things) wrong, because I get some odd results. My approach is as follows:

1) I concatenate my forward and reverse reads: for i in A B C D; do cat "${i}"_R1.fastq.gz "${i}"_R2.fastq.gz > "${i}"_cat.fastq.gz; done

2) Then I sketch each concatenated read file: for i in A B C D; do mash sketch -m 2 "${i}"_cat.fastq.gz; done

3) Finally, I calculate the distances with: mash dist *_cat.fastq.gz.msh

Here is my first doubt, since my output looks like this:

A B 0.0593303 0 168/1000

A C 0.0621044 0 157/1000

A D 0.0677629 0 137/1000

I see no comparison for A/A (which is pretty obvious, I know), but I also don't see comparisons for B/C and C/D.
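For anyone reading the output above: the third column is the Mash distance and the last column is the number of shared hashes out of the sketch size. As a minimal sketch, the distance can be recomputed from the shared-hash fraction j with the Mash distance formula D = -1/k * ln(2j / (1 + j)). The function name `mash_distance` is made up for illustration, and k = 21 is an assumption (it is the Mash default, and no -k option appears in the commands above).

```python
import math

def mash_distance(shared, sketch_size, k=21):
    """Mash distance from a shared-hash count (k=21 is assumed, the Mash default)."""
    j = shared / sketch_size  # Jaccard estimate from the two sketches
    if j == 0:
        return 1.0  # no shared hashes: treat as maximally distant
    return -math.log(2 * j / (1 + j)) / k

# Shared-hash counts taken from the three output rows in the post
for shared in (168, 157, 137):
    print(shared, round(mash_distance(shared, 1000), 7))
```

If the recomputed values agree with the third column, the sketches were most likely built with the default k-mer size.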

4) I intended to use this matrix to generate a dendrogram. Will running hclust in R do the trick?

Thanks a lot!

EDIT: I figured out how to get what I was looking for in step 3. I kept steps 1 and 2 and then ran: mash paste merged_sketches *_cat.fastq.gz.msh

My next step was to compute the distances using the merged sketches as both query and reference:

mash dist -t merged_sketches.msh merged_sketches.msh > distances.txt

I'm still struggling to find the correct way to generate a dendrogram. Do you have any suggestions?
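One possible route, given as a rough sketch rather than a tested answer: read the square table written by mash dist -t and run average-linkage (UPGMA) clustering on it, which is the same idea as hclust with method = "average" in R. The code below assumes the table is tab-separated, that its first row is a header whose first field is a label such as "#query", and that row and column names match. The function names `read_mash_table` and `upgma_newick` and the toy values X, Y, Z are invented for illustration and are not real Mash output.

```python
import io

def read_mash_table(handle):
    """Parse a square, tab-separated distance table into names and a pair->distance dict."""
    header = handle.readline().rstrip("\n").split("\t")
    names = header[1:]  # assumed: the first header field is a label such as "#query"
    dist = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(names) + 1:
            continue  # skip blank or malformed rows
        for col, value in zip(names, fields[1:]):
            dist[(fields[0], col)] = float(value)
    return names, dist

def upgma_newick(names, dist):
    """Average-linkage (UPGMA) clustering; returns the tree as a Newick string."""
    clusters = {name: (1, 0.0) for name in names}  # label -> (leaf count, node height)
    d = {(a, b): dist[(a, b)] for a in names for b in names if a != b}
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of current clusters
        height = d[(a, b)] / 2
        (size_a, height_a), (size_b, height_b) = clusters.pop(a), clusters.pop(b)
        merged = f"({a}:{height - height_a:.5f},{b}:{height - height_b:.5f})"
        # Distance from the merged cluster to each remaining one: size-weighted mean
        for c in clusters:
            avg = (size_a * d[(a, c)] + size_b * d[(b, c)]) / (size_a + size_b)
            d[(merged, c)] = d[(c, merged)] = avg
        # Drop every entry that still refers to the two clusters just merged
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        clusters[merged] = (size_a + size_b, height)
    return next(iter(clusters)) + ";"

# Toy table with invented values, standing in for distances.txt
toy_table = (
    "#query\tX\tY\tZ\n"
    "X\t0\t0.02\t0.10\n"
    "Y\t0.02\t0\t0.12\n"
    "Z\t0.10\t0.12\t0\n"
)
names, dist = read_mash_table(io.StringIO(toy_table))
print(upgma_newick(names, dist))
```

To try it on real data, pass open("distances.txt") instead of the toy table. The Newick string can be loaded into any tree viewer. Note that this is a clustering dendrogram of Mash distances, not a phylogeny inferred from an alignment.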

Mash phylogeny • 116 views