Question: Is neighbor joining the best approach to look at clustering patterns in population genetic data?
su7880 wrote (3.8 years ago):

Hello,

I am working on a large SNP data set from GBS with over 300 individuals from 34 populations. The 34 populations belong to three closely related species. I have tried various assignment tests to examine population structure, but I would still like to see the clustering pattern using a different approach. Unfortunately, I am not an expert in tree building. For a start, I am not sure whether neighbor joining will give me informative inferences about the relationships among species and populations. Also, because this is a SNP data set, many individuals are heterozygous at many loci. If I run a neighbor-joining analysis on my SNP data, which software takes ambiguity codes into account?

Any kind of answers will be very much appreciated.

Thanks in advance.
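One way to let heterozygous calls (written as IUPAC ambiguity codes, e.g. R = A/G) contribute to a distance-based tree is an allele-sharing distance: per locus, 0 for identical genotypes, 0.5 when one allele is shared, 1 when none is, averaged over loci called in both individuals. The sketch below is a minimal illustration under that assumption; it is not necessarily what any particular package does, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Allele sets for single-base and two-base IUPAC codes (R = A/G, Y = C/T, ...).
IUPAC_ALLELES = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
}

def locus_distance(call1: str, call2: str) -> Optional[float]:
    """0 if the two diploid calls are identical, 0.5 if they share one allele,
    1 if they share none; None if either call is missing (N, -, ?, ...)."""
    a = IUPAC_ALLELES.get(call1.upper())
    b = IUPAC_ALLELES.get(call2.upper())
    if a is None or b is None:
        return None
    if a == b:
        return 0.0
    return 0.5 if a & b else 1.0

def pairwise_distance(seq1: str, seq2: str) -> float:
    """Mean per-locus distance over loci called in both individuals."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must cover the same SNPs")
    scores = [d for d in map(locus_distance, seq1, seq2) if d is not None]
    if not scores:
        raise ValueError("no loci are called in both individuals")
    return sum(scores) / len(scores)

def distance_matrix(samples: Dict[str, str]) -> Tuple[List[str], List[List[float]]]:
    """Square matrix of pairwise distances, in the order of the input names."""
    names = list(samples)
    matrix = [[pairwise_distance(samples[x], samples[y]) for y in names] for x in names]
    return names, matrix

if __name__ == "__main__":
    samples = {"ind1": "ACRT", "ind2": "AGGT", "ind3": "ACCN"}
    names, matrix = distance_matrix(samples)
    print(pairwise_distance("ACRT", "AGGT"))  # 0.375
```

The resulting matrix can be fed to any distance-based method (NJ, UPGMA). Missing calls are skipped pairwise, which is one choice among several.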


Hi, it might be worth running PCA on your data first. Random Forests are robust classifiers, and once you have reduced the number of features (SNPs) to a manageable size, you can build a good model.

Reply by reza.jabal, 3.8 years ago.
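The PCA step suggested above can be sketched with the standard library alone: center a 0/1/2 genotype matrix (alternate-allele counts) and find the first principal component by power iteration. This is a toy illustration, assuming no missing genotypes; for 300 individuals and many thousands of SNPs an established implementation will be far faster. The Random Forest step is not shown, and the helper names are hypothetical.

```python
import math
import random
from typing import List

def center_columns(genotypes: List[List[float]]) -> List[List[float]]:
    """Subtract each SNP's (column's) mean from a 0/1/2 genotype matrix."""
    n = len(genotypes)
    means = [sum(row[j] for row in genotypes) / n for j in range(len(genotypes[0]))]
    return [[value - means[j] for j, value in enumerate(row)] for row in genotypes]

def first_pc_scores(genotypes: List[List[float]], iterations: int = 500,
                    seed: int = 0) -> List[float]:
    """Scores of each individual on PC1, via power iteration on X^T X.
    The sign of a principal component is arbitrary."""
    X = center_columns(genotypes)
    n, p = len(X), len(X[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for _ in range(iterations):
        xv = [sum(X[i][j] * v[j] for j in range(p)) for i in range(n)]  # X v
        w = [sum(X[i][j] * xv[i] for i in range(n)) for j in range(p)]  # X^T (X v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("genotypes show no variation along the start vector")
        v = [x / norm for x in w]
    return [sum(X[i][j] * v[j] for j in range(p)) for i in range(n)]

if __name__ == "__main__":
    # Toy data: 4 individuals x 3 SNPs, coded as counts of the alternate allele.
    toy = [[0, 0, 2], [0, 1, 2], [2, 2, 0], [2, 1, 0]]
    print([round(s, 2) for s in first_pc_scores(toy)])
```

In the toy data the first two individuals land on one side of PC1 and the last two on the other.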
su7880 wrote (3.8 years ago):

Thanks for the kind answer.

I have already tried PCA and various assignment tests such as fastStructure and DAPC. However, I would still like to see what NJ does with my data. Any more suggestions?

Brice Sarver (United States) wrote (3.8 years ago):

With that kind of data, you'll want to use a distance-based approach such as NJ or UPGMA. More sophisticated phylogenetic methods are unlikely to give you an answer in a reasonable amount of time, especially if you're only taking a first look at your data. You can also use a non-phylogenetic method, such as hierarchical clustering, if you just want to explore your dataset.
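To make the distance-based approach concrete, here is a minimal pure-Python sketch of standard neighbor joining (Saitou and Nei's method) on a precomputed distance matrix, returning an unrooted Newick tree. It is an illustration only; in practice you would use an established implementation such as `nj()` in the R package ape. The function name is hypothetical, the diagonal of the matrix is assumed to be 0, and NJ can produce negative branch lengths on non-additive data.

```python
from typing import List

def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Unrooted neighbor-joining tree as a Newick string with branch lengths."""
    n_taxa = len(names)
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    # Node ids 0..n-1 are the input taxa; internal nodes get new ids.
    newick = {i: name for i, name in enumerate(names)}
    d = {i: {j: float(dist[i][j]) for j in range(n_taxa)} for i in range(n_taxa)}
    active = list(range(n_taxa))
    next_id = n_taxa
    while len(active) > 3:
        n = len(active)
        r = {i: sum(d[i][k] for k in active) for i in active}
        # Choose the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        _, i, j = min(((n - 2) * d[i][j] - r[i] - r[j], i, j)
                      for x, i in enumerate(active) for j in active[x + 1:])
        # Branch lengths from the new internal node u to i and j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = next_id
        next_id += 1
        newick[u] = f"({newick[i]}:{li:.4f},{newick[j]}:{lj:.4f})"
        # Distances from u to every remaining node.
        d[u] = {u: 0.0}
        for k in active:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        active = [k for k in active if k not in (i, j)] + [u]
    # Join the last three nodes at one central node (unrooted tree).
    a, b, c = active
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = d[a][b] - la
    lc = d[a][c] - la
    return f"({newick[a]}:{la:.4f},{newick[b]}:{lb:.4f},{newick[c]}:{lc:.4f});"

if __name__ == "__main__":
    names = ["a", "b", "c", "d", "e"]
    dist = [[0, 5, 9, 9, 8],
            [5, 0, 10, 10, 9],
            [9, 10, 0, 8, 7],
            [9, 10, 8, 0, 3],
            [8, 9, 7, 3, 0]]
    print(neighbor_joining(names, dist))
    # (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);
```

Because NJ builds an unrooted tree, the placement of the outermost parentheses says nothing about where the root is; rooting needs an outgroup or another criterion.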


Hi Brice, out of interest, why are you not recommending random forests? Don't they outperform other models?

Reply by reza.jabal, 3.8 years ago.

RF might be useful here. I just homed in on "NJ", "species/population data", and "tree". On a second read, perhaps you weren't explicitly talking about using phylogenetic approaches, and a dendrogram, as opposed to a phylogeny, would suffice for what you want.

Reply by Brice Sarver, 3.8 years ago.
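If a dendrogram is enough, average-linkage hierarchical clustering (UPGMA) on the same kind of distance matrix is a simple option; it is what R's `hclust(d, method = "average")` computes. The sketch below is a minimal pure-Python version for illustration; the function name is hypothetical. Unlike NJ, it produces a rooted tree with all tips equidistant from the root, which is fine for displaying clusters but assumes a constant rate if read as a phylogeny.

```python
from typing import Dict, List, Tuple

def upgma(names: List[str], dist: List[List[float]]) -> str:
    """Average-linkage (UPGMA) clustering; returns a rooted Newick dendrogram."""
    n = len(names)
    if n < 2:
        raise ValueError("need at least two individuals")
    # cluster id -> (newick, number of leaves, height of the cluster's root)
    clusters: Dict[int, Tuple[str, int, float]] = {
        i: (name, 1, 0.0) for i, name in enumerate(names)
    }
    d = {i: {j: float(dist[i][j]) for j in range(n)} for i in range(n)}
    active = list(range(n))
    next_id = n
    while len(active) > 1:
        # Merge the closest pair of clusters.
        _, i, j = min((d[i][j], i, j)
                      for x, i in enumerate(active) for j in active[x + 1:])
        height = d[i][j] / 2.0
        (ni, si, hi), (nj, sj, hj) = clusters[i], clusters[j]
        u = next_id
        next_id += 1
        clusters[u] = (f"({ni}:{height - hi:.4f},{nj}:{height - hj:.4f})",
                       si + sj, height)
        d[u] = {}
        for k in active:
            if k not in (i, j):
                # Average linkage: mean over all leaf pairs = size-weighted mean.
                d[u][k] = d[k][u] = (si * d[i][k] + sj * d[j][k]) / (si + sj)
        active = [k for k in active if k not in (i, j)] + [u]
    return clusters[active[0]][0] + ";"

if __name__ == "__main__":
    names = ["A", "B", "C", "D"]
    dist = [[0, 2, 6, 6],
            [2, 0, 6, 6],
            [6, 6, 0, 4],
            [6, 6, 4, 0]]
    print(upgma(names, dist))
    # ((A:1.0000,B:1.0000):2.0000,(C:2.0000,D:2.0000):1.0000);
```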
su7880 wrote (3.8 years ago):

Thank you all. Sorry, what is RF? Could you give me some more details? Also, do you know of software that actually takes IUPAC ambiguity codes into account for phylogenetic inference? I just saw a discussion arguing that MrBayes may or may not use IUPAC ambiguity codes. Is RAxML a good option? What do you all use for this kind of question?

Many thanks in advance.
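For programs that read sequence alignments, heterozygous SNP calls are usually collapsed into one IUPAC character per site. The sketch below does that conversion and writes a relaxed PHYLIP-style file (a header line with the number of individuals and sites, then one "name sequence" line per individual). It assumes biallelic diploid calls given as allele pairs, with anything unrecognized written as N; the function names are hypothetical. Whether a given program treats R, Y, etc. as genuine ambiguity or as missing data depends on the program and its settings, so check the RAxML or MrBayes documentation rather than relying on this sketch.

```python
from typing import Dict, List, Tuple

# Unordered diploid genotype -> IUPAC code (frozenset makes A/G and G/A equal).
IUPAC_CODE = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

def to_iupac(allele1: str, allele2: str) -> str:
    """Collapse one diploid call to a single IUPAC character; missing -> 'N'."""
    return IUPAC_CODE.get(frozenset((allele1.upper(), allele2.upper())), "N")

def to_phylip(genotypes: Dict[str, List[Tuple[str, str]]]) -> str:
    """Relaxed PHYLIP text: 'ntaxa nsites' header, then 'name sequence' lines."""
    seqs = {name: "".join(to_iupac(a, b) for a, b in calls)
            for name, calls in genotypes.items()}
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all individuals must have the same number of SNPs")
    lines = [f"{len(seqs)} {lengths.pop()}"]
    lines += [f"{name} {seq}" for name, seq in seqs.items()]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    calls = {"ind1": [("A", "G"), ("C", "C")],
             "ind2": [("G", "G"), (".", ".")]}
    print(to_phylip(calls), end="")
    # 2 2
    # ind1 RC
    # ind2 GN
```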
