6 months ago
Gumindu
I have 150 sequences of a particular gene in a dataset. The gene is highly polymorphic, and the sequences come from different studies that used different techniques. I have to select 50 of the 150 to analyze polymorphism and selection. What criteria should I use for the selection? Should I choose the 50 most diverged sequences, the 50 longest, a random sample of 50, or use some other statistically fair method?
What if more than 50 sequences satisfy a criterion equally well? Let's say 70 of the 150 sequences are tied as the longest, all with the same length. How should I select 50 out of those 70?
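To make the tie case concrete, here is a minimal Python sketch of one option I am considering: rank by length and break ties at random with a fixed seed. The function name `pick_longest_with_random_ties`, the toy data, and the seed are hypothetical and only for illustration. I am not assuming this is the right approach, which is what I am asking about.

```python
import random

def pick_longest_with_random_ties(seqs, k, seed=0):
    """Return k sequence IDs, longest first, with ties broken at random.

    seqs: dict mapping a sequence ID to its sequence string (hypothetical layout).
    """
    rng = random.Random(seed)   # fixed seed so the selection is reproducible
    ids = list(seqs)
    rng.shuffle(ids)            # randomize order before ranking
    # Python's sort is stable, so equally long sequences keep their shuffled order.
    ids.sort(key=lambda i: len(seqs[i]), reverse=True)
    return ids[:k]

# Toy data mimicking my case: 70 sequences tied as the longest, 80 shorter ones.
toy = {f"long_{i}": "A" * 300 for i in range(70)}
toy.update({f"short_{i}": "A" * 200 for i in range(80)})

chosen = pick_longest_with_random_ties(toy, 50)
print(len(chosen), "sequences selected")
```

With this toy data, all 50 selected IDs come from the 70 tied sequences, and which 50 are picked depends only on the seed.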