Hi,

I need some help calculating the sample size for my project, and I would really appreciate it if any statisticians could help. My project is about identifying motifs in a set of DNA sequences, and I have come up with the following approach to discover them. Say, for example, that I have a set of 100 DNA sequences. By making subgroups of 10 sequences each (chosen by random sampling), I expect motif identification to be very efficient. I would like to know how to justify this approach statistically, that is, how to calculate how many sequences need to be in each subgroup and how many subgroups are needed.

Thanks a lot for coming forward to help,

Prabhakaran

Thanks for your reply. In my approach I would create, for example, 100 subgroups of 10 randomly selected sequences each, so that all 100 DNA sequences are selected for analysis. My idea is to create subgroups, and I would like to know how many sequences need to be in each subgroup and how many subgroups are needed for the 100 sequences. Note that the motifs need not be present in all the sequences.
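
To make the question more concrete, here is a minimal Python sketch of the subgrouping idea and of two quantities that might help in choosing the sizes. It is only an illustration under stated assumptions, not an established method: subgroups are assumed to be drawn independently of each other (so the same sequence can appear in several subgroups), and the number of sequences that actually carry the motif (`n_with_motif`) is a hypothetical input that is not known in advance. All function names are made up for this example.

```python
import math
import random


def draw_subgroups(sequences, group_size, n_groups, seed=0):
    """Draw n_groups random subgroups; each one is sampled without replacement."""
    rng = random.Random(seed)
    return [rng.sample(sequences, group_size) for _ in range(n_groups)]


def prob_sequence_covered(n_total, group_size, n_groups):
    """Chance that one given sequence lands in at least one subgroup.

    A single subgroup misses the sequence with probability 1 - group_size / n_total,
    and the subgroups are assumed to be drawn independently.
    """
    return 1.0 - (1.0 - group_size / n_total) ** n_groups


def prob_at_least(n_total, n_with_motif, group_size, min_hits):
    """Hypergeometric tail: chance that one subgroup contains at least
    min_hits motif-bearing sequences (n_with_motif is an assumed value)."""
    total = math.comb(n_total, group_size)
    upper = min(group_size, n_with_motif)
    hits = sum(
        math.comb(n_with_motif, k) * math.comb(n_total - n_with_motif, group_size - k)
        for k in range(min_hits, upper + 1)
    )
    return hits / total


# Placeholder identifiers standing in for the 100 DNA sequences.
sequences = [f"seq{i}" for i in range(100)]
subgroups = draw_subgroups(sequences, group_size=10, n_groups=100)

# Coverage with 100 subgroups versus only 10 subgroups of size 10.
print(prob_sequence_covered(100, 10, 100))
print(prob_sequence_covered(100, 10, 10))

# If, hypothetically, 30 of the 100 sequences carried the motif:
print(prob_at_least(100, 30, 10, min_hits=3))
```

With these two functions one could scan over candidate values of the subgroup size and the number of subgroups and pick the smallest combination that gives acceptable coverage and an acceptable chance of seeing the motif often enough in a subgroup. What counts as "often enough" depends on the motif finder used, which is exactly the part I am unsure about.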