11 months ago
sata72
I want to run a statistical test on the number of SNPs in each gene region. Does anyone know which test I should use to find out whether there is any difference between datasets? For example, I want to know 1) whether there is a significant difference between datasets in the promoter region, and 2) between which datasets. Here, dataset 4 appears to be significantly different from the other datasets. Thanks in advance!
region    snps_dataset1  snps_dataset2  snps_dataset3  snps_dataset4
promoter  10             10             10             20
introns   20             8              20             25
exons     10             20             20             15
utr5      5              10             5              20
utr3      8              15             8              20
satva72, this question needs clarification, as rpolicastro has already said.
However, based on what you've written, the research question itself probably needs to be focused further as well.
Let's think about just exonic regions. The types of statistical tests used in just these regions (not even addressing the other 4 you mention) are subspecialized. Each has a literature that is itself difficult to master. For example, try reading about testing for dN/dS statistics.
This is just 1 region, and within that 1 region, just 1 type of variant, and even that is very complicated. In other words, getting meaningful conclusions from dN/dS statistics alone is not trivial. Here, though, you are proposing to do that for all kinds of variants in 5 different gene regions. Of course it is possible that such an analysis could be meaningful, but my guess is that it needs to be refined further...
Returning to what rpolicastro said, in any event, we cannot help you until we know a lot more about your goals.
Thanks for your comment. The main question here is whether the distribution of SNPs differs between the datasets. The null hypothesis is that the distributions do not differ. In each dataset we have some SNPs distributed among promoters, introns, exons, and so on. I want to know whether these distributions differ significantly between the datasets. Thanks
OK, but suppose they are different.
WHY are they different? If they are different, will you proceed from that knowledge directly to some kind of claim, or would you use that knowledge to frame further experiments instead? What is your game plan?
There are many, many reasons why the distribution of variants could differ between genes and gene regions. For instance, homology that is present in some exons of some of the genes but not in others (with other exons of other genes elsewhere in the genome) could drive differences in genetic variant formation. But would this reflect anything other than a propensity for gene conversion events? Unclear. That's just one tiny, off-the-cuff example, but there are MANY such examples.
Can you explain more about how this information would be used?
And is this from data YOU GENERATED? Or are you working from public databases? Or what?
The more you tell us, the more we can guide you.
vincent
I think you will need at least three replicates to be able to do statistics.
Thanks for your comment. I think it's possible if we compare the different datasets, to show that the distribution differs between datasets.
You should explain your experiment in more detail, such as the hypothesis you are testing and the samples you have to test this.
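For the question as restated in the thread (does the distribution of SNPs across regions differ between datasets?), one possible starting point is a chi-square test of independence on the posted table. The sketch below is only an illustration under strong assumptions: it treats every SNP as an independent observation, ignores region length and the lack of replicates raised above, and the function names `chi2_sf` and `chi_square_independence` are hypothetical helpers written for this example, not part of any library. In practice `scipy.stats.chi2_contingency` performs the same calculation.

```python
import math

# Counts copied from the table in the question: rows are regions, columns are datasets 1-4.
regions = ["promoter", "introns", "exons", "utr5", "utr3"]
counts = [
    [10, 10, 10, 20],
    [20, 8, 20, 25],
    [10, 20, 20, 15],
    [5, 10, 5, 20],
    [8, 15, 8, 20],
]


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square distribution (series expansion)."""
    a, x = df / 2.0, stat / 2.0
    if x <= 0:
        return 1.0
    # Series for the regularized lower incomplete gamma function P(a, x).
    term, total, n = 1.0, 1.0, 1
    while term > 1e-15 * total:
        term *= x / (a + n)
        total += term
        n += 1
    lower = math.exp(-x + a * math.log(x) - math.lgamma(a + 1)) * total
    return max(0.0, 1.0 - lower)


def chi_square_independence(table):
    """Return (statistic, degrees of freedom, p-value) for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if region and dataset were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


stat, df, p_value = chi_square_independence(counts)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.4f}")
```

A small p-value here would only suggest that the region-by-dataset proportions are not homogeneous; it would not say which datasets differ or why. Pairwise follow-up comparisons would need a multiple-testing correction, and the biological caveats raised in the replies still apply.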