Question

Which statistical test and which package should I use to compare datasets?

Asked by sata72, 11 months ago

I want to run a statistical test on the number of SNPs in each region of the genes. Does anyone know which statistical test I should use to find out whether there are any differences between the datasets? For example, I want to know 1) whether there is a significant difference between the datasets in the promoter region, and 2) between which datasets, e.g. a result such as "dataset 5 is significantly different from the other datasets". Thanks in advance!

region      snps_dataset1   snps_dataset2   snps_dataset3   snps_dataset4
promoter    10              10              10              20
introns     20              8               20              25
exons       10              20              20              15
utr5        5               10              5               20
utr3        8               15              8               20

Tags: R, statistical-test

Updated 10 months ago by Ram
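
A common starting point for the overall question is a chi-square test of homogeneity on the region x dataset table of counts. Its null hypothesis is that the proportion of SNPs falling in each region is the same in every dataset. The sketch below uses Python with SciPy's `scipy.stats.chi2_contingency`. In R, calling the base function `chisq.test()` on the same matrix performs the equivalent Pearson test. The sketch assumes that each SNP is an independent observation and that the datasets do not share SNPs. Both points are assumptions, not something the thread establishes.

```python
import numpy as np
from scipy.stats import chi2_contingency

regions = ["promoter", "introns", "exons", "utr5", "utr3"]
datasets = ["dataset1", "dataset2", "dataset3", "dataset4"]

# Rows = gene regions, columns = datasets (counts copied from the question's table)
counts = np.array([
    [10, 10, 10, 20],
    [20,  8, 20, 25],
    [10, 20, 20, 15],
    [ 5, 10,  5, 20],
    [ 8, 15,  8, 20],
])

# Chi-square test of homogeneity.
# H0: the distribution of SNPs across regions is the same in every dataset.
chi2, p_value, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p_value:.4g}")

# Rule of thumb: the chi-square approximation can be unreliable
# when expected cell counts fall below 5.
print("smallest expected count:", round(float(expected.min()), 2))
```

For these counts, every expected cell count is above 5, so the usual large-sample rule of thumb is met. With sparser tables, Fisher's exact test (R's `fisher.test()`) or a simulated p-value (`chisq.test(..., simulate.p.value = TRUE)` in R) would be safer choices.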

sata72, this question needs clarification, as rpolicastro has already said.

However, based on what you've written, the research question itself probably needs to be more focused as well.

Consider just the exonic regions. The statistical tests used for these regions alone (not even addressing the other four you mention) are specialized, and each has a literature that is itself difficult to master. For example, try reading about tests based on dN/dS statistics.

That is just one region, and within that region just one type of variant, and even that is very complicated: drawing meaningful conclusions from dN/dS statistics alone is not trivial. Yet here you are proposing to do this for all kinds of variants in five different gene regions. Such an analysis could be meaningful, but my guess is that it needs to be refined further.

Returning to what rpolicastro said: in any event, we cannot help you until we know much more about your goals.

Reply by LauferVA, 11 months ago
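
For readers unfamiliar with the dN/dS statistic mentioned above, it is usually defined as follows. This is the standard textbook form in the style of the Nei-Gojobori counting method, not something stated in this thread. Here N and S are the numbers of nonsynonymous and synonymous sites, N_d and S_d are the observed nonsynonymous and synonymous differences, and the logarithmic step is the Jukes-Cantor correction for multiple substitutions at the same site.

```latex
\begin{aligned}
p_N &= \frac{N_d}{N}, \qquad p_S = \frac{S_d}{S} \\
d_N &= -\tfrac{3}{4}\ln\!\left(1 - \tfrac{4}{3}\,p_N\right), \qquad
d_S = -\tfrac{3}{4}\ln\!\left(1 - \tfrac{4}{3}\,p_S\right) \\
\omega &= \frac{d_N}{d_S}
\end{aligned}
```

Broadly, omega < 1 is read as purifying selection, omega close to 1 as neutral evolution, and omega > 1 as positive selection. Testing these hypotheses rigorously is the specialized literature the comment refers to.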

Thanks for your comment. The main question is whether the distribution of SNPs differs between the datasets. In each dataset, the SNPs are distributed among promoters, introns, exons, and so on, and I want to know whether these distributions differ significantly between the datasets. My null hypothesis is that the distribution is the same in all datasets. Thanks

Reply by sata72, 11 months ago
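
To address question (1) from the original post, for example whether the share of SNPs in promoters differs between datasets, one option is to test each region separately. For each region, build a 2 x 4 table of SNPs in that region versus SNPs in all other regions for each dataset, and then correct for testing five regions. This sketch uses the counts from the question's table and treats each SNP as an independent observation, which is an assumption. The Holm step-down correction is written out by hand in a hypothetical helper, `holm_adjust`, so the block depends only on NumPy and SciPy. statsmodels' `multipletests(..., method="holm")` or R's `p.adjust(p, method = "holm")` would do the same job.

```python
import numpy as np
from scipy.stats import chi2_contingency

regions = ["promoter", "introns", "exons", "utr5", "utr3"]
# Rows = regions, columns = dataset1..dataset4 (counts from the question's table)
counts = np.array([
    [10, 10, 10, 20],
    [20,  8, 20, 25],
    [10, 20, 20, 15],
    [ 5, 10,  5, 20],
    [ 8, 15,  8, 20],
])


def holm_adjust(pvals):
    """Hypothetical helper: Holm step-down adjustment of a list of p-values."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    adjusted = np.empty(m)
    running_max = 0.0
    # Walk p-values from smallest to largest, multiplying by (m - rank)
    # and enforcing monotonicity with a running maximum.
    for rank, idx in enumerate(np.argsort(p)):
        running_max = max(running_max, min(1.0, (m - rank) * p[idx]))
        adjusted[idx] = running_max
    return adjusted


totals = counts.sum(axis=0)  # total SNPs per dataset
raw_p = []
for i, region in enumerate(regions):
    # 2 x 4 table: SNPs in this region vs. SNPs in all other regions, per dataset
    table = np.vstack([counts[i], totals - counts[i]])
    _, p, _, _ = chi2_contingency(table)
    raw_p.append(p)

adj_p = holm_adjust(raw_p)
for region, p, q in zip(regions, raw_p, adj_p):
    print(f"{region:9s} raw p = {p:.4g}  Holm-adjusted p = {q:.4g}")
```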

OK, but suppose they are different.

WHY are they different? If they are different, will you proceed from that knowledge directly to some kind of claim, or would you use it to frame further experiments? What is your game plan?

There are many, many reasons why the distribution of variants could differ between genes and gene regions. For instance, some exons of some genes share homology with exons of other genes elsewhere in the genome, while other exons do not. That homology could drive differences in variant formation, but would such a difference reflect anything other than a propensity for gene conversion events? Unclear. That's just one tiny, off-the-cuff example, but there are MANY such examples.

Can you explain more about how this information would be used?

And is this from data YOU GENERATED, or are you working from public databases, or something else?

The more you tell us, the more we can guide you.

Vincent

Reply by LauferVA, 11 months ago

I think you will need at least three replicates to be able to do statistics.

Reply by Jeremy, 11 months ago

Thanks for your comment. I think it is possible to show that the distribution differs between datasets by comparing the datasets with each other.

Reply by sata72, 11 months ago
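
For question (2), which datasets differ from each other, one possible sketch is a set of pairwise comparisons. For each pair of datasets, run a chi-square test on the 5 x 2 table of their regional SNP counts, then apply a Holm adjustment across the six pairs. The sketch uses the counts from the question's table and treats SNPs as independent observations, which is an assumption. `holm_adjust` is a hypothetical helper written out here, not a library function.

```python
from itertools import combinations

import numpy as np
from scipy.stats import chi2_contingency

datasets = ["dataset1", "dataset2", "dataset3", "dataset4"]
# Rows = promoter, introns, exons, utr5, utr3; columns = datasets
counts = np.array([
    [10, 10, 10, 20],
    [20,  8, 20, 25],
    [10, 20, 20, 15],
    [ 5, 10,  5, 20],
    [ 8, 15,  8, 20],
])


def holm_adjust(pvals):
    """Hypothetical helper: Holm step-down adjustment of a list of p-values."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    adjusted = np.empty(m)
    running_max = 0.0
    for rank, idx in enumerate(np.argsort(p)):
        running_max = max(running_max, min(1.0, (m - rank) * p[idx]))
        adjusted[idx] = running_max
    return adjusted


pairs = list(combinations(range(len(datasets)), 2))  # 6 pairs for 4 datasets
raw_p = []
for a, b in pairs:
    # 5 x 2 table: regional SNP counts of the two datasets being compared
    _, p, _, _ = chi2_contingency(counts[:, [a, b]])
    raw_p.append(p)

adj_p = holm_adjust(raw_p)
for (a, b), p, q in zip(pairs, raw_p, adj_p):
    print(f"{datasets[a]} vs {datasets[b]}: raw p = {p:.4g}, Holm p = {q:.4g}")
```

A significant pair only says that the two regional profiles differ. It does not say which region drives the difference, so the per-region view is still needed for that.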

You should explain your experiment in more detail, such as the hypothesis you are testing and the samples you have to test this.

Reply by rpolicastro, 11 months ago