Adjusting Power Based On Length Data
10.3 years ago
moranr ▴ 290

I have taken 100 GC-rich genes and 100 AT-rich genes from a study. I randomly split each of these sets into four subsets of 25 genes, so for the AT-rich genes, for example, I have four subsets of 25 genes. I concatenate the 25 genes of each subset into one alignment and run the analysis on these alignments.

My problem lies in the lengths of the concatenated subsets. For example, GC-rich subset 1 may have an alignment length of 33,000 sites, whereas GC-rich subset 2 may have an alignment length of 39,000 sites. In order to compare the results across subsets, I need to account for this difference in alignment length. Is there a direction you can point me in to achieve this? For example, is there a way of adjusting the power of each analysis based on the minimum alignment length (rather than the actual alignment length)?

Thank you

statistics p-value • 2.2k views

I think we tackled a similar problem in this paper; we called the procedure normalization to a "relative size".


This wasn't exactly applicable to my problem, but it is along the same lines and it helped, thank you.


You first need to explain your objective or problem clearly. You should explain what your hypothesis is and what you are expecting to see. Why have you concatenated 25 genes into one? I don't think you have designed the analysis correctly.


I have explained my problem. My problem is that I need to compare 4 pieces of data (alignments), but each alignment has a different length (as they are made up of different genes). The length affects the power of the analysis, so the results for the alignments cannot be compared directly. The question above gives very little insight into the analysis because the analysis is not my specific problem. The question above is a specific problem with which I am asking for help. Concatenating 25 genes into one alignment is an essential part of the experimental design, and there is no question that the concatenation needs to be done. As for why 25: it is to allow a direct comparison with another dataset of 25 concatenated genes.

10.3 years ago

How about using bootstrapping of sets of 25 for ALL genes to generate some length-based statistics that you could use for "correction"? Then, you could do the same with your AT- and GC-rich sets and use your bootstrapped data to "correct" for length.
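One way this could look in practice is sketched below in Python. It is only a rough illustration under several assumptions: `gene_lengths` is a hypothetical mapping from gene name to alignment length, `analysis` is a hypothetical stand-in for whatever you run on a concatenated set, and the "correction" is assumed to be a simple straight-line fit of the statistic against concatenated length. Whether a linear relationship is appropriate depends on your analysis, so treat this as a starting point rather than a recipe.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def length_trend(gene_lengths, analysis, set_size=25, n_sets=1000, seed=0):
    """Resample random gene sets from ALL genes and fit statistic vs. length.

    gene_lengths: dict of gene name -> alignment length (hypothetical input).
    analysis: callable taking a list of gene names and returning one number
              (hypothetical placeholder for the real analysis).
    """
    rng = random.Random(seed)
    genes = sorted(gene_lengths)
    lengths, stats = [], []
    for _ in range(n_sets):
        # Genes are drawn without replacement within a set, so no gene is
        # concatenated twice; different sets may share genes.
        chosen = rng.sample(genes, set_size)
        lengths.append(sum(gene_lengths[g] for g in chosen))
        stats.append(analysis(chosen))
    return fit_line(lengths, stats)


def length_corrected(observed_stat, observed_length, slope, intercept):
    """Residual of an observed statistic after removing the length trend."""
    return observed_stat - (slope * observed_length + intercept)
```

You would call `length_trend` once on the full gene list, then pass the statistic and concatenated length of each AT-rich or GC-rich subset to `length_corrected` and compare the residuals instead of the raw values.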
