Question

Adjusting Power Based On Length Data

10.3 years ago

moranr ▴ 290

I have taken 100 GC-rich genes and 100 AT-rich genes from a study. I split each of these sets at random into subgroups of 25 genes; e.g. the AT-rich genes form four subsets of 25 genes. I concatenate the 25 genes of each subset into one alignment and run the analysis on these alignments.
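A minimal sketch of the random split described above, assuming each gene's aligned length is already known. The `gene_lengths` mapping and both function names are hypothetical, for illustration only. It also shows where the length differences come from: a concatenated subset's alignment length is just the sum of its genes' aligned lengths, so different random draws give different totals.

```python
import random

def partition_genes(gene_ids, group_size=25, seed=None):
    """Randomly split gene_ids into disjoint subsets of group_size genes each."""
    if len(gene_ids) % group_size != 0:
        raise ValueError("number of genes must be a multiple of group_size")
    rng = random.Random(seed)
    shuffled = list(gene_ids)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    return [shuffled[i:i + group_size] for i in range(0, len(shuffled), group_size)]

def concatenated_length(subset, gene_lengths):
    """Alignment length of a concatenated subset: the sum of its genes' aligned lengths."""
    return sum(gene_lengths[g] for g in subset)
```

For 100 AT-rich gene IDs, `partition_genes(at_ids, 25, seed=1)` returns four lists of 25 IDs, and `concatenated_length` applied to each list gives the per-subset alignment lengths that end up differing.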

My problem lies in the lengths of the concatenated subsets. For example, GC-rich subset 1 may have an alignment length of 33,000 sites, whereas GC-rich subset 2 may have an alignment length of 39,000 sites. To compare the results across subsets, I need to account for this difference in alignment length. Can you point me in a direction to achieve this? For example, is there a way of adjusting the power of each analysis based on the minimum alignment length (rather than the actual alignment length)?
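One common way to act on the minimum-length idea is to down-sample sites: randomly keep only as many alignment columns from each subset as the shortest subset has, then rerun the analysis on the reduced alignments. This is a sketch of that technique, not something proposed in the thread; it assumes each alignment is held as a dict of taxon name to aligned sequence, and the function names are hypothetical. Discarding sites throws information away, so repeating the down-sampling with several seeds gives a sense of how much the results vary.

```python
import random

def subsample_sites(alignment, target_length, rng):
    """Return a copy of `alignment` keeping `target_length` randomly chosen columns.

    alignment: dict mapping taxon name -> aligned sequence (all the same length).
    Columns are drawn without replacement and kept in their original order,
    and the same columns are kept for every taxon.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    if target_length > n_sites:
        raise ValueError("target_length exceeds alignment length")
    keep = sorted(rng.sample(range(n_sites), target_length))
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in alignment.items()}

def equalize_lengths(alignments, seed=None):
    """Down-sample every alignment to the length of the shortest one."""
    rng = random.Random(seed)
    min_len = min(len(next(iter(aln.values()))) for aln in alignments)
    return [subsample_sites(aln, min_len, rng) for aln in alignments]
```

With subsets of 33,000 and 39,000 sites, both would be reduced to 33,000 columns; the shortest alignment keeps all of its columns unchanged.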

Thank you

statistics p-value • 2.2k views

updated 13 months ago by Ram 43k • written 10.3 years ago by moranr ▴ 290

I think we tackled a similar problem in this paper; we called that procedure normalization to a "relative size".

updated 3.1 years ago by Ram 43k • written 10.3 years ago by Pavel Senin ★ 1.9k

This wasn't exactly applicable to my problem, but it is along the same lines and it helped. Thank you.

updated 3.1 years ago by Ram 43k • written 10.3 years ago by moranr ▴ 290

You first need to explain your objective clearly: what is your hypothesis, and what do you expect to see? Why have you concatenated 25 genes into one alignment? I don't think you have designed the analysis correctly.

updated 3.1 years ago by Ram 43k • written 10.3 years ago by Ashutosh Pandey 12k

I have explained my problem. I need to compare four pieces of data (alignments), but each alignment has a different length because each is made up of different genes. Length affects the power of the analysis, so the alignments' results cannot be compared directly: they differ in power because of alignment length. The question above gives very little insight into the analysis because the analysis is not my specific problem; the question above is the specific problem I am asking for help with. Concatenating 25 genes into one alignment is an essential part of the experimental design, and there is no question that concatenation needs to be done. As for why 25: it allows a direct comparison with another dataset of 25 concatenated genes.

updated 3.1 years ago by Ram 43k • written 10.3 years ago by moranr ▴ 290

Ram · Answer 1 · 2014-01-06

10.3 years ago

Sean Davis 26k

How about bootstrapping sets of 25 genes drawn from ALL genes to generate some length-based statistics that you could use for "correction"? You could then do the same with your AT- and GC-rich sets and use the bootstrapped data to "correct" for length.
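A rough sketch of that bootstrap, under assumptions the answer does not spell out: random 25-gene sets are drawn from the full gene list, the statistic depends on concatenated length roughly linearly, and "correction" means shifting each observed statistic to a common reference length along a fitted least-squares line. `statistic` stands in for rerunning your actual analysis on a concatenated set; it and the other function names are hypothetical.

```python
import random

def bootstrap_length_relation(all_genes, gene_lengths, statistic,
                              n_boot=1000, set_size=25, seed=None):
    """Draw n_boot random gene sets from all_genes (a list) and record, for each,
    the concatenated alignment length and the analysis statistic."""
    rng = random.Random(seed)
    lengths, stats = [], []
    for _ in range(n_boot):
        subset = rng.sample(all_genes, set_size)  # no repeated genes within one set
        lengths.append(sum(gene_lengths[g] for g in subset))
        stats.append(statistic(subset))
    return lengths, stats

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all bootstrap lengths are identical; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def length_corrected(observed_stat, length, slope, reference_length):
    """Move an observed statistic along the fitted trend to reference_length."""
    return observed_stat - slope * (length - reference_length)

if __name__ == "__main__":
    # Toy data only: 200 genes with made-up lengths.
    genes = [f"g{i}" for i in range(200)]
    lengths_by_gene = {g: 900 + 5 * i for i, g in enumerate(genes)}

    def toy_statistic(subset):
        # Hypothetical placeholder for rerunning the real analysis on a concatenated set.
        return 0.002 * sum(lengths_by_gene[g] for g in subset)

    boot_lengths, boot_stats = bootstrap_length_relation(
        genes, lengths_by_gene, toy_statistic, seed=1)
    intercept, slope = fit_line(boot_lengths, boot_stats)
    example_subset = genes[:25]
    example_len = sum(lengths_by_gene[g] for g in example_subset)
    # Reference length, e.g. the shortest AT/GC subset (33,000 sites in the question).
    print(slope, length_corrected(toy_statistic(example_subset), example_len, slope, 33_000))
```

An alternative, equally hedged, is to skip the line fit and compare each AT- or GC-rich result against only those bootstrap draws whose concatenated lengths are close to that subset's length.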

updated 3.1 years ago by Ram 43k • written 10.3 years ago by Sean Davis 26k