edgeR contrasts for analysis of reciprocal hybrid crosses - averaging, pooling, and comparing groups
15 days ago
Rhiannon • 0

I am analysing a dataset that includes interspecific hybrids and their parent species. As an example, let's say we have two parent species, A and B, and their F1 hybrid is C. However, there are two reciprocal cross directions: one where A is the mother and B is the father, and vice versa. I'll call these two reciprocal crosses C1 and C2.

The main question I am interested in is how gene expression in the hybrids differs from the two parents, to identify transgressively expressed genes. We can ask that question separately for C1 and C2, so there are four DE contrasts to test: C1-A, C1-B, C2-A and C2-B. We are also interested in how C1 and C2 differ from each other, which is clearly C1-C2.
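To make the five contrasts concrete, here is a minimal sketch in plain Python (not edgeR code) that writes each one as a vector over the four group coefficients. It assumes a design with one coefficient per group and no intercept, in the order A, B, C1, C2; the `contrast` helper and the coefficient order are my own illustration, not something taken from my actual analysis script.

```python
# Assumed coefficient order for a no-intercept design with one column per group.
GROUPS = ["A", "B", "C1", "C2"]

def contrast(plus, minus):
    """Return a contrast vector: +1 for the `plus` group, -1 for the `minus` group."""
    return [int(g == plus) - int(g == minus) for g in GROUPS]

# The four hybrid-vs-parent contrasts and the reciprocal-cross contrast.
contrasts = {
    "C1-A": contrast("C1", "A"),
    "C1-B": contrast("C1", "B"),
    "C2-A": contrast("C2", "A"),
    "C2-B": contrast("C2", "B"),
    "C1-C2": contrast("C1", "C2"),
}

for name, vec in contrasts.items():
    print(name, vec)
```

Each vector sums to zero, so every contrast compares group means rather than testing a mean against zero.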

The study design (specifically the sample sizes) assumed that C1 and C2 would be essentially the same and could be pooled into one group C. That is, we have 6 replicates each for A and B, and 3 each for C1 and C2. Indeed, the C1-C2 contrast yields very few, if any, significant DE genes after FDR correction, although many are significant before correction. However, I am cautious about combining them into one group C. A substantial number of genes do not show the same results for C1-A and C2-A, and likewise for C1-B and C2-B. What I would like is a principled way to decide for which genes we can pool the reciprocal crosses, and for which genes they should be considered separately. The benefit of pooling them for some or most genes is greater power to detect smaller effect sizes with more samples.

So far I have tested for DE genes with the contrasts (C1+C2)/2-A and (C1+C2)/2-B. Until today I had thought that if the sample sizes of C1 and C2 were the same, this would be equivalent to pooling them into one group C, but from reading other forum posts I now realise that the dispersions are estimated differently for C than for (C1+C2)/2.
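As a sanity check on what the averaged contrast estimates, here is a small Python sketch (again not edgeR code) showing that the vector for (C1+C2)/2-A is the elementwise mean of the C1-A and C2-A vectors, so its estimate is the mean of the two individual estimates. The coefficient order A, B, C1, C2 and the log2 group means in `coefs` are made up for illustration. This covers only the coefficient side; it says nothing about how the dispersions are estimated.

```python
# Assumed coefficient order: A, B, C1, C2 (no-intercept design).
c1_vs_a = [-1, 0, 1, 0]
c2_vs_a = [-1, 0, 0, 1]

# (C1+C2)/2 - A written directly as a contrast vector.
avg_vs_a = [-1, 0, 0.5, 0.5]

# The same vector obtained as the mean of the two individual contrasts.
mean_of_individual = [(x + y) / 2 for x, y in zip(c1_vs_a, c2_vs_a)]

def apply(contrast, coefs):
    """Dot product of a contrast vector with the group coefficients."""
    return sum(c * b for c, b in zip(contrast, coefs))

# Made-up log2 group means for one gene, for illustration only.
coefs = [5.0, 6.0, 7.0, 6.0]

est_c1 = apply(c1_vs_a, coefs)    # C1 - A
est_c2 = apply(c2_vs_a, coefs)    # C2 - A
est_avg = apply(avg_vs_a, coefs)  # (C1+C2)/2 - A

print(avg_vs_a == mean_of_individual)
print(est_c1, est_c2, est_avg)
```

With these toy numbers the averaged estimate sits halfway between the C1-A and C2-A estimates, which is one way a gene can be significant in the averaged test while only one of the individual tests reaches significance.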

There are quite a few genes that are significantly DE in these averaged group tests and that were also significant in one of the individual tests but not the other; e.g. a gene is significant in C1-A, not significant in C2-A, and significant in (C1+C2)/2-A. What I would like is a principled way to tell whether this kind of result arises because the expression level of the gene really does differ between C1 and C2, so they shouldn't be pooled, or because of low power in the C2-A test. I had considered excluding genes from the joint test based either on the (uncorrected) p-value in the C1-C2 contrast or on something like the top 10% of absolute log2 fold-change differences in the C1-C2 contrast. I now wonder whether pooling them into one group C, so that the group dispersion is estimated on the pool, would essentially take care of this problem. If it does, I would like to be able to identify which genes are affected.
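To show what I mean by the two exclusion rules, here is a rough Python sketch of the screening step. Everything in it is hypothetical: the gene names and the C1-C2 results are invented, the 0.05 p-value cutoff is an assumption (I have not settled on a threshold), and the helper functions are my own. Only the "top 10% of absolute log2 fold change" rule comes from the description above.

```python
import math

# Made-up C1-C2 results for illustration: gene -> (raw p-value, log2 fold change).
c1_vs_c2 = {
    "g01": (0.80, 0.10), "g02": (0.03, -1.20), "g03": (0.45, 0.30),
    "g04": (0.01, 2.50), "g05": (0.60, -0.20), "g06": (0.20, 0.90),
    "g07": (0.95, 0.05), "g08": (0.04, 0.70), "g09": (0.33, -0.40),
    "g10": (0.70, 0.15),
}

P_CUTOFF = 0.05      # assumed cutoff on the uncorrected p-value
TOP_FRACTION = 0.10  # top 10% of |log2 fold change|

def flag_by_pvalue(results, cutoff):
    """Genes whose uncorrected C1-C2 p-value is below the cutoff."""
    return {g for g, (p, _) in results.items() if p < cutoff}

def flag_by_logfc(results, fraction):
    """Genes in the top `fraction` of absolute C1-C2 log2 fold change."""
    n_top = max(1, math.ceil(fraction * len(results)))
    ranked = sorted(results, key=lambda g: abs(results[g][1]), reverse=True)
    return set(ranked[:n_top])

# Genes to analyse separately for C1 and C2 under either rule.
separate = flag_by_pvalue(c1_vs_c2, P_CUTOFF) | flag_by_logfc(c1_vs_c2, TOP_FRACTION)
# Remaining genes, candidates for the pooled or averaged test.
poolable = sorted(set(c1_vs_c2) - separate)

print(sorted(separate))
print(poolable)
```

The two rules are combined with a union here, but they could equally be applied one at a time; which is more defensible is part of what I am asking.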

Please also let me know if this approach is fundamentally misguided and I should just consider C1 and C2 separately for everything.

edgeR • 98 views