I am conducting an allele-specific expression analysis with five biological replicates: I have RNA-seq data for five F1 hybrid mice. I have calculated allelic ratios (# of reads from parent 1 / # of reads from parent 2) at every marker for each of the five replicates, and I am not sure how to proceed. I looked at several papers, and all of them merged the reads from all biological replicates for each marker. They justified the merging by showing that the allelic ratios were highly correlated across replicates. My replicates also show decent correlations for allelic ratios, but not as good as those in other studies. The only explanation I can think of for the larger biological variation is that I am analyzing hippocampus, which contains a heterogeneous population of cells. I can see two ways to move forward:
1) For every marker, I can compare the allelic ratios across all replicates and remove the outliers (replicates with discrepant allelic ratios) before merging the reads. This drops the discrepant replicates for that marker, and merging the rest increases the number of reads at that marker, which would let me test for allele-specific expression with more confidence and calculate a single p-value.
2) The other option would be to perform a separate statistical test and calculate a p-value for each replicate. So for every marker I will end up with 5 p-values (assuming each replicate has enough reads at that marker for a test to be carried out). I will then have to either combine these p-values or come up with an arbitrary rule, for example: if at least 3 replicates show a significant p-value, the marker shows allele-specific expression.
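To make the two options concrete, here is a rough Python sketch of what I have in mind for a single marker. Everything in it is an assumption for illustration, not an established pipeline: the function names, the exact two-sided binomial test against a 50:50 null, the 0.15 deviation-from-median cutoff for calling a replicate an outlier, the 10-read minimum, the use of Fisher's method to combine p-values, and the made-up read counts. It also works with the parent 1 fraction a/(a+b) instead of the ratio a/b, so that a replicate with zero parent 2 reads does not cause a division by zero.

```python
from math import comb, exp, log
from statistics import median


def binom_two_sided(k, n):
    """Exact two-sided binomial p-value for k successes in n trials, null p = 0.5.

    The null distribution is symmetric, so the p-value is twice the smaller tail.
    Integer arithmetic keeps this exact even for large read counts.
    """
    m = min(k, n - k)
    tail = sum(comb(n, i) for i in range(m + 1))
    return min(1.0, 2 * tail / 2 ** n)


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2k df under the null.

    For even degrees of freedom the chi-square survival function has a closed form.
    """
    k = len(pvalues)
    half = -sum(log(max(p, 1e-300)) for p in pvalues)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return min(1.0, exp(-half) * total)


def pooled_test(counts, max_dev=0.15):
    """Option 1: drop outlier replicates, pool the remaining reads, test once.

    counts: list of (parent1_reads, parent2_reads), one tuple per replicate.
    max_dev: hypothetical cutoff on |fraction - median fraction|.
    Returns (p_value, number_of_replicates_kept).
    """
    counts = [(a, b) for a, b in counts if a + b > 0]
    fracs = [a / (a + b) for a, b in counts]
    med = median(fracs)
    kept = [c for c, f in zip(counts, fracs) if abs(f - med) <= max_dev]
    a = sum(c[0] for c in kept)
    b = sum(c[1] for c in kept)
    return binom_two_sided(a, a + b), len(kept)


def per_replicate_test(counts, min_reads=10):
    """Option 2: test each replicate separately, then combine the p-values.

    min_reads: hypothetical minimum coverage for a replicate to be tested.
    Returns (combined_p_value, list_of_per_replicate_p_values).
    """
    pvals = [binom_two_sided(a, a + b) for a, b in counts if a + b >= min_reads]
    return fisher_combine(pvals), pvals


if __name__ == "__main__":
    # Made-up counts for one marker; the fourth replicate is deliberately discrepant.
    counts = [(30, 10), (28, 12), (33, 9), (12, 28), (31, 11)]
    print("pooled:", pooled_test(counts))
    print("combined:", per_replicate_test(counts))
```

One thing I noticed while writing this: pooling treats every read as an independent draw, so it does not account for variation between animals, and a two-sided test in option 2 would combine replicates even when they are biased toward opposite parents.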
Which approach sounds better to you guys? Any other suggestions are welcome too. Thanks.