I am looking at publicly available RNA-seq data and trying to see how, in two species (mouse and human), the epigenomic data varies with gene expression for highly expressed genes and for genes with low expression. I know of papers that set a cutoff for gene expression (http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000598), requiring a gene to be above 0.1 RPKM to be considered expressed. Here is how I am taking the top 20% and the bottom 20% of expressed orthologs. I apply a cutoff for a gene to count as expressed (FPKM/RPKM > 0.1) and remove any gene pair in which either species falls below that cutoff (even if the gene from one species is below the cutoff and the other is not, I discard the pair). Then I sort the remaining gene pairs by expression value in descending order and take the top 20% as highly expressed ortholog pairs and the bottom 20% as lowly expressed pairs.
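To make the procedure concrete, here is a minimal Python sketch of the filtering and splitting steps. The `OrthologPair` record, the `split_by_expression` function, and the example values are all hypothetical and for illustration only. The description above does not say which species' value drives the sort, so ranking by the mean of the two values is an assumption; a different `key` can be passed in.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class OrthologPair(NamedTuple):
    gene_id: str        # hypothetical identifier for the ortholog pair
    human_fpkm: float   # expression of the human gene
    mouse_fpkm: float   # expression of the mouse gene


def split_by_expression(
    pairs: List[OrthologPair],
    cutoff: float = 0.1,
    fraction: float = 0.2,
    key: Optional[Callable[[OrthologPair], float]] = None,
) -> Tuple[List[OrthologPair], List[OrthologPair]]:
    """Return (high, low) ortholog pairs after the expression cutoff."""
    if key is None:
        # Assumption: rank pairs by the mean expression across both species.
        key = lambda p: (p.human_fpkm + p.mouse_fpkm) / 2

    # Keep a pair only if BOTH genes are above the cutoff.
    expressed = [
        p for p in pairs
        if p.human_fpkm > cutoff and p.mouse_fpkm > cutoff
    ]

    # Sort in descending order of the ranking value.
    ranked = sorted(expressed, key=key, reverse=True)

    # Number of pairs in each tail (rounded down).
    n = int(len(ranked) * fraction)
    high = ranked[:n]
    low = ranked[len(ranked) - n:]
    return high, low


# Made-up example values, not real data.
pairs = [
    OrthologPair("g1", 50.0, 40.0),
    OrthologPair("g2", 0.05, 12.0),   # dropped: human below cutoff
    OrthologPair("g3", 5.0, 6.0),
    OrthologPair("g4", 0.5, 0.3),
    OrthologPair("g5", 20.0, 0.02),   # dropped: mouse below cutoff
    OrthologPair("g6", 1.0, 2.0),
    OrthologPair("g7", 100.0, 80.0),
]
high, low = split_by_expression(pairs)
print([p.gene_id for p in high], [p.gene_id for p in low])
```

Passing, for example, `key=lambda p: p.human_fpkm` would rank by the human value alone instead of the mean.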
I know it's a very naive approach and has its flaws, so I would appreciate it if someone could suggest a better, more robust statistical approach.
PS: cross-posted from "Comparing features of high expression genes vs low expression genes in two species", since that went unanswered.