I have RNA-seq data from two species (human and mouse), and I am trying to compare some epigenomic/genomic features and how they vary between highly expressed and lowly expressed orthologs. I was wondering what a statistically robust way to do this would be.

Most papers I have seen tend to apply an arbitrary expression cutoff for a gene to be considered expressed (http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000598), and I have seen people do similar things when dividing genes into high- and low-expression sets. For orthologous genes, things are not as straightforward: if you apply the same cutoff in both species, one gene of a pair can fall in the high set while its ortholog falls in the low set.

My naive workaround was to set a cutoff for a gene to be expressed (FPKM/RPKM > 0.1) and remove every ortholog pair in which either gene falls below that cutoff. I then sort the remaining pairs by expression value in descending order and take the top 20% as highly expressed ortholog pairs and the bottom 20% as lowly expressed pairs. I know this is a very naive approach with its own flaws, so I was wondering if someone could suggest more robust statistical approaches.
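A minimal Python sketch of the workaround described above, for concreteness. It assumes one FPKM value per gene per species. The post does not say how a pair is sorted when the two species disagree, so the sketch assumes each pair is scored by the mean of its within-species percentile ranks (raw FPKM values are not directly comparable across species). `OrthologPair`, `percentile_ranks`, and `split_high_low` are hypothetical names, and the toy values are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OrthologPair:
    """One human-mouse ortholog pair with its expression in each species (hypothetical type)."""
    human_gene: str
    mouse_gene: str
    human_fpkm: float
    mouse_fpkm: float


def percentile_ranks(values: list[float]) -> list[float]:
    """Rank values within one species, scaled to [0, 1]; tied values share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the block of sorted positions tied with position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2
        i = j + 1
    denom = max(n - 1, 1)
    return [r / denom for r in ranks]


def split_high_low(pairs: list[OrthologPair], min_fpkm: float = 0.1,
                   fraction: float = 0.2) -> tuple[list[OrthologPair], list[OrthologPair]]:
    """Return (top `fraction`, bottom `fraction`) of pairs expressed in both species."""
    # Step 1: drop a pair if either ortholog is at or below the expression cutoff.
    kept = [p for p in pairs if p.human_fpkm > min_fpkm and p.mouse_fpkm > min_fpkm]
    if not kept:
        return [], []
    # Step 2 (assumption): score each pair by the mean of its within-species
    # percentile ranks instead of comparing raw FPKM across species.
    human = percentile_ranks([p.human_fpkm for p in kept])
    mouse = percentile_ranks([p.mouse_fpkm for p in kept])
    scores = [(h + m) / 2 for h, m in zip(human, mouse)]
    # Step 3: sort pairs by score, highest first (ties keep input order).
    ranked = [p for _, p in sorted(zip(scores, kept), key=lambda t: t[0], reverse=True)]
    k = int(len(ranked) * fraction)
    if k == 0:
        return [], []
    return ranked[:k], ranked[-k:]


# Made-up toy values, for illustration only.
pairs = [
    OrthologPair("GENE_A", "Gene_a", 120.0, 95.0),
    OrthologPair("GENE_B", "Gene_b", 0.05, 30.0),  # dropped: below cutoff in human
    OrthologPair("GENE_C", "Gene_c", 2.0, 1.5),
    OrthologPair("GENE_D", "Gene_d", 15.0, 40.0),
    OrthologPair("GENE_E", "Gene_e", 0.5, 0.3),
    OrthologPair("GENE_F", "Gene_f", 60.0, 8.0),
]
high, low = split_high_low(pairs)
print([p.human_gene for p in high])  # ['GENE_A']
print([p.human_gene for p in low])   # ['GENE_E']
```

Averaging the two ranks is only one way to combine the species; taking the minimum of the two ranks instead would put a pair in the "high" set only if it ranks highly in both human and mouse.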