Seven million reads were downsampled to 2.3 million. The number of transcripts with RPKM >= 1 increased from 12979 to 13247, while the number with RPKM >= 0.1 decreased from 20311 to 18847. This behavior was consistent across four experiments. I'm sorry I can't provide more detail, as the experiment was presented by a colleague and I don't have the raw data at the moment. Why is this happening?
From reading the Biostars "bioinformatics for dummies" post, the formula is

RPKM_i = 10^9 * N_i / (N_total * L_i)

where N_i is the number of reads mapped to transcript i, N_total is the total number of mapped reads across all transcripts, and L_i is the length of transcript i in bases. The factor 10^9 comes from expressing N_total in millions of reads (10^6) and L_i in kilobases (10^3).
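To check my understanding of the formula, here is a minimal computation (the read counts and transcript length are made-up illustration values, not from the actual experiment):

```python
def rpkm(n_i, total_reads, length_bp):
    """RPKM = 10^9 * N_i / (N_total * L_i), with L_i in bases."""
    return 1e9 * n_i / (total_reads * length_bp)

# Hypothetical example: 350 reads on a 2 kb transcript,
# out of 7 million total mapped reads.
print(rpkm(350, 7_000_000, 2000))  # -> 25.0
```

So a transcript needs roughly total_reads * length_bp / 10^9 mapped reads to clear an RPKM threshold of 1, which is why I suspect length matters here.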
Since RPKM scales with 1/L_i (the inverse of transcript length), does this mean that downsampling preferentially removed shorter transcripts? Or were there simply many more short transcripts, so that they were more likely than longer transcripts to lose all their reads upon downsampling?
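To make my question concrete, here is a toy calculation of my own (not the colleague's data): if downsampling from 7M to 2.3M reads is modeled as keeping each read independently with probability p, then a transcript with n mapped reads vanishes entirely with probability (1 - p)^n, which depends only on its read count, not directly on its length.

```python
# Binomial-thinning model of downsampling 7M -> 2.3M reads (my assumption).
p_keep = 2_300_000 / 7_000_000  # each read survives with p ~= 0.329

def prob_all_lost(n_reads, p_keep):
    """Chance that a transcript with n_reads mapped reads keeps zero reads."""
    return (1 - p_keep) ** n_reads

for n in (1, 2, 5, 10):
    print(f"{n} reads -> lost entirely with probability {prob_all_lost(n, p_keep):.3f}")
```

Under this model a 1- or 2-read transcript disappears almost half the time, so the RPKM >= 0.1 tally would naturally shrink; is that the right intuition?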