Question: RNA-seq RPKM >=1 increases after downsampling but RPKM >= 0.1 decreases after downsampling
2.4 years ago by
yarrowmadrona wrote:

Seven million reads were downsampled to 2.3 million. The number of transcripts with RPKM >= 1 increased from 12979 to 13247. Meanwhile, the number with RPKM >= 0.1 decreased from 20311 to 18847. This behavior was consistent across four experiments. I'm sorry I cannot provide more information; the experiment was presented by a colleague and I don't have the raw data at the moment. Why is this happening?

From reading the Biostars "bioinformatics for dummies" material, RPKM_i = 10^9 * N_i / (N * L_i), where N_i is the number of reads mapped to transcript i, N is the total number of mapped reads, and L_i is the length of transcript i in bases. The factor 10^9 comes from expressing the result per million reads (10^6) and per kilobase of transcript (10^3).
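As a minimal sketch of the RPKM definition (reads on a transcript, divided by total mapped reads and by transcript length in bases, scaled by 10^9), something like the following could be used. The function name, argument names, and example numbers are hypothetical and chosen only for illustration.

```python
def rpkm(reads_on_transcript, total_mapped_reads, transcript_length_bp):
    """Reads Per Kilobase of transcript per Million mapped reads.

    The 1e9 factor combines 'per million reads' (1e6) with
    'per kilobase' (1e3); the length is given in bases.
    """
    return 1e9 * reads_on_transcript / (total_mapped_reads * transcript_length_bp)


# Hypothetical example: 100 reads on a 2,000 bp transcript,
# in a library of 7 million mapped reads.
print(rpkm(100, 7_000_000, 2_000))
```

Note that the total number of mapped reads sits in the denominator, so the same raw count gives a higher RPKM in a smaller (downsampled) library.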

Since RPKM is normalized by transcript length, does this mean that downsampling preferentially removed shorter transcripts? Or that there were many more short transcripts, so they were more likely than longer transcripts to be lost upon downsampling?


Thanks. Do you have any resources/guides on how to run a chi-square test in this context?

written 2.4 years ago by yarrowmadrona

Any chi-square tool will do; there are many available online. What you are testing is whether the pairs of counts

                    RPKM >= 1   RPKM >= 0.1
7M reads              12979       20311
2.3M (downsampled)    13247       18847

could be consistent with random sampling.

written 2.4 years ago by Istvan Albert ♦♦ 82k
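For readers who prefer a script to an online tool, here is a minimal sketch of a Pearson chi-square test on a 2x2 table, using only the Python standard library. The function name `chi_square_2x2` is hypothetical, and no continuity correction is applied. It also assumes the four counts can be treated as an ordinary contingency table, which is a simplification here because the RPKM >= 1 set is contained in the RPKM >= 0.1 set.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) on a 2x2 table.

    Returns (chi2_statistic, p_value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    total = a + b + c + d

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected

    # For 1 degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts from the thread: rows are full and downsampled libraries,
# columns are RPKM >= 1 and RPKM >= 0.1.
counts = [[12979, 20311], [13247, 18847]]
chi2, p = chi_square_2x2(counts)
print(f"chi2 = {chi2:.3f}, p = {p:.3g}")
```

If SciPy is installed, `scipy.stats.chi2_contingency` performs the same kind of test and handles larger tables as well.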

Thank you, very helpful (sorry for the delay in responding). I just wanted to check. You are right, it looks like random variation.

written 2.3 years ago by yarrowmadrona
2.4 years ago by
Istvan Albert ♦♦ 82k (University Park, USA) wrote:

Random sampling means that your numbers will vary; they have to, because by chance the sampled reads will hit transcripts unevenly.

Even without running a chi-square test, I'd say that your numbers fall well within the expected range of variation.
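To get a feel for how threshold counts move under random downsampling, one could run a small simulation along these lines. This is only a sketch on made-up toy data: the helper names, the number of transcripts, and the count and length ranges are all hypothetical, and each read is kept independently with a fixed probability (binomial thinning), which is one possible model of downsampling.

```python
import random


def downsample_counts(counts, fraction, rng):
    """Keep each read independently with probability `fraction`."""
    return [sum(1 for _ in range(n) if rng.random() < fraction) for n in counts]


def rpkm(count, total_reads, length_bp):
    return 1e9 * count / (total_reads * length_bp)


def count_at_least(counts, lengths, threshold):
    """Number of transcripts whose RPKM is >= threshold."""
    total = sum(counts)
    if total == 0:
        return 0
    return sum(
        1 for n, length in zip(counts, lengths)
        if rpkm(n, total, length) >= threshold
    )


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical toy data: 2000 transcripts with arbitrary lengths and counts.
lengths = [rng.randint(500, 5000) for _ in range(2000)]
counts = [rng.randint(0, 50) for _ in range(2000)]

# Same ratio as in the question: 7 million reads down to 2.3 million.
sub = downsample_counts(counts, 2.3 / 7.0, rng)

for threshold in (0.1, 1.0):
    before = count_at_least(counts, lengths, threshold)
    after = count_at_least(sub, lengths, threshold)
    print(f"RPKM >= {threshold}: {before} before, {after} after downsampling")
```

Repeating the run with different seeds shows how much the counts vary from one random draw to the next. With real data, the per-transcript counts and lengths would replace the toy lists.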
