Is RNA-seq data normal if the third-quartile TPM is near 10 but the maximum TPM is near 20,000?
3 months ago
alwayshope ▴ 30

Dear all,

Could you tell me whether the gene expression TPM distribution below is normal?

The most highly expressed genes all seem to be ribonucleoprotein- or ribonuclease-related RNA components (listed below). If I filter them out, the highest TPM would be near 9,000. Does this mean the RNA-seq library preparation could do a better job of removing these ribonucleoprotein RNAs?

RN7SK Gene - RNA Component Of 7SK Nuclear Ribonucleoprotein 
RN7SL1 Gene - RNA Component Of Signal Recognition Particle 7SL1 
RPPH1 Gene - Ribonuclease P RNA Component H1 9000

Many thanks!

Min.        0.000      0.000      0.000      0.000      0.000
1st Qu.     0.000      0.000      0.000      0.000      0.000
Median      0.313      0.328      0.188      0.462      0.356
Mean       16.835     17.582     17.004     17.510     18.676
3rd Qu.     7.427      7.697      5.802      8.930      8.096
Max.    21751.431  19814.285  22402.880  23683.130  23029.901

Expression RNAseq • 514 views
3 months ago

The values you are getting, with a third quartile of approximately 10 and a maximum of approximately 20,000, are quite standard for RNA-seq. TPM distributions are typically heavily right-skewed: a small number of genes have very high expression (e.g. ribosomal proteins), while the majority of genes have much lower expression. You can see the skew directly in your summary: the mean (about 17) is far above the median (about 0.3), and the maximum is more than three orders of magnitude above the third quartile.

You can optimise the library preparation to reduce the representation of abundant transcripts such as rRNAs, e.g. with poly(A) selection or rRNA depletion. RN7SK, RN7SL1 and RPPH1 are small non-polyadenylated RNAs, so they tend to be prominent in total-RNA or rRNA-depleted libraries and are largely removed by poly(A) selection. If these ribonucleoprotein-related genes account for your highest TPM values, that may suggest the selection or depletion step could be improved further. That said, the high expression of these genes usually reflects their role in core cellular functions and is not just an artefact of library preparation. So while these genes may dominate the expression profile, their prominence in the data often reflects their biological importance.
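To put a number on how much these few RNAs dominate, one option is to compute the share of total TPM they carry and then rescale the remaining genes so they sum to one million again. The snippet below is only a minimal sketch of that idea: the function names `top_share` and `drop_and_rescale` are made up for illustration, and the toy values are hypothetical, not taken from the data in the question.

```python
def top_share(tpm, n=3):
    """Fraction of the total TPM carried by the n most abundant genes."""
    total = sum(tpm.values())
    top = sorted(tpm.values(), reverse=True)[:n]
    return sum(top) / total


def drop_and_rescale(tpm, exclude):
    """Remove the genes in `exclude` and rescale the rest to sum to 1e6."""
    kept = {gene: value for gene, value in tpm.items() if gene not in exclude}
    scale = 1e6 / sum(kept.values())
    return {gene: value * scale for gene, value in kept.items()}


# Hypothetical toy profile: three abundant small RNAs plus 1000 ordinary genes.
tpm = {"RN7SK": 20000.0, "RN7SL1": 15000.0, "RPPH1": 9000.0}
tpm.update({f"GENE_{i}": 956.0 for i in range(1000)})

share = top_share(tpm, n=3)
rescaled = drop_and_rescale(tpm, exclude={"RN7SK", "RN7SL1", "RPPH1"})
print(f"Top 3 genes carry {share:.1%} of total TPM")
```

Because TPM is a relative measure, removing abundant genes raises the TPM of everything else after rescaling, so filtered and unfiltered values should not be mixed in the same comparison.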

All in all, it seems that the TPM values in your data are not unusual and reflect a typical RNA-seq expression profile with a small number of highly expressed genes and a majority of genes with much lower expression.
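If you want to convince yourself that this shape is expected, here is a small simulation sketch. It assumes, purely for illustration, that expression follows a log-normal distribution (real data also contain many exact zeros, which this ignores); `simulate_tpm` and `summarize` are hypothetical helper names, and the gene count and `sigma` are arbitrary choices.

```python
import random
import statistics


def simulate_tpm(n_genes=20000, sigma=3.0, seed=0):
    """Draw log-normal 'expression' values and scale them to sum to 1e6."""
    rng = random.Random(seed)
    raw = [rng.lognormvariate(0.0, sigma) for _ in range(n_genes)]
    scale = 1e6 / sum(raw)
    return [x * scale for x in raw]


def summarize(tpm):
    """Return the same statistics that R's summary() reports."""
    q1, median, q3 = statistics.quantiles(tpm, n=4)
    return {
        "min": min(tpm),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(tpm),
        "q3": q3,
        "max": max(tpm),
    }


tpm = simulate_tpm()
stats = summarize(tpm)
for name, value in stats.items():
    print(f"{name:>6}: {value:.3f}")
```

Since a TPM vector over all quantified features sums to one million, its mean is fixed at 1e6 divided by the number of features, while the median and quartiles sit far below it whenever a handful of features take most of the signal.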


Thanks a lot! Very helpful detailed guidance! Appreciate it a lot!

