Comparing RNA-seq data from different studies using ratios
5.3 years ago
david.peeney

Hi all

I have a question about comparing RNA-seq datasets that have been processed using different methods (in this case FPKM-UQ vs TPM). For reference, the data are TCGA data harvested from cBioPortal (FPKM-UQ) and GTEx TPMs downloaded from the GTEx portal.

I am not comparing values between datasets; I am comparing gene A vs gene X within an individual patient, essentially taking a ratio of their expression values. It is this gene A:gene X ratio that I have been comparing between datasets.
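The within-patient ratio approach described above could be sketched like this. It is a hypothetical illustration rather than the workflow actually used here: the sample IDs and values are made up, and the log2 transform, the pseudocount of 1 (to avoid dividing by zero when gene X is not detected) and the per-dataset median summary are all assumptions.

```python
import math
from statistics import median


def log2_ratios(samples, gene_a, gene_x, pseudocount=1.0):
    """Return log2((A + pseudocount) / (X + pseudocount)) for every sample.

    samples maps a sample ID to a dict of gene -> expression value
    (FPKM-UQ or TPM, but only one unit within a dataset).
    The pseudocount is a hypothetical choice, not part of the original post."""
    return {
        sample_id: math.log2((expr[gene_a] + pseudocount) / (expr[gene_x] + pseudocount))
        for sample_id, expr in samples.items()
    }


# Hypothetical values for two patients in one dataset
dataset = {
    "patient_1": {"GENE_A": 31.0, "GENE_X": 15.0},
    "patient_2": {"GENE_A": 7.0, "GENE_X": 15.0},
}
ratios = log2_ratios(dataset, "GENE_A", "GENE_X")
print(ratios)                   # {'patient_1': 1.0, 'patient_2': -1.0}
print(median(ratios.values()))  # 0.0, one summary value per dataset
```

Because the pseudocount is added in each dataset's own units, it does not scale between FPKM-UQ and TPM, so for lowly expressed genes it can shift the ratio differently in the two datasets.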

My question is: is it reasonable to expect that, for the most part, the ratio between gene A and gene X will remain the same (or thereabouts) regardless of whether the values are FPKM-UQ or TPM?

PS: I am aware that it is ideal to re-analyze raw counts using an appropriate between-sample normalization protocol, but this approach lets me obtain preliminary data from many datasets quickly and identify studies of interest for further in-depth analysis.

RNA-Seq

5.3 years ago
rrbutleriii

Read more about the theory behind the TCGA FPKM-UQ method:

"The upper quartile FPKM (FPKM-UQ) is a modified FPKM calculation in which the total protein-coding read count is replaced by the 75th percentile read count value for the sample." https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/

The FPKM-UQ data have been quantile normalized and cannot be compared to TPM values. Try to obtain FPKM values instead and do the FPKM -> TPM conversion here.
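The FPKM-UQ definition quoted above and the FPKM -> TPM conversion can be sketched together for a single sample. This is a minimal illustration under stated assumptions, not the GDC pipeline: the counts and gene lengths are hypothetical, every gene is treated as protein-coding, and the 75th percentile comes from Python's `statistics.quantiles`, which may not match the percentile method GDC uses. FPKM and FPKM-UQ each divide count/length by one per-sample number (total reads or the upper-quartile count), and the TPM step divides every FPKM value in the sample by the same total (TPM = FPKM / sum(FPKM) × 10^6).

```python
from statistics import quantiles


def fpkm(counts, lengths):
    """FPKM: count * 1e9 / (gene length in bp * total mapped reads)."""
    total = sum(counts)  # assumption: every gene is counted as protein-coding
    return [c * 1e9 / (l * total) for c, l in zip(counts, lengths)]


def fpkm_uq(counts, lengths):
    """FPKM-UQ: the total read count is replaced by the 75th percentile count."""
    uq = quantiles(counts, n=4)[2]  # third quartile cut point (method is an assumption)
    return [c * 1e9 / (l * uq) for c, l in zip(counts, lengths)]


def fpkm_to_tpm(fpkm_values):
    """TPM: rescale one sample's FPKM values so they sum to 1e6."""
    total = sum(fpkm_values)
    return [v * 1e6 / total for v in fpkm_values]


# Hypothetical raw counts and gene lengths (bp) for a single sample
counts = [500, 120, 3000, 45, 800, 60]
lengths = [2000, 1500, 4000, 900, 2500, 1200]

fpkm_vals = fpkm(counts, lengths)
measures = {
    "FPKM": fpkm_vals,
    "FPKM-UQ": fpkm_uq(counts, lengths),
    "TPM": fpkm_to_tpm(fpkm_vals),
}
for name, values in measures.items():
    # gene 0 : gene 1 ratio within this one sample
    print(name, round(values[0] / values[1], 6))  # 3.125 for all three
```

Since each measure here is count/length multiplied by a per-sample constant, a within-sample ratio comes out identical when all three are computed from the same counts and gene lengths. That is a narrower situation than comparing TCGA with GTEx, whose pipelines (alignment, annotation, gene-length definitions) differ, and it does not cover quantile-normalized values.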

This doesn't quite answer the question I am asking... I understand that the normalization techniques differ.

The question is: would the ratio between gene A and gene X differ significantly if the raw counts were processed to give FPKM-UQ, FPKM, TPM or RSEM values?

The literal answer to "would the ratio differ?" is: quite possibly. Hence my point above that quantile-normalized data shouldn't be compared in this way.

Quantile normalization (QN) is not the same as the other normalization concepts behind TPM/FPKM/RPKM, and even for those, comparing across types is not recommended. You can see a quick explanation of QN here. In the example grids below (df before QN, df_final after QN), if you treat row A / row B as gene A / gene B, the ratios in the QN data differ from those in the un-normalized data:

df
#  one two three
#A   5   4     3
#B   2   1     4
#C   3   4     6
#D   4   2     8

df_final
#       one      two    three
#A 5.666667 4.666667 2.000000
#B 2.000000 2.000000 3.000000
#C 3.000000 4.666667 4.666667
#D 4.666667 3.000000 5.666667
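The df -> df_final step above is quantile normalization: each column is ranked, and every value is replaced by the mean, across columns, of the values holding that rank. Below is a minimal pure-Python sketch that reproduces df_final and prints the gene A / gene B ratio per column before and after. The tie rule (tied values share the reference value of their lowest rank) is an assumption chosen because it matches the two 4s in column `two`; other implementations, for example ones that average tied ranks, give slightly different numbers.

```python
def quantile_normalize(columns):
    """Quantile-normalize a dict of equal-length columns (sample -> values).

    Tied values get the reference value of their lowest rank, which
    reproduces df_final above; other tools may average tied ranks instead."""
    sorted_cols = [sorted(vals) for vals in columns.values()]
    n_rows = len(sorted_cols[0])
    # Reference distribution: mean of the k-th smallest value across samples
    reference = [sum(col[k] for col in sorted_cols) / len(sorted_cols)
                 for k in range(n_rows)]
    normalized = {}
    for name, vals in columns.items():
        ordered = sorted(vals)
        # list.index returns the first (lowest) rank of a value
        normalized[name] = [reference[ordered.index(v)] for v in vals]
    return normalized


# Rows are genes A-D and keys are samples, matching df above
df = {"one": [5, 2, 3, 4], "two": [4, 1, 4, 2], "three": [3, 4, 6, 8]}
df_final = quantile_normalize(df)

for sample in df:
    raw_ratio = df[sample][0] / df[sample][1]             # gene A / gene B before QN
    qn_ratio = df_final[sample][0] / df_final[sample][1]  # gene A / gene B after QN
    print(f"{sample}: A/B = {raw_ratio:.3f} before, {qn_ratio:.3f} after")

# one: A/B = 2.500 before, 2.833 after
# two: A/B = 4.000 before, 2.333 after
# three: A/B = 0.750 before, 0.667 after
```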