Question

Comparing RNA-seq data from different studies using ratios

0

5.5 years ago

david.peeney ▴ 30

Hi all

I have a question about comparing RNA-seq datasets that were normalized with different methods (in this case FPKM-UQ vs TPM). For reference, the data are TCGA data (FPKM-UQ) obtained from cBioPortal and GTEx TPMs downloaded from the GTEx portal.

I am not comparing counts between datasets; I am comparing the counts for gene A vs gene X within an individual patient, i.e. obtaining a ratio of gene counts. It is this gene A:gene X ratio that I have been comparing between datasets.

My question is: is it reasonable to expect that, for the most part, the ratio between gene A and gene X will remain the same (or thereabouts) regardless of whether the counts are FPKM-UQ or TPM?

PS: I am aware that it is ideal to re-analyze raw counts using an appropriate between-sample normalization protocol, but this technique allows me to obtain preliminary data from many datasets quickly and identify studies of interest for further in-depth analysis.

RNA-Seq • 2.8k views

score 0 · Answer 1 · 2019-01-12

0

5.5 years ago

rrbutleriii ▴ 260

Read more about the theory behind the TCGA FPKM-UQ method:

"The upper quartile FPKM (FPKM-UQ) is a modified FPKM calculation in which the total protein-coding read count is replaced by the 75th percentile read count value for the sample." https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/
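As a rough illustration of the definition quoted above, here is a minimal sketch that computes FPKM, FPKM-UQ and TPM from the same toy counts for a single sample and prints the geneA/geneX ratio under each. Everything in it is an assumption made for illustration: the counts, the gene lengths, the gene names, the helper functions, the percentile estimator, and the idea that all toy genes are protein-coding. It only covers the idealized case where all three measures come from identical counts and identical gene lengths; it does not model differences in alignment, quantification software, gene annotation or effective length between real pipelines such as TCGA and GTEx.

```python
# Toy read counts and gene lengths (bp) for ONE sample. All values are made up.
counts = {"geneA": 500, "geneX": 100, "geneB": 1000, "geneC": 50, "geneD": 250}
lengths = {"geneA": 2000, "geneX": 1000, "geneB": 4000, "geneC": 500, "geneD": 1500}


def upper_quartile(values):
    """75th percentile with linear interpolation (one of several conventions)."""
    ordered = sorted(values)
    pos = 0.75 * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def fpkm(counts, lengths):
    # Assumption: every toy gene is protein-coding, so the total is a plain sum.
    total = sum(counts.values())
    return {g: counts[g] * 1e9 / (total * lengths[g]) for g in counts}


def fpkm_uq(counts, lengths):
    # Same formula as FPKM, with the total replaced by the 75th percentile count.
    uq = upper_quartile(counts.values())
    return {g: counts[g] * 1e9 / (uq * lengths[g]) for g in counts}


def tpm(counts, lengths):
    # Length-normalize first, then scale so the sample sums to one million.
    rate = {g: counts[g] / lengths[g] for g in counts}
    scale = sum(rate.values())
    return {g: rate[g] / scale * 1e6 for g in counts}


methods = {"FPKM": fpkm, "FPKM-UQ": fpkm_uq, "TPM": tpm}
ratios = {}
for name, method in methods.items():
    values = method(counts, lengths)
    ratios[name] = values["geneA"] / values["geneX"]
    print(f"{name}: geneA/geneX = {ratios[name]:.4f}")
```

In this sketch each measure is count/length multiplied by a single per-sample constant, so the constant cancels in a within-sample ratio. Whether that carries over to two independently processed datasets depends on the pipeline differences that the sketch leaves out.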

The FPKM-UQ data has been quantile normalized and cannot be compared to TPM counts. Try to obtain FPKM to do the FPKM -> TPM conversion here

0

This doesn't quite answer the question I am asking... I understand that the normalization techniques differ.

The question is: would the ratio between gene X and gene Y differ significantly if the raw counts were processed to give FPKM-UQ, FPKM, TPM or RSEM?

Reply • 5.5 years ago by david.peeney ▴ 30

1

The literal answer to 'would the ratio differ' is: quite possibly. Hence my point above that quantile-normalized data shouldn't be compared in this way.

Quantile normalization is not the same as the other normalization concepts of TPM/FPKM/RPKM. And even for those it is not recommended to compare across types. You can see a quick explanation of QN here. In the example grids, if you treat the Row A / Row B ratio as geneA/geneB, the quantile-normalized values (df_final) give different ratios from the un-normalized data (df). For example, in column two A/B is 4/1 = 4 before and 4.67/2 ≈ 2.33 after:

df
#  one two three
#A   5   4     3
#B   2   1     4
#C   3   4     6
#D   4   2     8

df_final
#       one      two    three
#A 5.666667 4.666667 2.000000
#B 2.000000 2.000000 3.000000
#C 3.000000 4.666667 4.666667
#D 4.666667 3.000000 5.666667
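
For anyone who wants to check the grids, here is a minimal plain-Python sketch that reproduces df_final from df and prints the A/B ratio per column before and after. The function name quantile_normalize and the tie handling (tied values share the mean of the lowest tied rank) are assumptions chosen so that the output matches the grid above; other implementations average over ties and would give slightly different numbers.

```python
rows = ["A", "B", "C", "D"]
df = {
    "one":   [5, 2, 3, 4],
    "two":   [4, 1, 4, 2],
    "three": [3, 4, 6, 8],
}


def quantile_normalize(columns):
    """Replace each value by the mean, across columns, of the values at its rank."""
    n_rows = len(next(iter(columns.values())))
    sorted_cols = [sorted(col) for col in columns.values()]
    # Mean of the i-th smallest value across all columns.
    rank_means = [
        sum(col[i] for col in sorted_cols) / len(sorted_cols)
        for i in range(n_rows)
    ]
    normalized = {}
    for name, col in columns.items():
        ordered = sorted(col)
        # ordered.index(v) is the lowest 0-based rank of v, so ties share one value.
        normalized[name] = [rank_means[ordered.index(v)] for v in col]
    return normalized


df_final = quantile_normalize(df)

a, b = rows.index("A"), rows.index("B")
ratios_before = {name: df[name][a] / df[name][b] for name in df}
ratios_after = {name: df_final[name][a] / df_final[name][b] for name in df}

for name in df:
    print(f"{name}: A/B before = {ratios_before[name]:.3f}, "
          f"after = {ratios_after[name]:.3f}")
```

The point of the comparison is that quantile normalization changes values based on their rank within each column, so a ratio between two rows is not guaranteed to survive it.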

Reply • 5.5 years ago by rrbutleriii ▴ 260