Question: Comparing RNA-seq data from different studies using ratios
david.peeney20 wrote:

Hi all

I have a question about comparing RNA-seq datasets that have been processed with different normalization methods (in this case FPKM-UQ vs TPM). For reference, the data are TCGA data (FPKM-UQ) harvested from cBioPortal and GTEx TPMs downloaded from the GTEx portal.

I am not comparing expression values directly between datasets. Instead, within each individual patient I compare the value for gene A with the value for gene X, which gives a gene A:gene X ratio. It is this ratio that I have been comparing between datasets.
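Below is a minimal Python sketch of the within-sample ratio approach described above. The matrices are hypothetical, with rows as genes and columns as samples. The log2 scale and the pseudocount of 1, which avoids dividing by zero, are assumptions chosen for illustration and are not part of the original post.

```python
import numpy as np


def log2_ratio(expr, genes, num_gene, den_gene, pseudocount=1.0):
    """Per-sample log2((num + pc) / (den + pc)) from a genes x samples matrix."""
    expr = np.asarray(expr, dtype=float)
    num = expr[genes.index(num_gene)]
    den = expr[genes.index(den_gene)]
    return np.log2((num + pseudocount) / (den + pseudocount))


# Hypothetical expression matrices: rows are genes, columns are samples.
genes = ["geneA", "geneX"]
tcga = np.array([[31.0, 63.0, 15.0],   # e.g. FPKM-UQ-like values
                 [15.0, 15.0, 15.0]])
gtex = np.array([[7.0, 15.0],          # e.g. TPM-like values
                 [7.0, 3.0]])

# One log2(gene A / gene X) value per sample in each dataset.
r_tcga = log2_ratio(tcga, genes, "geneA", "geneX")
r_gtex = log2_ratio(gtex, genes, "geneA", "geneX")
print(np.median(r_tcga), np.median(r_gtex))
```

On the log2 scale, a twofold excess of gene A and a twofold excess of gene X sit symmetrically around 0. That symmetry makes the per-sample distributions easier to compare between datasets.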

My question is: is it reasonable to expect that, for the most part, the ratio between gene A and gene X will stay roughly the same regardless of whether the values are FPKM-UQ or TPM?

PS: I am aware that, ideally, the raw counts should be re-analyzed using an appropriate between-sample normalization method. However, this approach lets me obtain preliminary data from many datasets quickly and identify studies of interest for further in-depth analysis.

rrbutleriii50 wrote:

Read more about the theory behind the TCGA FPKM-UQ method:

"The upper quartile FPKM (FPKM-UQ) is a modified FPKM calculation in which the total protein-coding read count is replaced by the 75th percentile read count value for the sample." https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/

The FPKM-UQ data has been quantile normalized and cannot be compared to TPM values. Try to obtain the FPKM data instead and do the FPKM -> TPM conversion here.
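Here is a minimal Python sketch on hypothetical toy numbers that makes the quoted definition and the FPKM -> TPM conversion concrete. It assumes the following formulas:

- FPKM = count × 10^9 / (total protein-coding reads × gene length).
- FPKM-UQ is the same expression with that total replaced by the sample's 75th-percentile count, as in the GDC quote. Using numpy's default percentile interpolation is an assumption and may not match what GDC does.
- TPM_i = FPKM_i / Σ_j FPKM_j × 10^6, where the sum runs over all genes in the sample.

The function names are illustrative only.

```python
import numpy as np

# Hypothetical toy sample: raw read counts and gene lengths (bp) for five genes,
# all assumed to be protein-coding so every gene enters the totals below.
genes = ["geneA", "geneB", "geneC", "geneD", "geneX"]
counts = np.array([500.0, 120.0, 80.0, 40.0, 250.0])
lengths = np.array([2000.0, 1500.0, 1000.0, 3000.0, 2500.0])


def fpkm(counts, lengths):
    """FPKM: read count * 1e9 / (total protein-coding read count * gene length)."""
    return counts * 1e9 / (counts.sum() * lengths)


def fpkm_uq(counts, lengths):
    """FPKM-UQ: as FPKM, but the total is replaced by the 75th percentile count."""
    upper_quartile = np.percentile(counts, 75)  # assumption: numpy's default interpolation
    return counts * 1e9 / (upper_quartile * lengths)


def fpkm_to_tpm(fpkm_values):
    """TPM from FPKM: rescale one sample so that its values sum to 1e6."""
    return fpkm_values / fpkm_values.sum() * 1e6


a, x = genes.index("geneA"), genes.index("geneX")
values = {
    "FPKM": fpkm(counts, lengths),
    "FPKM-UQ": fpkm_uq(counts, lengths),
    "TPM": fpkm_to_tpm(fpkm(counts, lengths)),
}
# Gene A : gene X ratio within this one sample, for each unit.
ratios = {name: v[a] / v[x] for name, v in values.items()}
print(ratios)
```

In this idealised case all three units give the same gene A : gene X ratio of 2.5, which is (500/2000)/(250/2500), up to floating-point rounding. The reason is that each unit is count/length multiplied by a single per-sample constant. The sketch does not model upstream differences between the TCGA and GTEx pipelines, such as the quantification tool, gene annotation, or effective gene lengths, and these may still change the ratio. Also note that `fpkm_to_tpm` needs every gene quantified in the sample: rescaling only a subset of genes gives incorrect TPMs.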

david.peeney20 replied:

This doesn't quite answer the question I am asking... I understand that the normalization techniques differ.

The question is: would the ratio between gene A and gene X differ significantly if the raw counts were processed to give FPKM-UQ, FPKM, TPM or RSEM values?

rrbutleriii50 replied:

The literal answer to "would the ratio differ?" is: quite possibly. Hence my point above that quantile-normalized data shouldn't be compared in this way.

Quantile normalization (QN) is not the same as the normalization concepts behind TPM/FPKM/RPKM, and even among those it is not recommended to compare values across types. You can see a quick explanation of QN here. In the example grids below, treat rows A and B as gene A and gene B. The A/B ratio in each column (sample) of the quantile-normalized data (df_final) differs from the ratio in the un-normalized data (df):

df
#  one two three
#A   5   4     3
#B   2   1     4
#C   3   4     6
#D   4   2     8

df_final
#       one      two    three
#A 5.666667 4.666667 2.000000
#B 2.000000 2.000000 3.000000
#C 3.000000 4.666667 4.666667
#D 4.666667 3.000000 5.666667
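The df -> df_final transformation above can be reproduced with a short Python sketch of quantile normalization. Within a column, tied values receive the reference value of the lowest tied rank, which is the convention that matches df_final (column two, rows A and C). This corresponds to R's `rank(..., ties.method = "min")`. Other implementations may handle ties differently. The function name is illustrative.

```python
import numpy as np


def quantile_normalize(x):
    """Quantile-normalize the columns of x (rows = genes, columns = samples).

    Tied values within a column get the reference value of the lowest tied rank.
    """
    x = np.asarray(x, dtype=float)
    # Reference distribution: mean across samples of each sorted position.
    reference = np.sort(x, axis=0).mean(axis=1)
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        col = x[:, j]
        # 0-based "min" rank: number of values in the column strictly smaller than each entry.
        ranks = np.searchsorted(np.sort(col), col, side="left")
        out[:, j] = reference[ranks]
    return out


df = np.array([[5, 4, 3],
               [2, 1, 4],
               [3, 4, 6],
               [4, 2, 8]])
df_final = quantile_normalize(df)
print(np.round(df_final, 6))

# Ratio of row A to row B (gene A / gene B) in each column (sample).
print(df[0] / df[1])
print(df_final[0] / df_final[1])
```

Before normalization, the row A / row B ratios per column are 2.5, 4 and 0.75. After normalization they are about 2.83, 2.33 and 0.67, which is the effect described above.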