Question: Comparing RNA-seq data from different studies using ratios
david.peeney wrote (5 days ago):

Hi all

I have a question about comparing RNA-seq datasets that have been normalized using different methods (in this case FPKM-UQ vs TPM). For reference, the data are TCGA data harvested from cBioPortal (FPKM-UQ) and GTEx TPMs downloaded from the GTEx portal.

I am not comparing expression values between datasets directly. I am comparing the values for gene A vs gene X within an individual patient, i.e. obtaining a ratio of gene expression. It is this gene A:gene X ratio that I have been comparing between datasets.
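
A minimal Python sketch of the per-patient ratio described above. The function names and the input values are hypothetical (made-up numbers, not TCGA or GTEx data), and skipping samples where either gene is zero is an assumption. A pseudocount is the other common choice, but a fixed pseudocount is not scale-free, so it would shift low-expression ratios differently on FPKM-UQ and TPM scales.

```python
import math
from statistics import median


def per_sample_log2_ratios(gene_a, gene_x):
    """Return log2(gene A / gene X) for each sample (patient).

    Samples where either value is 0 are skipped (an assumption), so no
    pseudocount is needed and each ratio is unaffected by per-sample scaling.
    """
    return [math.log2(a / x) for a, x in zip(gene_a, gene_x) if a > 0 and x > 0]


def median_log2_ratio(gene_a, gene_x):
    """Summarize one dataset by its median per-sample log2 ratio (NaN if no usable samples)."""
    ratios = per_sample_log2_ratios(gene_a, gene_x)
    return median(ratios) if ratios else float("nan")


# Hypothetical values, one entry per patient; not real TCGA/GTEx data.
dataset_1 = {"geneA": [10.0, 20.0, 40.0, 0.0], "geneX": [5.0, 10.0, 10.0, 3.0]}
dataset_2 = {"geneA": [6.0, 12.0, 9.0], "geneX": [6.0, 3.0, 9.0]}

for name, d in [("dataset_1", dataset_1), ("dataset_2", dataset_2)]:
    print(name, median_log2_ratio(d["geneA"], d["geneX"]))
```

Working on the log2 scale keeps ratios above and below 1 symmetric, which makes medians across patients easier to compare between datasets.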

My question is: is it reasonable to expect that, for the most part, the ratio between gene A and gene X will remain (roughly) the same regardless of whether the values are FPKM-UQ or TPM?

PS: I am aware that ideally the raw counts should be re-analyzed with an appropriate between-sample normalization protocol, but this technique lets me obtain preliminary data from many datasets quickly and identify studies of interest for further in-depth analysis.

Tags: rna-seq
rrbutleriii (US/Chicago/NorthShore--UChicago) wrote (4 days ago):

You can read more about the theory behind the TCGA FPKM-UQ method in the GDC documentation:

"The upper quartile FPKM (FPKM-UQ) is a modified FPKM calculation in which the total protein-coding read count is replaced by the 75th percentile read count value for the sample." https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/

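A minimal sketch of the GDC definition quoted above, assuming FPKM = count × 10^9 / (total (protein-coding) read count × gene length in bp) and FPKM-UQ = count × 10^9 / (75th-percentile read count × gene length). The function names, the toy counts and lengths, and the use of np.percentile with its default interpolation are assumptions; GDC's exact choice of genes for the percentile is not modeled here. Given identical counts and lengths, both metrics divide every gene in a sample by one shared constant, so the gene A:gene X ratio comes out the same.

```python
import numpy as np


def fpkm(counts, lengths_bp):
    """FPKM: count * 1e9 / (total read count * gene length in bp)."""
    counts = np.asarray(counts, dtype=float)
    lengths = np.asarray(lengths_bp, dtype=float)
    return counts * 1e9 / (counts.sum() * lengths)


def fpkm_uq(counts, lengths_bp):
    """FPKM-UQ as described by GDC: the total read count is replaced by the
    sample's 75th-percentile read count (np.percentile default interpolation,
    an assumption about how the percentile is computed)."""
    counts = np.asarray(counts, dtype=float)
    lengths = np.asarray(lengths_bp, dtype=float)
    upper_quartile = np.percentile(counts, 75)
    return counts * 1e9 / (upper_quartile * lengths)


# Toy sample: four genes with made-up counts and lengths (hypothetical).
counts = [100, 400, 50, 1000]
lengths = [1000, 2000, 500, 4000]

a_vs_x_fpkm = fpkm(counts, lengths)[0] / fpkm(counts, lengths)[1]
a_vs_x_uq = fpkm_uq(counts, lengths)[0] / fpkm_uq(counts, lengths)[1]
print(a_vs_x_fpkm, a_vs_x_uq)  # both equal (100/1000) / (400/2000) = 0.5, up to rounding
```

This only holds when the same counts and gene lengths feed both metrics. TCGA and GTEx values come from separate pipelines, where aligners, annotations and gene-length definitions can differ, so identical inputs cannot be assumed.
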
The FPKM-UQ data has been quantile normalized and cannot be compared to TPM values. Try to obtain the FPKM values instead and do the FPKM -> TPM conversion yourself.
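
The FPKM -> TPM conversion is a within-sample rescaling: TPM_i = FPKM_i / Σ_j FPKM_j × 10^6. A minimal sketch, assuming the FPKM vector covers all genes quantified in the sample (the sum changes if genes are dropped); the function name and input values are hypothetical.

```python
import numpy as np


def fpkm_to_tpm(fpkm_values):
    """Convert one sample's FPKM vector to TPM: FPKM_i / sum_j FPKM_j * 1e6."""
    fpkm_values = np.asarray(fpkm_values, dtype=float)
    return fpkm_values / fpkm_values.sum() * 1e6


# Hypothetical FPKM values for all genes of one sample.
sample_fpkm = [5.0, 10.0, 85.0]
tpm = fpkm_to_tpm(sample_fpkm)
print(tpm)              # TPM values sum to 1e6 within the sample
print(tpm[0] / tpm[1])  # same gene ratio as 5.0 / 10.0
```

Because every value in the sample is divided by the same sum, the ratio between any two genes is unchanged by this step.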


This doesn't quite answer my question. I understand that the normalization techniques differ.

The question is: would the ratio between gene A and gene X differ significantly if the raw counts were processed to give FPKM-UQ, FPKM, TPM or RSEM values?

(Reply by david.peeney, 1 day ago)

The literal answer to "would the ratio differ?" is: quite possibly. Hence my point above that quantile-normalized data shouldn't be compared in this way.

Quantile normalization (QN) is not the same as the other normalization concepts of TPM/FPKM/RPKM, and even among those it is not recommended to compare across types. You can see a quick explanation of QN here. In the example grids, if you treat row A and row B as gene A and gene B, the A/B ratio after QN (df_final) differs from the ratio in the un-normalized data (df):

df
#  one two three
#A   5   4     3
#B   2   1     4
#C   3   4     6
#D   4   2     8

df_final
#       one      two    three
#A 5.666667 4.666667 2.000000
#B 2.000000 2.000000 3.000000
#C 3.000000 4.666667 4.666667
#D 4.666667 3.000000 5.666667
(Reply by rrbutleriii, 8 hours ago)
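
The grids above can be reproduced with a minimal NumPy sketch of quantile normalization. Tied values (A and C in column two) are given the mean for the lowest tied rank, which is what df_final shows; other implementations average over the tied ranks and would give a different value for those two cells. The function name is hypothetical.

```python
import numpy as np


def quantile_normalize(matrix):
    """Quantile-normalize the columns (samples) of a genes x samples matrix.

    Each value is replaced by the mean, across samples, of the values sharing
    its rank. Ties get the mean for the lowest tied rank, matching df_final.
    """
    x = np.asarray(matrix, dtype=float)
    sorted_cols = np.sort(x, axis=0)
    rank_means = sorted_cols.mean(axis=1)  # one mean per rank position
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        # position of the first occurrence of each value in the sorted column ("min" ranking)
        ranks = np.searchsorted(sorted_cols[:, j], x[:, j], side="left")
        out[:, j] = rank_means[ranks]
    return out


# df from above: rows are genes A-D, columns are samples one, two, three.
df = np.array([[5, 4, 3],
               [2, 1, 4],
               [3, 4, 6],
               [4, 2, 8]], dtype=float)
df_final = quantile_normalize(df)

print(df[0] / df[1])              # A/B before QN: 2.5, 4.0, 0.75
print(df_final[0] / df_final[1])  # A/B after QN: about 2.83, 2.33, 0.67
```

Unlike the per-sample scaling in FPKM or TPM, QN replaces each value with a rank mean shared across samples, so within-sample ratios are not preserved.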