Question: Differences between FPKM and FPKM-UQ files in gene expression analysis
6
alcs417 wrote, 3.5 years ago:

Hi guys, I am planning to perform a pan-cancer gene expression analysis across several cancer types. However, I found that the TCGA data portal has been replaced by GDC. After carefully checking the harmonized data in GDC, I am now wondering which file I should use for gene expression analysis, FPKM or FPKM-UQ? What are the differences between the two file types? Previously, I used the files with suffix "rsem.genes.normalized_results" to perform the gene expression analysis. Is FPKM the same as the "*.rsem.genes.normalized_results" file? If so, when shall we use FPKM-UQ? Any help would be really appreciated. Thanks

modified 3.5 years ago by joshualevipayne • written 3.5 years ago by alcs417
1

FPKM and FPKM-UQ are calculated by GDC only for legacy reasons: people have historically used FPKM data, and UQ provides a method of normalization. For any serious analysis, however, using count data with DESeq or edgeR is encouraged.

written 2.6 years ago by Zhenyu Zhang

Just to add, FPKM can still be really useful. Because it normalizes read counts by transcript length, it can help correct for potential differences in input RNA quality. Just make sure you know what you are working with. If you apply a stringent CPM cutoff, using FPKM values instead will obviously distort your results badly. Zhenyu is not wrong, but I felt the caveat was worth adding.

modified 2.3 years ago • written 2.3 years ago by SpaceMenEatSpacePlants
1

An update (6th October 2018):

You should abandon RPKM / FPKM. They are not suitable when cross-sample differential expression analysis is your aim; indeed, they leave samples incomparable for that purpose:

Please read this: "A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis"

"The Total Count and RPKM [FPKM] normalization methods, both of which are still widely in use, are ineffective and should be definitively abandoned in the context of differential analysis."

Also, by Harold Pimentel: "What the FPKM? A review of RNA-Seq expression units"

"The first thing one should remember is that without between sample normalization (a topic for a later post), NONE of these units are comparable across experiments. This is a result of RNA-Seq being a relative measurement, not an absolute one."
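
To make the "relative measurement" point concrete, here is a minimal sketch with made-up numbers (the counts, lengths, and the helper name `fpkm` are all hypothetical, and every gene is assumed to be protein-coding). Three genes have identical raw counts in both samples, yet their FPKM values differ because a single highly expressed gene inflates the library total in the second sample:

```python
import numpy as np

def fpkm(counts, lengths_bp):
    """Within-sample FPKM: count * 1e9 / (library total * gene length in bp)."""
    counts = np.asarray(counts, dtype=float)
    lengths_bp = np.asarray(lengths_bp, dtype=float)
    return counts * 1e9 / (counts.sum() * lengths_bp)

# Hypothetical data: four genes of equal length, two samples.
lengths = np.array([1000, 1000, 1000, 1000])
sample_a = np.array([100, 100, 100, 700])
sample_b = np.array([100, 100, 100, 7000])  # only the last gene differs

fpkm_a = fpkm(sample_a, lengths)
fpkm_b = fpkm(sample_b, lengths)

# Ratio of FPKM values for the three genes whose raw counts are identical.
ratio_unchanged = fpkm_b[:3] / fpkm_a[:3]
print(ratio_unchanged)
```

The ratio is well below 1 for genes whose counts did not change, which is why a between-sample normalization step (as done by DESeq or edgeR on raw counts) is needed before comparing samples.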

modified 13 months ago • written 19 months ago by Kevin Blighe
3
joshualevipayne (Zurich) wrote, 3.5 years ago:

This link might also be helpful:

https://gdc-docs.nci.nih.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/

written 3.5 years ago by joshualevipayne
1

To add the actual answer: "The upper quartile FPKM (FPKM-UQ) is a modified FPKM calculation in which the total protein-coding read count is replaced by the 75th percentile read count value for the sample."
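
As a rough illustration of that sentence, the two calculations could be sketched as below. This is not GDC's code: the function names are hypothetical, the 1e9 scaling factor is the usual FPKM convention (per kilobase, per million), all input genes are assumed to be protein-coding, and the 75th percentile is taken over exactly the genes passed in, which may differ from the gene set GDC uses.

```python
import numpy as np

def fpkm(counts, lengths_bp):
    """FPKM: gene count scaled by gene length and the total read count."""
    counts = np.asarray(counts, dtype=float)
    lengths_bp = np.asarray(lengths_bp, dtype=float)
    total = counts.sum()  # assumed: total protein-coding read count
    return counts * 1e9 / (total * lengths_bp)

def fpkm_uq(counts, lengths_bp):
    """FPKM-UQ: same as FPKM, but the total is replaced by the
    75th percentile of the per-gene read counts in the sample."""
    counts = np.asarray(counts, dtype=float)
    lengths_bp = np.asarray(lengths_bp, dtype=float)
    q75 = np.percentile(counts, 75)  # assumed: percentile over all input genes
    return counts * 1e9 / (q75 * lengths_bp)

# Hypothetical counts and gene lengths for one sample.
example_counts = [100, 200, 300, 400]
example_lengths = [1000, 2000, 1000, 4000]
print(fpkm(example_counts, example_lengths))
print(fpkm_uq(example_counts, example_lengths))
```

Within a single sample the two results differ only by a constant factor (total count divided by the 75th percentile count), so the choice matters mainly when values from different samples are put side by side.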

written 2.1 years ago by Michael Schubert
0
natasha.sernova wrote, 3.5 years ago:

RPKM stands for reads per kilobase per million mapped reads (FPKM is the same unit counted in fragments rather than reads).

UQ stands for upper quartile.

See this link:

http://qian.human.cornell.edu/Files/nmeth.3208.pdf

and this paragraph inside:

"Quantification of Ribo-seq and QTI-seq. Reads per kilobase per million reads (RPKM) value was calculated to quantify the ribosome occupancy of mRNA for CHX profiling (ref 20). A window centering the predicted TIS codon (−1, +4) was summarized to represent the abundance of translation initiation signal. To facilitate the comparison between different experimental conditions, we applied upper quartile (UQ) normalization to each predicted TIS codon on the basis of the population of total QTI-seq read count of each individual mRNA (ref 35). The fold changes of translational signal between two experimental conditions for both LTM and CHX profiling data were normalized by fold changes of RNA-seq FPKM values of the corresponding mRNAs".

"In statistics and the theory of probability, quantiles are cutpoints dividing the range of a probability distribution into contiguous intervals with equal probabilities, or dividing the observations in a sample in the same way. There is one less quantile than the number of groups created. Thus quartiles are the three cut points that will divide a dataset into four equal-size groups (cf. depicted example). Common quantiles have special names: for instance quartile, decile (creating 10 groups: see below for more). The groups created are termed halves, thirds, quarters, etc., though sometimes the terms for the quantile are used for the groups created, rather than for the cut points." WIKI

modified 3.5 years ago • written 3.5 years ago by natasha.sernova