Question

FPKM quantile normalization

2

5.4 years ago

Sara ▴ 60

Hi all, I have 250 samples from healthy and disease states. I want to integrate gene expression data into a metabolic model and perform flux balance analysis. Can I use FPKM directly for this work, or should I normalize the FPKM values first? For example, I have seen publications in which the researchers applied quantile normalization to FPKM.

Any help is welcome

This question has also been asked on Bioconductor support. Here is the link: https://support.bioconductor.org/p/116021/#116067

RNA-Seq Flux balance analysis Normalization • 6.0k views

ADD COMMENT • link 5.4 years ago by Sara ▴ 60

0

I see that some researchers used quantile normalization of FPKM

Who? Data on the FPKM scale is already normalised for sequencing depth and gene length within each sample, but not for cross-sample differences.

ADD REPLY • link 5.4 years ago by Kevin Blighe 87k
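
To illustrate the point about cross-sample differences, here is a minimal base R sketch of the FPKM calculation. The counts and gene lengths are made up for illustration, and the library size is simply taken as the column sum. geneA and geneB have the same raw counts in both samples, yet their FPKM is lower in s2 because geneC takes up a larger share of that library.

```r
# Made-up raw counts: genes in rows, samples in columns
counts <- matrix(c(100, 200, 700,
                   100, 200, 2700),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Made-up gene lengths in base pairs
gene_length <- c(geneA = 1000, geneB = 2000, geneC = 7000)

# FPKM = fragments / (gene length in kb * library size in millions)
library_size <- colSums(counts)
per_million <- sweep(counts, 2, library_size / 1e6, "/")
fpkm <- per_million / (gene_length / 1e3)  # recycles down rows: one length per gene

print(fpkm)
```

Each sample is scaled only by its own total, which is why a composition shift in one sample changes the FPKM of every other gene in that sample.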

0

For example, see this paper: https://www.pnas.org/content/115/50/E11874.short The supplementary data explains how the FPKM values were quantile normalized.

ADD REPLY • link 5.4 years ago by Sara ▴ 60
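
For readers unfamiliar with quantile normalization, here is a rough base R sketch of the general idea on a made-up FPKM matrix. It is not taken from the paper linked above. The function name `quantile_normalize` is hypothetical, and ties are broken by order of appearance rather than averaged, so results can differ slightly from dedicated implementations such as `normalize.quantiles()` in the preprocessCore package.

```r
# Toy FPKM matrix: genes in rows, samples in columns (made-up values)
fpkm <- matrix(c(5, 2, 3, 4,
                 4, 1, 4, 2,
                 3, 4, 6, 8),
               nrow = 4,
               dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))

quantile_normalize <- function(x) {
  # Rank each value within its own sample (column)
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the sorted columns, rank by rank
  reference <- rowMeans(apply(x, 2, sort))
  # Replace every value by the reference value at its rank
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(x)
  out
}

fpkm_qn <- quantile_normalize(fpkm)
print(fpkm_qn)
```

After this step every sample has exactly the same set of values, only assigned to different genes. That is the assumption the replies in this thread are questioning.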

1

I'm not familiar with metabolic modeling, but the fact that something has been published before does not make it correct.

ADD REPLY • link 5.4 years ago by WouterDeCoster 47k

0

Indeed, you may want to check on CrossValidated StackExchange about the feasibility of performing quantile normalisation on FPKM data. It does not feel right to me. A better transformation would be to Z-scores, via the zFPKM package in R.

ADD REPLY • link 5.4 years ago by Kevin Blighe 87k
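
As a rough illustration of what a Z-score transformation does, the base R sketch below standardises made-up log2 FPKM values within a single sample. This is a plain Z-score and only an assumption about the general idea: the zFPKM package uses its own, more specific fitting procedure, which this sketch does not reproduce.

```r
# Made-up FPKM values for one sample
fpkm <- c(geneA = 0.5, geneB = 4, geneC = 16, geneD = 64, geneE = 256)

# Work on the log2 scale; drop zeros so that log2() stays finite
log_fpkm <- log2(fpkm[fpkm > 0])

# Centre on the mean and divide by the standard deviation
z <- (log_fpkm - mean(log_fpkm)) / sd(log_fpkm)

print(round(z, 3))
```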

1

Maybe check quantro, a framework to test whether your dataset fulfills the assumptions of quantile normalization.

ADD REPLY • link 5.4 years ago by ATpoint 82k

score 2 · Answer 1 · 2018-12-12

2

5.4 years ago

WouterDeCoster 47k

FPKM is not a perfect normalization method. I'd suggest you extract normalized counts from DESeq2.

ADD COMMENT • link 5.4 years ago by WouterDeCoster 47k
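
For context, DESeq2 normalized counts are raw counts divided by per-sample size factors estimated with the median-of-ratios method. The base R sketch below shows that idea on made-up counts. The helper name `size_factors` is hypothetical, and in practice you would call `estimateSizeFactors(dds)` followed by `counts(dds, normalized = TRUE)` rather than this simplified version.

```r
# Made-up raw counts: s2 was sequenced twice as deeply as s1
counts <- matrix(c(10, 100, 1000, 0,
                   20, 200, 2000, 5),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2")))

size_factors <- function(x) {
  # Log geometric mean of each gene across samples
  log_geo_mean <- rowMeans(log(x))
  # Genes with a zero count in any sample give -Inf here and are skipped
  keep <- is.finite(log_geo_mean)
  # Per sample: median ratio of its counts to the gene-wise geometric mean
  apply(x[keep, , drop = FALSE], 2, function(sample_counts) {
    exp(median(log(sample_counts) - log_geo_mean[keep]))
  })
}

sf <- size_factors(counts)
norm_counts <- sweep(counts, 2, sf, "/")

print(sf)
print(norm_counts)
```

Because the size factor is a median over genes, a few very highly expressed genes have little influence on it, unlike scaling by the library total.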

0

DESeq2 normalization is a very good method and has been used in several studies, but it is not useful for metabolic modeling and for integrating gene expression into a metabolic network, because it does not normalize for gene length.

ADD REPLY • link 5.4 years ago by Sara ▴ 60

3

FPKM data is not suitable for cross-sample comparisons. DESeq2 can indeed perform adjustment for gene length when you import via tximport.

ADD REPLY • link 5.4 years ago by Kevin Blighe 87k

0

Many thanks Kevin. I had a look at tximport, but I am confused. How can I do that? I have gene-level counts, not transcript-level counts.

ADD REPLY • link 5.4 years ago by Sara ▴ 60

0

I finally found the way to do this. Here is the R code for whoever needs it.

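library(DESeq2)
library(GenomicFeatures)  # in recent Bioconductor releases makeTxDbFromGFF() may live in txdbmaker instead
# Gene-level raw counts (genes in rows, samples in columns) and the sample annotation
cts <- as.matrix(read.csv("Count_data.txt", row.names = 1, header = TRUE, sep = "\t"))
coldata <- read.csv("coldata.txt", row.names = 1, sep = "\t")
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ condition)
# Exons grouped by gene provide the gene lengths that fpkm() needs
txdb <- makeTxDbFromGFF("gencode.v19.annotation.gtf_withproteinids", format = "gtf", circ_seqs = character())
ebg <- exonsBy(txdb, by = "gene")
# Keep only genes present in both the counts and the annotation, in matching order
common <- intersect(rownames(dds), names(ebg))
dds <- dds[common, ]
rowRanges(dds) <- ebg[common]
FPKM <- fpkm(dds)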

ADD REPLY • link 5.4 years ago by Sara ▴ 60

1

You probably need some additional transformation to make it suitable for modeling - the distribution of FPKM (even after inter-library normalization) is not standard!

ADD REPLY • link 5.4 years ago by Kristoffer Vitting-Seerup ★ 4.0k

0

Do you mean for example log transformation?

ADD REPLY • link 5.4 years ago by Sara ▴ 60

1

Yep - and some methods also require scaling. Also remember that some ML methods have serious problems with highly correlated features.

ADD REPLY • link 5.4 years ago by Kristoffer Vitting-Seerup ★ 4.0k
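
The log transformation and scaling mentioned above could look like the following base R sketch. The FPKM matrix is made up, and the pseudocount of 1 and the per-gene scaling are assumptions for illustration; the right choices depend on the downstream method.

```r
# Made-up FPKM matrix: genes in rows, samples in columns
fpkm <- matrix(c(0, 3, 7, 15,
                 1, 1, 3, 7,
                 31, 63, 127, 255),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", 1:4)))

# A pseudocount of 1 keeps zero FPKM values finite on the log scale
log_fpkm <- log2(fpkm + 1)

# scale() standardises columns, so transpose to scale each gene across samples
scaled <- t(scale(t(log_fpkm)))

# Gene-gene correlation; values near 1 or -1 flag strongly correlated features
gene_cor <- cor(t(log_fpkm))

print(round(scaled, 3))
print(round(gene_cor, 3))
```

Note that `scale()` returns NaN for a gene whose values are constant across samples, so such genes would need to be filtered out first.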

0

Sorry, I was flying across the ocean. Glad that you got the solution. Where are the gene lengths being stored in your code, though? - presumably from gencode.v19.annotation.gtf_withproteinids

ADD REPLY • link 5.4 years ago by Kevin Blighe 87k
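
On the gene length question: in the code above, the lengths are not stored explicitly. As far as I understand the DESeq2 documentation, `fpkm()` derives them from `rowRanges(dds)`, and for a GRangesList it uses the number of base pairs in the union of each gene's exons. The base R sketch below shows that union calculation on made-up exon coordinates; the helper name `union_length` is hypothetical.

```r
# Made-up exons of one gene from two overlapping transcripts (1-based, inclusive)
exons <- data.frame(start = c(100, 150, 400),
                    end   = c(199, 249, 499))

union_length <- function(start, end) {
  ord <- order(start)
  start <- start[ord]
  end <- end[ord]
  total <- 0
  cur_start <- start[1]
  cur_end <- end[1]
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur_end + 1) {
      # Overlapping or adjacent exon: extend the current merged block
      cur_end <- max(cur_end, end[i])
    } else {
      # Gap: close the current block and start a new one
      total <- total + (cur_end - cur_start + 1)
      cur_start <- start[i]
      cur_end <- end[i]
    }
  }
  total + (cur_end - cur_start + 1)
}

gene_length <- union_length(exons$start, exons$end)
print(gene_length)
```

For the toy exons this gives 250 bp: the first two exons merge into one 150 bp block, and the third adds 100 bp.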

0

You asked the question here: https://support.bioconductor.org/p/116021/#116067

When you 'cross-post' to another website, please inform us so that our efforts are not in vain.

ADD REPLY • link 5.4 years ago by Kevin Blighe 87k

0

Yes. When I did not receive a reply from you, I asked my question on Bioconductor support as well, because I was in a hurry to finish my project. OK, I will link the two questions to each other. Thanks.

ADD REPLY • link 5.4 years ago by Sara ▴ 60

2

You should not expect live-support from a community driven by volunteers.

ADD REPLY • link 5.4 years ago by ATpoint 82k

0

Many thanks Kevin. I downloaded the GENCODE version 19 GTF file and placed it in my documents folder. I am so thankful for your valuable advice to use DESeq2 normalization with gene length adjustment.

ADD REPLY • link 5.4 years ago by Sara ▴ 60