Question: FPKM quantile normalization
14 months ago by
Maryam40
Maryam40 wrote:

Hi all, I have 250 samples from healthy and disease states. I want to integrate gene expression data into a metabolic model and do flux balance analysis. Can I use FPKM directly for this work, or should I normalize the FPKM values first? For example, in some publications I see that some researchers used quantile normalization of FPKM.

Any help is welcome

This question has also been asked on Bioconductor support. Here is the link

modified 14 months ago • written 14 months ago by Maryam (40)

"I see that some researchers used quantile normalization of FPKM"

Who? FPKM values are already normalised within each sample (for sequencing depth and gene length), but not for differences between samples.

written 14 months ago by Kevin Blighe (54k)
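
For readers unfamiliar with the method under discussion: quantile normalization forces every sample to share the same distribution by replacing each value with the mean of the values that hold the same rank across samples. Below is a minimal base-R sketch, assuming a genes x samples matrix. The function name `quantile_normalize` and the toy matrix are hypothetical, and ties are simply broken by order of appearance; dedicated implementations such as `normalize.quantiles` in preprocessCore may treat ties differently.

```r
# Quantile normalization sketch (base R only).
# `mat` is a numeric matrix: rows = genes, columns = samples.
quantile_normalize <- function(mat) {
  # Rank of each value within its sample (ties broken by order of appearance)
  ranks <- apply(mat, 2, rank, ties.method = "first")
  # Sort each sample, then average across samples at each rank position
  sorted <- apply(mat, 2, sort)
  reference <- rowMeans(sorted)
  # Replace each value by the reference value at its rank
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(mat)
  normalized
}

# Toy FPKM-like matrix (made-up numbers, filled column by column)
fpkm_toy <- matrix(c(5, 2, 3, 4,
                     4, 1, 4, 2,
                     3, 4, 6, 8),
                   nrow = 4,
                   dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3)))
qn <- quantile_normalize(fpkm_toy)
```

After this step every column of `qn` contains exactly the same set of values, only in a different gene order. Whether that is a reasonable assumption for a given dataset is the question raised in this thread.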

For example, see this paper: https://www.pnas.org/content/115/50/E11874.short The supplementary data explain the FPKM quantile normalization.

written 14 months ago by Maryam (40)

I'm not familiar with metabolic modeling, but just because something has been published does not mean it is correct.

written 14 months ago by WouterDeCoster (43k)

Indeed, you may want to ask on CrossValidated (StackExchange) about the feasibility of performing quantile normalisation on FPKM data. It does not feel right to me. A better transformation would be to Z-scores, via the zFPKM package in R.

written 14 months ago by Kevin Blighe (54k)
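
To make the Z-score suggestion more concrete, here is a rough base-R sketch of the general idea: take log2(FPKM) for one sample, locate the main peak of its density, estimate a standard deviation from the values to the right of that peak, and express every gene as a Z-score relative to that fitted Gaussian. This is a simplified illustration written for this thread and should not be read as the zFPKM package's own code; the function name `zfpkm_sketch` and the simulated data are hypothetical. For real analyses, use the package itself.

```r
# Simplified Z-score transform of FPKM for ONE sample (illustration only).
zfpkm_sketch <- function(fpkm) {
  log_fpkm <- log2(fpkm[is.finite(fpkm) & fpkm > 0])  # drop zeros before log2
  dens <- density(log_fpkm)                            # kernel density estimate
  mu <- dens$x[which.max(dens$y)]                      # location of the main peak
  upper_mean <- mean(log_fpkm[log_fpkm > mu])          # mean of values right of the peak
  sigma <- (upper_mean - mu) * sqrt(pi / 2)            # half-normal mean converted to sd
  (log2(fpkm) - mu) / sigma                            # a zero FPKM gives -Inf
}

set.seed(1)
fpkm_sim <- 2^rnorm(2000, mean = 3, sd = 2)  # simulated, roughly log-normal FPKM
z <- zfpkm_sketch(fpkm_sim)
```

The result is an affine rescaling of log2(FPKM) within a sample, so gene ranks within that sample are unchanged.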

Maybe check quantro, a framework to test whether your dataset fulfills the assumptions of quantile normalization.

written 14 months ago by ATpoint (29k)
14 months ago by
Belgium
WouterDeCoster43k wrote:

FPKM is not a perfect normalization method. I'd suggest you extract normalized counts from DESeq2.

written 14 months ago by WouterDeCoster (43k)
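
The normalized counts mentioned here rest on DESeq2's size factors, which in practice are obtained with `estimateSizeFactors(dds)` followed by `counts(dds, normalized = TRUE)`. The base-R sketch below illustrates the median-of-ratios idea behind them on a made-up matrix. `size_factors_sketch` and `counts_toy` are hypothetical names, and the sketch leaves out the options the package offers.

```r
# Median-of-ratios size factors, sketched in base R.
# `counts` is a matrix of raw counts: rows = genes, columns = samples.
size_factors_sketch <- function(counts) {
  log_counts <- log(counts)
  # Log geometric mean per gene; genes with a zero count give -Inf and are skipped
  log_geo_means <- rowMeans(log_counts)
  usable <- is.finite(log_geo_means)
  # Per sample: median ratio of its counts to the per-gene geometric mean
  apply(log_counts[usable, , drop = FALSE], 2, function(sample_log) {
    exp(median(sample_log - log_geo_means[usable]))
  })
}

# Toy data: sample2 is sample1 sequenced twice as deeply
counts_toy <- cbind(sample1 = c(10, 20, 30, 0, 50),
                    sample2 = c(20, 40, 60, 5, 100))
sf <- size_factors_sketch(counts_toy)
normalized_counts <- sweep(counts_toy, 2, sf, "/")
```

Note that these normalized counts correct for sequencing depth only; they carry no gene length adjustment.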

DESeq normalization works very well in many studies, but it is not useful for metabolic modeling or for integrating gene expression into a metabolic network, because it does not normalize for gene length.

written 14 months ago by Maryam (40)

FPKM data is not suitable for cross-sample comparisons. DESeq2 can indeed perform adjustment for gene length when you import via tximport.

modified 14 months ago • written 14 months ago by Kevin Blighe (54k)

Many thanks Kevin. I had a look at tximport, but I am confused. How can I do that? I have gene-level counts, not transcript-level counts.

written 14 months ago by Maryam (40)

I finally found the way to do this. Here is the R code for whoever needs it.

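library(DESeq2)
library(GenomicFeatures)  # makeTxDbFromGFF(), exonsBy()
# Gene-level raw counts (genes x samples) and the sample table
cts <- as.matrix(read.csv("Count_data.txt", row.names = 1, header = TRUE, sep = "\t"))
coldata <- read.csv("coldata.txt", row.names = 1, sep = "\t")
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ condition)
# Exons per gene from the GTF; fpkm() derives gene lengths from these ranges
txdb <- makeTxDbFromGFF("gencode.v19.annotation.gtf_withproteinids", format = "gtf", circ_seqs = character())
ebg <- exonsBy(txdb, by = "gene")
# Match genes by ID (assumes the count row names are the GTF gene IDs)
common <- intersect(rownames(dds), names(ebg))
dds <- dds[common, ]
rowRanges(dds) <- ebg[common]
FPKM <- fpkm(dds)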
modified 14 months ago • written 14 months ago by Maryam (40)

You probably need some additional transformation to make it suitable for modeling - the distribution of FPKM (even after inter-library normalization) is not standard!

written 14 months ago by kristoffer.vittingseerup (3.0k)

Do you mean for example log transformation?

written 14 months ago by Maryam (40)

Yep - and some methods also require scaling. Also remember some ML methods have large problems with highly correlated features

written 14 months ago by kristoffer.vittingseerup (3.0k)
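
The log transformation and scaling mentioned in this exchange could look like the following in base R. This is only a sketch on made-up values: the pseudocount of 1 and the per-gene scaling are common conventions assumed here, not something specified in this thread, and `fpkm_toy` is a hypothetical matrix.

```r
# Toy FPKM matrix: rows = genes, columns = samples (made-up values,
# filled column by column)
fpkm_toy <- matrix(c(0, 10, 1000,
                     5, 20, 800,
                     2, 15, 1200),
                   nrow = 3,
                   dimnames = list(c("geneA", "geneB", "geneC"),
                                   c("s1", "s2", "s3")))

# log2 with a pseudocount of 1 so that zero FPKM maps to 0 instead of -Inf
log_fpkm <- log2(fpkm_toy + 1)

# scale() works on columns, so transpose to centre and scale each gene
# to mean 0 and standard deviation 1 across samples
scaled <- t(scale(t(log_fpkm)))
```

Per-gene scaling removes differences in absolute expression level between genes, so whether it is appropriate depends on what the downstream model expects.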

Sorry, I was flying across the ocean. Glad that you got the solution. Where are the gene lengths being stored in your code, though? - presumably from gencode.v19.annotation.gtf_withproteinids

written 14 months ago by Kevin Blighe (54k)
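
On the gene-length question: as documented for DESeq2's `fpkm()`, when `rowRanges(dds)` is a GRangesList, the length of each gene is taken as the number of base pairs in the union of its ranges, which in the code above are the exons loaded from the GTF. The arithmetic itself is simple. The base-R sketch below uses made-up counts and lengths (`counts_toy`, `gene_length_bp`, and `fpkm_sketch` are hypothetical names) and scales by the plain column sums, whereas `fpkm()` by default uses a robust, size-factor-based library size.

```r
# FPKM = counts / (gene length in kb) / (library size in millions)
fpkm_sketch <- function(counts, gene_length_bp) {
  stopifnot(nrow(counts) == length(gene_length_bp))
  library_size_millions <- colSums(counts) / 1e6
  # Fragments per million mapped fragments, per sample
  fpm <- sweep(counts, 2, library_size_millions, "/")
  # Divide each gene (row) by its length in kilobases
  fpm / (gene_length_bp / 1e3)
}

counts_toy <- cbind(s1 = c(100, 200, 700), s2 = c(300, 300, 400))
rownames(counts_toy) <- c("geneA", "geneB", "geneC")
gene_length_bp <- c(geneA = 1000, geneB = 2000, geneC = 500)
fpkm_toy <- fpkm_sketch(counts_toy, gene_length_bp)
```

In sample `s1`, geneB has twice the counts of geneA but is also twice as long, so both end up with the same FPKM.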

You asked the question here: https://support.bioconductor.org/p/116021/#116067

When you 'cross-post' to another website, please inform us so that our efforts are not in vain.

written 14 months ago by Kevin Blighe (54k)

Yes. When I did not receive a reply from you, and because I was in a hurry with my project, I asked my question on Bioconductor support. OK, I will link both questions to each other. Thanks.

written 14 months ago by Maryam (40)

You should not expect live-support from a community driven by volunteers.

written 14 months ago by ATpoint (29k)

Many thanks Kevin. I downloaded the GENCODE version 19 GTF file and put it in my documents. I am so thankful for your valuable advice to use DESeq2 normalization with gene length adjustment.

modified 14 months ago • written 14 months ago by Maryam (40)