Question: FPKM quantile normalization
Sara60 wrote:

Hi all, I have 250 samples from healthy and disease states. I want to integrate gene expression data into a metabolic model and do flux balance analysis. Can I use FPKM directly for this work, or should I normalize the FPKM values first? For example, in some publications I see that some researchers used quantile normalization of FPKM.

Any help is welcome

This question has also been asked on the Bioconductor support site. Here is the link

ADD COMMENTlink modified 20 months ago • written 20 months ago by Sara60

I see that some researchers used quantile normalization of FPKM

Who? Data on the FPKM scale are already normalised for sequencing depth and gene length, but not for cross-sample differences.

ADD REPLYlink written 20 months ago by Kevin Blighe63k
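
To make the point above concrete, here is a minimal base R sketch of the usual FPKM formula on made-up numbers. The counts, gene names, and gene lengths are hypothetical. In the toy data only geneC differs between the two samples, yet the FPKM values of geneA and geneB drop in the second sample because its library total is larger, which is one reason FPKM values are hard to compare across samples.

```r
# Toy raw counts: 3 genes (rows) x 2 samples (columns); all values are made up
counts <- matrix(c(100, 200, 700,
                   100, 200, 2700),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Hypothetical gene lengths in base pairs, in the same order as the rows
gene_length <- c(geneA = 1000, geneB = 2000, geneC = 500)

# FPKM = count * 1e9 / (gene length in bp * total fragments in the sample)
lib_size <- colSums(counts)
fpkm <- sweep(counts, 2, lib_size, "/") / gene_length * 1e9

print(fpkm)
```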

For example, see this paper: https://www.pnas.org/content/115/50/E11874.short. The supplementary data explain the FPKM quantile normalization.

ADD REPLYlink written 20 months ago by Sara60

I'm not familiar with metabolic modeling, but the fact that something has been published before does not make it correct.

ADD REPLYlink written 20 months ago by WouterDeCoster44k

Indeed, you may want to ask on CrossValidated StackExchange about the feasibility of performing quantile normalisation on FPKM data. It does not feel right to me. A better transformation would be to Z-scores, via the zFPKM package in R.

ADD REPLYlink written 20 months ago by Kevin Blighe63k

Maybe check quantro, a framework to test whether your dataset fulfills the assumptions of quantile normalization.

ADD REPLYlink written 20 months ago by ATpoint36k
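
For readers unsure what quantile normalization actually does to the data, here is a minimal base R sketch on a made-up matrix. The function name `quantile_normalize` is hypothetical and the tie handling (`ties.method = "first"`) is a simplification, so packaged implementations may give slightly different values when ties occur. It forces every sample to share the same distribution, which is exactly the assumption quantro is meant to test.

```r
# Minimal quantile normalization sketch (illustrative, not a packaged implementation)
quantile_normalize <- function(x) {
  # Rank each value within its sample (column); ties are broken by position
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the sorted values across samples
  reference <- rowMeans(apply(x, 2, sort))
  # Replace every value by the reference value at its rank
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

# Made-up expression matrix: 4 genes x 3 samples
x <- cbind(s1 = c(5, 2, 3, 4),
           s2 = c(40, 10, 30, 20),
           s3 = c(3, 4, 6, 8))
rownames(x) <- paste0("gene", 1:4)

qn <- quantile_normalize(x)
print(qn)
```

After normalization every column contains the same set of values, and only the ordering of genes within each sample is retained.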
WouterDeCoster44k wrote:

FPKM is not a perfect normalization method. I'd suggest you extract normalized counts from DESeq2.

ADD COMMENTlink written 20 months ago by WouterDeCoster44k
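
As background to the suggestion above, here is a simplified base R sketch of the median-of-ratios idea behind DESeq2's size factors. It is not DESeq2's own code; the function name `size_factors` and the toy counts are hypothetical. In practice you would let DESeq2 estimate the size factors and extract normalized counts from it.

```r
# Simplified median-of-ratios size factors (illustrative sketch only)
size_factors <- function(counts) {
  # Per-gene geometric mean across samples, computed on the log scale
  log_geo_mean <- rowMeans(log(counts))
  # Keep genes with a finite geometric mean (no zero count in any sample)
  keep <- is.finite(log_geo_mean)
  # Size factor = median ratio of a sample's counts to the geometric mean
  apply(counts[keep, , drop = FALSE], 2, function(col) {
    exp(median(log(col) - log_geo_mean[keep]))
  })
}

# Made-up counts: sample s2 is sequenced exactly twice as deeply as s1
counts <- cbind(s1 = c(10, 20, 30, 40),
                s2 = c(20, 40, 60, 80))
rownames(counts) <- paste0("gene", 1:4)

sf <- size_factors(counts)
normalized <- sweep(counts, 2, sf, "/")
print(sf)
print(normalized)
```

Dividing each sample by its size factor puts the two toy samples on a common scale. Note that this step corrects for library size and composition only, not for gene length.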

DESeq normalization works very well in many studies, but it is not useful for metabolic modeling and for integrating gene expression into a metabolic network, because it does not normalize for gene length.

ADD REPLYlink written 20 months ago by Sara60

FPKM data is not suitable for cross-sample comparisons. DESeq2 can indeed perform adjustment for gene length when you import via tximport.

ADD REPLYlink modified 20 months ago • written 20 months ago by Kevin Blighe63k

Many thanks Kevin. I had a look at tximport, but I am confused. How can I do that? I have gene-level counts, not transcript-level counts.

ADD REPLYlink written 20 months ago by Sara60

I finally found the way to do this. Here is the R code for whoever needs it.

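library(DESeq2)
library(GenomicFeatures)
# Gene-level raw counts (genes x samples) and the sample annotation table
cts <- as.matrix(read.csv("Count_data.txt", row.names = 1, header = TRUE, sep = "\t"))
coldata <- read.csv("coldata.txt", row.names = 1, sep = "\t")
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ condition)
# Exons grouped by gene; fpkm() derives each gene's length from these ranges
txdb <- makeTxDbFromGFF("gencode.v19.annotation.gtf_withproteinids", format = "gtf", circ_seqs = character())
ebg <- exonsBy(txdb, by = "gene")
# Assumes the row names of the count matrix are the gene IDs used in the GTF
rowRanges(dds) <- ebg[rownames(dds)]
FPKM <- fpkm(dds)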
ADD REPLYlink modified 20 months ago • written 20 months ago by Sara60

You probably need some additional transformation to make it suitable for modeling - the distribution of FPKM (even after inter-library normalization) is not standard!

ADD REPLYlink written 20 months ago by kristoffer.vittingseerup3.4k

Do you mean for example log transformation?

ADD REPLYlink written 20 months ago by Sara60

Yep, and some methods also require scaling. Also remember that some ML methods have serious problems with highly correlated features.

ADD REPLYlink written 20 months ago by kristoffer.vittingseerup3.4k
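
The log transformation and scaling mentioned above can be sketched in base R as follows. The FPKM values are made up, and the pseudocount of 1 is an assumption (a common convention, not something prescribed in this thread). Whether per-gene z-scores are appropriate depends on the downstream method.

```r
# Made-up FPKM matrix: 3 genes (rows) x 3 samples (columns)
fpkm <- matrix(c(0, 10, 1000,
                 5, 20, 800,
                 1, 15, 1200),
               nrow = 3,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               c("s1", "s2", "s3")))

# Log transform; the pseudocount of 1 (an assumption) avoids log2(0)
log_fpkm <- log2(fpkm + 1)

# Z-score each gene across samples: scale() works on columns, so transpose
z <- t(scale(t(log_fpkm)))

print(round(z, 3))
```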

Sorry, I was flying across the ocean. Glad that you got the solution. Where are the gene lengths being stored in your code, though? - presumably from gencode.v19.annotation.gtf_withproteinids

ADD REPLYlink written 20 months ago by Kevin Blighe63k

You asked the question here: https://support.bioconductor.org/p/116021/#116067

When you 'cross-post' to another website, please inform us so that our efforts are not in vain.

ADD REPLYlink written 20 months ago by Kevin Blighe63k

Yes. When I did not receive a reply from you, I asked my question on the Bioconductor support site because I was in a hurry to finish my project. OK, I will link the two questions to each other. Thanks.

ADD REPLYlink written 20 months ago by Sara60

You should not expect live-support from a community driven by volunteers.

ADD REPLYlink written 20 months ago by ATpoint36k

Many thanks Kevin. I downloaded the GENCODE version 19 GTF file and put it in my documents folder. I am so thankful for your valuable advice to use DESeq2 normalization with gene-length adjustment.

ADD REPLYlink modified 20 months ago • written 20 months ago by Sara60