Question: FPKM quantile normalization
Maryam wrote:

Hi all, I have 250 samples from healthy and disease states. I want to integrate gene expression data into a metabolic model and perform flux balance analysis. Can I use FPKM values directly for this, or should I normalize them further? For example, in some publications I have seen researchers apply quantile normalization to FPKM.

Any help is welcome.

I have also asked this question on Bioconductor Support. Here is the link:

Question written 4 months ago by Maryam (modified 4 months ago)

Quoting Maryam: "I see that some researchers used quantile normalization of FPKM"

Who? Data on the FPKM scale are already normalised (for sequencing depth and gene length), but not for cross-sample differences.

Reply written 4 months ago by Kevin Blighe
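
For reference, FPKM divides each raw count by the gene length in kilobases and by the sample's total mapped fragments in millions, i.e. FPKM = count * 1e9 / (gene length in bp * total mapped fragments). That corrects for sequencing depth and gene length, but the total in the denominator is still computed per sample, so composition differences between samples are not addressed. A minimal base-R sketch on made-up numbers (the `counts` and `gene_length` objects and the function name are hypothetical):

```r
# Hypothetical toy data: raw counts (genes x samples) and gene lengths in bp
counts <- matrix(c(100, 200,
                    50,  50,
                   850, 750),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))
gene_length <- c(geneA = 1000, geneB = 500, geneC = 2000)

# FPKM = count * 1e9 / (gene length in bp * total mapped fragments in the sample)
fpkm_from_counts <- function(counts, gene_length) {
  lib_size <- colSums(counts)                             # total fragments per sample
  per_million <- sweep(counts, 2, lib_size, "/") * 1e6    # depth-normalized (per million)
  sweep(per_million, 1, gene_length, "/") * 1e3           # length-normalized (per kb)
}

fpkm <- fpkm_from_counts(counts, gene_length)
print(round(fpkm, 1))
```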

For example, see this paper: https://www.pnas.org/content/115/50/E11874.short (the quantile normalization of FPKM is explained in its supplementary data).

Reply written 4 months ago by Maryam
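
For readers unfamiliar with the method: quantile normalization replaces each value by the mean, across samples, of the values at the same rank, so every sample ends up with an identical distribution. Below is a base-R sketch of the textbook procedure; it is not necessarily the exact procedure in the paper's supplement, the FPKM matrix is made up, and ties are handled naively. In practice, `limma::normalizeQuantiles()` or `preprocessCore::normalize.quantiles()` do this for you.

```r
# Quantile normalization: every sample (column) gets the same distribution,
# namely the mean of the sorted columns. Ties are broken by position ("first").
quantile_normalize <- function(x) {
  sorted <- apply(x, 2, sort)                      # sort each column independently
  reference <- rowMeans(sorted)                    # mean value at each rank
  ranks <- apply(x, 2, rank, ties.method = "first")
  out <- apply(ranks, 2, function(r) reference[r]) # map each rank to the reference value
  dimnames(out) <- dimnames(x)
  out
}

# Hypothetical FPKM matrix (4 genes x 3 samples)
fpkm <- matrix(c(5, 2, 3,
                 4, 1, 4,
                 3, 4, 6,
                 4, 2, 8),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))
qn <- quantile_normalize(fpkm)
print(qn)
print(apply(qn, 2, sort))  # every column now holds the same set of values
```

Because it forces identical distributions, the method assumes that global distribution differences between samples are technical rather than biological, which is exactly the assumption in question when healthy and disease samples are normalized together.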

I'm not familiar with metabolic modeling, but the fact that something has been published does not make it correct.

Reply written 4 months ago by WouterDeCoster

Indeed, you may want to ask on Cross Validated (Stack Exchange) whether quantile normalisation of FPKM data is appropriate. It does not feel right to me. A better transformation would be to Z-scores, via the zFPKM package in R.

Reply written 4 months ago by Kevin Blighe
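
The sketch below is not the zFPKM package's code; it only illustrates the idea behind the zFPKM transform as I understand it: take log2 FPKM for one sample, treat the mode of its kernel density as the centre of the expressed-gene peak, estimate the standard deviation from the values above the mode assuming a half-normal shape, and convert to z-scores. Zeros are left as NA here, which is a simplification, and the input vector is simulated. The function name is hypothetical; use the package itself for real data.

```r
# zFPKM-style transform for one sample (illustrative sketch, not the package's code)
zfpkm_sketch <- function(fpkm) {
  z <- rep(NA_real_, length(fpkm))        # zero FPKM has no finite log2; left as NA
  pos <- fpkm > 0
  log_fpkm <- log2(fpkm[pos])
  d <- density(log_fpkm)                  # Gaussian kernel density estimate
  mu <- d$x[which.max(d$y)]               # mode of the density = centre of the peak
  upper_mean <- mean(log_fpkm[log_fpkm > mu])
  # For a normal variable, E[X | X > mu] - mu = sigma * sqrt(2 / pi)
  sigma <- (upper_mean - mu) * sqrt(pi / 2)
  z[pos] <- (log_fpkm - mu) / sigma
  z
}

set.seed(1)
# Simulated sample: log2 FPKM roughly normal around 3, plus 20 zeros
fpkm <- c(2^rnorm(500, mean = 3, sd = 1.5), rep(0, 20))
z <- zfpkm_sketch(fpkm)
summary(z)
```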

Maybe check quantro, a framework for testing whether your dataset fulfils the assumptions of quantile normalization.

Reply written 4 months ago by ATpoint
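
quantro is a Bioconductor package, and the following is not its statistic. It is a simplified permutation check in the same spirit, assuming two groups: compare the average quantile curves of the two groups and ask how often randomly relabelled samples produce a difference at least as large. A small p-value suggests the groups differ globally in distribution, in which case quantile normalization across all samples could remove real biological signal. The data (log-scale values for 10 "healthy" and 10 "disease" samples, the latter globally shifted), the 5% to 95% quantile grid, and the function names are all hypothetical.

```r
# Distance between the two groups' average quantile curves
group_quantile_distance <- function(x, groups, probs = seq(0.05, 0.95, by = 0.05)) {
  q <- apply(x, 2, quantile, probs = probs)        # quantiles x samples
  lv <- unique(groups)                             # assumes exactly two groups
  m1 <- rowMeans(q[, groups == lv[1], drop = FALSE])
  m2 <- rowMeans(q[, groups == lv[2], drop = FALSE])
  sum((m1 - m2)^2)                                 # symmetric in the two groups
}

# Permutation p-value: shuffle group labels and recompute the distance
permutation_test <- function(x, groups, n_perm = 999) {
  observed <- group_quantile_distance(x, groups)
  null <- replicate(n_perm, group_quantile_distance(x, sample(groups)))
  list(statistic = observed, p_value = (1 + sum(null >= observed)) / (n_perm + 1))
}

set.seed(42)
healthy <- matrix(rnorm(500 * 10, mean = 2), ncol = 10)
disease <- matrix(rnorm(500 * 10, mean = 3), ncol = 10)   # globally shifted
x <- cbind(healthy, disease)
groups <- rep(c("healthy", "disease"), each = 10)
res <- permutation_test(x, groups)
print(res)
```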
WouterDeCoster (Belgium) wrote:

FPKM is not a perfect normalization method. I'd suggest extracting normalized counts from DESeq2.

Answer written 4 months ago by WouterDeCoster
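
DESeq2's default normalization uses median-of-ratios size factors. In DESeq2 itself you would run `dds <- estimateSizeFactors(dds)` followed by `counts(dds, normalized = TRUE)`; the base-R sketch below only illustrates the idea on hypothetical counts (the function name is made up). These size factors correct for library size and composition, but not for gene length.

```r
# Median-of-ratios size factors:
# 1) per gene, the geometric mean across samples serves as a pseudo-reference;
# 2) per sample, the size factor is the median ratio of its counts to that reference,
#    using only genes with non-zero counts in every sample.
size_factors_mor <- function(counts) {
  log_counts <- log(counts)
  log_geo_means <- rowMeans(log_counts)      # -Inf for genes with any zero count
  use <- is.finite(log_geo_means)
  apply(log_counts[use, , drop = FALSE], 2,
        function(col) exp(median(col - log_geo_means[use])))
}

# Hypothetical counts: s2 is sequenced twice as deeply as s1
counts <- matrix(c( 10,  20,
                   100, 200,
                    30,  60,
                     0,   5),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2")))
sf <- size_factors_mor(counts)
normalized <- sweep(counts, 2, sf, "/")      # divide each sample by its size factor
print(sf)
print(normalized)
```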

DESeq2 normalization has proven to be a very good method in several studies, but it is not useful for metabolic modeling and for integrating gene expression into a metabolic network, because it does not normalize for gene length.

Reply written 4 months ago by Maryam

FPKM data are not suitable for cross-sample comparisons. DESeq2 can indeed adjust for gene length when you import data via tximport.

Reply written 4 months ago by Kevin Blighe (modified 4 months ago)
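
As I understand the tximport documentation, when it summarizes transcript-level quantifications to genes it also returns a per-sample gene "length": the average of the gene's transcript lengths weighted by their abundance. `DESeqDataSetFromTximport()` then uses that matrix as a length offset. This needs transcript-level input, which is why it does not apply directly to gene-level counts. A base-R sketch of the weighted average on hypothetical data for one sample:

```r
# Hypothetical transcript-level data for one sample
tx <- data.frame(
  gene      = c("geneA", "geneA", "geneB"),
  length    = c(1000, 3000, 2000),   # (effective) transcript lengths in bp
  abundance = c(30, 10, 5)           # e.g. TPM
)

# Abundance-weighted mean transcript length per gene; unlike a fixed gene length,
# it changes from sample to sample with isoform usage
avg_tx_length <- sapply(split(tx, tx$gene), function(g) {
  sum(g$length * g$abundance) / sum(g$abundance)
})
print(avg_tx_length)
```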

Many thanks, Kevin. I had a look at tximport, but I am confused: how can I do that? I have gene-level counts, not transcript-level counts.

Reply written 4 months ago by Maryam

I finally found a way to do this. Here is the R code for anyone who needs it:

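library(DESeq2)
library(GenomicFeatures)  # makeTxDbFromGFF(), exonsBy()

# Raw gene-level counts (genes x samples) and sample annotation
cts <- as.matrix(read.csv("Count_data.txt", row.names = 1, header = TRUE, sep = "\t"))
coldata <- read.csv("coldata.txt", row.names = 1, sep = "\t")
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ condition)

# Exons grouped by gene from the GENCODE v19 GTF; fpkm() derives gene lengths from these ranges
txdb <- makeTxDbFromGFF("gencode.v19.annotation.gtf_withproteinids", format = "gtf", circ_seqs = character())
ebg <- exonsBy(txdb, by = "gene")
# Match exon ranges to the count-matrix rows (assumes identical GENCODE gene IDs)
rowRanges(dds) <- ebg[rownames(dds)]
FPKM <- fpkm(dds)  # robust = TRUE by default: size factors instead of raw column sums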

Reply written 4 months ago by Maryam (modified 4 months ago)
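
The `fpkm()` call at the end of that code normalizes with size factors by default (`robust = TRUE`) rather than with raw column sums. The base-R sketch below shows why that matters, on hypothetical counts where only gene4 changes between samples. ASSUMPTION: it rescales the median-of-ratios size factors by the geometric mean of the raw column sums, which is my reading of how DESeq2 computes robust library sizes; treat it as an illustration, not DESeq2's code (the function name is made up).

```r
# Sketch of a "robust" FPKM: library sizes from median-of-ratios size factors
# (rescaled by the geometric mean of raw column sums) instead of raw column sums.
robust_fpkm <- function(counts, gene_length) {
  log_geo_means <- rowMeans(log(counts))
  use <- is.finite(log_geo_means)              # genes with no zero counts
  sf <- apply(log(counts)[use, , drop = FALSE], 2,
              function(col) exp(median(col - log_geo_means[use])))
  lib_size <- sf * exp(mean(log(colSums(counts))))
  fpm <- sweep(counts, 2, lib_size, "/") * 1e6
  sweep(fpm, 1, gene_length, "/") * 1e3
}

# Hypothetical counts: genes 1-3 unchanged, gene4 strongly up in s2
counts <- matrix(c(100,  100,
                   100,  100,
                   100,  100,
                   100, 1000),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2")))
gene_length <- rep(1000, 4)

# Ordinary FPKM with raw column sums, for comparison
plain <- sweep(sweep(counts, 2, colSums(counts), "/") * 1e6, 1, gene_length, "/") * 1e3
robust <- robust_fpkm(counts, gene_length)
print(round(plain))
print(round(robust))
```

With column sums, the three unchanged genes appear to drop in s2 only because gene4 takes up more of the library; the robust version keeps them equal.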

You probably need some additional transformation to make the values suitable for modeling: the distribution of FPKM values (even after inter-library normalization) is not a standard one!

Reply written 4 months ago by kristoffer.vittingseerup

Do you mean, for example, a log transformation?

Reply written 4 months ago by Maryam

Yep, and some methods also require scaling. Also remember that some ML methods have serious problems with highly correlated features.

Reply written 4 months ago by kristoffer.vittingseerup
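
A minimal sketch of the transformations mentioned above, assuming a genes x samples FPKM matrix: log2 with a pseudocount of 1, per-gene centring and scaling, and a check for highly correlated gene pairs. The pseudocount, the 0.95 correlation cutoff, and the simulated matrix (with gene6 deliberately made proportional to gene5) are illustrative choices, not recommendations from the thread.

```r
set.seed(7)
# Simulated, right-skewed FPKM matrix: 6 genes x 8 samples
fpkm <- matrix(rexp(48, rate = 0.05), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("s", 1:8)))
fpkm["gene6", ] <- fpkm["gene5", ] * 1.1   # deliberately correlated feature pair

# 1) log2 with a pseudocount of 1 (a common but arbitrary choice)
log_fpkm <- log2(fpkm + 1)
# 2) per-gene scaling to mean 0 and SD 1 across samples;
#    scale() works on columns, so transpose to samples x genes
scaled <- scale(t(log_fpkm))
# 3) flag gene pairs with |Pearson r| above an illustrative cutoff of 0.95
r <- cor(scaled)
high <- which(abs(r) > 0.95 & upper.tri(r), arr.ind = TRUE)
flagged <- data.frame(gene1 = rownames(r)[high[, 1]], gene2 = colnames(r)[high[, 2]])
print(flagged)
```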

Sorry, I was flying across the ocean. Glad that you found a solution. Where are the gene lengths stored in your code, though? Presumably they come from gencode.v19.annotation.gtf_withproteinids.

Reply written 4 months ago by Kevin Blighe
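
On the gene-length question: according to the DESeq2 documentation for `fpkm()`, when there is no average-transcript-length assay the length is taken from `rowRanges(dds)` as the number of base pairs in the union of the ranges assigned to each gene, here the exons from the GENCODE v19 GTF. With the `ebg` object from the code above, the Bioconductor equivalent would be something like `sum(width(reduce(ebg)))`. The base-R sketch below shows the same union-of-exons computation on hypothetical coordinates for one gene (the function name is made up).

```r
# Gene length as the total width of the union of its exons: overlapping exons
# from different transcripts are counted once. Coordinates are 1-based, closed.
union_length <- function(start, end) {
  o <- order(start)
  start <- start[o]; end <- end[o]
  total <- 0
  cur_s <- start[1]; cur_e <- end[1]
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur_e + 1) {            # overlapping or adjacent: extend the block
      cur_e <- max(cur_e, end[i])
    } else {                                # gap: close the current block
      total <- total + (cur_e - cur_s + 1)
      cur_s <- start[i]; cur_e <- end[i]
    }
  }
  total + (cur_e - cur_s + 1)
}

# Exons from two hypothetical transcripts of one gene
exons <- data.frame(start = c(100, 150, 400, 1000),
                    end   = c(200, 300, 500, 1100))
print(union_length(exons$start, exons$end))
```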

You asked the question here: https://support.bioconductor.org/p/116021/#116067

When you cross-post to another website, please let us know, so that our efforts are not wasted.

Reply written 4 months ago by Kevin Blighe

Yes. When I did not receive any reply from you, I asked my question on Bioconductor Support, because I was in a hurry with my project. OK, I will link the two questions to each other. Thanks.

Reply written 4 months ago by Maryam

You should not expect live support from a community driven by volunteers.

Reply written 4 months ago by ATpoint

Many thanks, Kevin. I downloaded the GENCODE version 19 GTF file and put it in my Documents folder. I am very thankful for your valuable advice to use DESeq2 normalization with gene-length adjustment.

Reply written 4 months ago by Maryam (modified 4 months ago)