Question: RNA-seq FPKM Quantile Normalization
J.F.Jiang (China) wrote, 6.1 years ago:

Hi all,

Here is the situation: I have gene expression values from RNA-seq, reported as FPKM.

However, for some genes, more than 50% of the samples have no value, that is, FPKM = 0.

In this situation, how can we do quantile normalization?

Thanks

rnaseq normalization fpkm • 7.7k views

Why do you want to do quantile normalization? Also: how many genes does this happen to?

Reply written 6.1 years ago by Steve Lianoglou (modified 6.1 years ago)

I just want to do an eQTL calculation, which needs quantile normalization to make the expression distribution approximately normal.

However, some genes have missing values in many of the samples, so quantile normalization cannot be applied to them.

My question is how to deal with this situation: should I just remove those genes, or is there another method?

Reply written 6.1 years ago by J.F.Jiang
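To make the "just remove those genes" option concrete, here is a minimal sketch and not necessarily what the poster ended up using. It assumes a 50% zero-fraction cutoff (taken from the question) and a rank-based inverse normal transform per gene, which is one common way to force expression values toward a normal shape before eQTL testing. The function names and the toy `fpkm` dictionary are hypothetical.

```python
from statistics import NormalDist

def zero_fraction(values):
    """Fraction of samples in which a gene has FPKM == 0."""
    return sum(1 for v in values if v == 0) / len(values)

def filter_genes(fpkm, max_zero_fraction=0.5):
    """Keep genes whose zero fraction does not exceed the cutoff (assumed 0.5)."""
    return {g: v for g, v in fpkm.items() if zero_fraction(v) <= max_zero_fraction}

def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def inverse_normal_transform(values):
    """Map each rank r to the standard normal quantile at (r - 0.5) / n."""
    n = len(values)
    nd = NormalDist()
    return [nd.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]

# Hypothetical toy data: gene -> FPKM across six samples.
fpkm = {
    "geneA": [0.0, 0.0, 0.0, 1.2, 0.0, 3.4],  # 4 of 6 samples are zero: dropped
    "geneB": [5.1, 0.0, 2.3, 8.7, 4.4, 6.0],  # 1 of 6 samples is zero: kept
}
kept = filter_genes(fpkm)
normalized = {g: inverse_normal_transform(v) for g, v in kept.items()}
```

Note that many tied zeros all receive the same transformed value, which is part of why genes that are mostly zero are hard to normalize this way.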

You could consider using a GLM approach such as edgeR for doing your QTL analysis.

Reply written 6.1 years ago by Sean Davis

Thanks, I will consider it if the approach I am using fails. By the way, are you the one from NCI? Your name looks familiar.

Reply written 6.1 years ago by J.F.Jiang
Damian Kao (USA) wrote, 6.1 years ago:

What you are seeing is either 1) not enough sequencing depth to resolve the expression of lowly expressed genes, or 2) that many genes really aren't being expressed. How many reads do you have for the sample?

I would generate a rarefaction plot of increasing subsets of your reads vs. the number of genes with more than X reads. For example, take 1, 2, 3, 4, ... million reads and, for each subset, count how many genes have more than 10 reads mapping.

If you see a plateau, then you probably do have enough read coverage and what you are seeing is probably a biological effect. If no plateau, then you might not have enough read depth.
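The rarefaction idea can be sketched roughly as follows. This is a toy illustration under stated assumptions: the input is a hypothetical list with one gene label per mapped read (in practice you would subsample an alignment file), the depths are in thousands rather than millions, and `rarefaction_curve` is an illustrative name, not a function from any library.

```python
import random
from collections import Counter

def rarefaction_curve(read_genes, depths, min_reads=10, seed=0):
    """For each depth, subsample reads without replacement and count the
    genes that have more than `min_reads` reads in the subsample."""
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        subset = rng.sample(read_genes, depth)
        counts = Counter(subset)
        detected = sum(1 for c in counts.values() if c > min_reads)
        curve.append((depth, detected))
    return curve

# Hypothetical toy input: 20,000 reads drawn from 200 genes with skewed expression.
toy_rng = random.Random(1)
genes = [f"gene{i}" for i in range(200)]
weights = [1.0 / (i + 1) for i in range(200)]
read_genes = toy_rng.choices(genes, weights=weights, k=20000)

for depth, detected in rarefaction_curve(read_genes, [2000, 5000, 10000, 20000]):
    print(depth, detected)
```

Plotting `detected` against `depth` gives the curve described above; whether it flattens is what you would look for.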

Edit: I might have read your question incorrectly. Are you saying that 50% of the genes in your sample have an FPKM of 0, or that one specific gene has an FPKM of 0 in 50% of your samples?


Thanks for your reply. Only some specific transcripts (~2000) have a missing FPKM in more than 50% of the samples.

Reply written 6.1 years ago by J.F.Jiang
Charles Warden (Duarte, CA) wrote, 6.1 years ago:

I agree - FPKM (or RPKM) expression values are already normalized. Quantile normalization probably isn't necessary, and it is much more common for microarray analysis than RNA-Seq.

If you see a lot of 0 values, then they may already be rounded to a certain number of significant figures. This is actually somewhat good because genes with low coverage can show artificially high fold-change values (if you think about it, some reads are infinitely more than no reads). I would usually just add a value between 0.01 and 1, but rounding down to 0.0 or 0.00 is actually a similar idea.
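As a small illustration of the pseudocount idea, here is a sketch that assumes a pseudocount of 0.1 (an arbitrary pick from the 0.01 to 1 range mentioned above) and a log2 fold-change; the function name is hypothetical.

```python
import math

def log2_fold_change(fpkm_a, fpkm_b, pseudocount=0.1):
    """log2 ratio of two FPKM values after adding a small constant to both,
    so that a zero in either sample does not give an infinite fold-change."""
    return math.log2((fpkm_a + pseudocount) / (fpkm_b + pseudocount))

# A lowly expressed gene: 0.5 vs 0.0 FPKM gives a finite, bounded value.
low = log2_fold_change(0.5, 0.0)

# A well expressed gene with the same absolute difference barely changes.
high = log2_fold_change(50.5, 50.0)

print(low, high)
```

A larger pseudocount shrinks fold-changes for low-coverage genes more strongly, so the choice is a trade-off rather than a fixed rule.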


I figured out the normalization. Quantile normalization really is not necessary for RNA-seq gene expression in general; however, it is quite important when I want to do the eQTL calculation.

So, my focus is how to deal with those transcripts that have lots of missing FPKM values.

Thanks for your comment.

Reply written 6.1 years ago by J.F.Jiang