Question: calculating the fpkm from htseq counts
2.1 years ago by
shenwei13760 wrote:

Hi everyone,

I am trying to calculate FPKM values from htseq-count results. I think I already have the gene.size values for each transcript, but "dds" contains more rows than gene.size because the dds list includes NR##### entries (non-coding RNAs).

When I try to use

 mcols(dds)$basepairs <- gene.size
I get this error: Error in `[[<-`(`*tmp*`, name, value = list(gene = 1:33398, length = c(6363L, : 33398 elements in value to replace 33420 elements

I am wondering if anybody can help with this. I am also not sure whether dds and gene.size are ordered in the same way. Many thanks!

Wei S

rna-seq fpkm R htseq • 1.1k views
ADD COMMENTlink modified 2.1 years ago by Charles Warden7.8k • written 2.1 years ago by shenwei13760

Are you sure FPKM is what you want? It's not a good normalization method.

ADD REPLYlink written 2.1 years ago by WouterDeCoster44k

I am a beginner in RNA-seq. I already got the list of differentially expressed genes from DESeq2. Now I am trying to get expression values for all the genes, so I can do some other analyses on just the controls. FPKM is the only measure I know of for this. I am not sure whether there are other values I could use. Thanks a lot!

ADD REPLYlink written 2.1 years ago by shenwei13760

Just do counts(dds, normalized=TRUE) to access the normalised counts.

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by Kevin Blighe61k

As per Wouter, it is a bad idea to use FPKM for differential expression comparisons. If you are looking to do downstream analyses from the DESeq2 counts, then obtain the regularised log or variance-stabilised counts via rlog() and vst(), respectively.

ADD REPLYlink modified 17 months ago • written 2.1 years ago by Kevin Blighe61k

Thanks a lot! I think I already used counts(dds, normalized=TRUE) during the differential expression analysis, but does DESeq2 only normalize to the total number of reads in the library?

ADD REPLYlink written 2.1 years ago by shenwei13760

DESeq2 does indeed adjust for that ('library size') via the calculation of size factors. In making statistical inferences, it also models and adjusts for dispersion (see A: Clarification on how DSEeq2 Dispersion Curve is Generated ) and fold change differences on low count values.
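
To make the size factor idea concrete, here is a minimal base R sketch of a median-of-ratios estimate, which is the idea behind DESeq2's default size factors. The count matrix and the function name median_ratio_size_factors are made up for illustration; in practice you would let DESeq2 estimate the size factors rather than use this simplified version.

```r
# Toy count matrix: rows are genes, columns are samples (made-up numbers).
counts <- matrix(
  c(100, 200,
     50, 100,
     30,  60,
      0,  10),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("sampleA", "sampleB"))
)

# Hypothetical helper: median-of-ratios size factors (simplified sketch).
median_ratio_size_factors <- function(counts) {
  log_counts <- log(counts)
  # Per-gene log geometric mean across samples; -Inf if any count is zero.
  log_geo_means <- rowMeans(log_counts)
  usable <- is.finite(log_geo_means)
  # For each sample: median ratio of its counts to the per-gene geometric means.
  apply(log_counts, 2, function(col) {
    exp(median((col - log_geo_means)[usable]))
  })
}

size_factors <- median_ratio_size_factors(counts)

# Normalised counts: divide each sample (column) by its size factor.
normalized <- sweep(counts, 2, size_factors, "/")
```

Because the estimate uses a median over genes rather than the total read count, a handful of very highly expressed genes has less influence on it than on a plain library-size scaling.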

ADD REPLYlink written 2.1 years ago by Kevin Blighe61k
2.1 years ago by
Charles Warden7.8k
Duarte, CA
Charles Warden7.8k wrote:

While I often find it useful to use programs like edgeR / DESeq2 / limma-voom for p-value calculations, I would say it is also useful to have log2(FPKM + 0.1) values for visualization (QC plots, heatmaps, etc). While the ways to calculate gene length can vary, the log-transformed expression should show more of a normal distribution (at least per-gene) with varying methods of calculating gene length.

Also, you may sometimes find a gene is clearly differentially expressed (which you can see from the direct expression calculation) but not identified by at least one of the methods above. There may also be certain scenarios where calculating p-values with standard methods in R (such as aov() for ANOVA or lm() for linear regression) using log-transformed FPKM values (or some other normalized expression value) can be a useful option in addition to the count-based methods.
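
As a rough illustration of the lm() option, the sketch below fits a two-group linear model to log-transformed values for a single gene. The expression values, group labels, and column names are all made up; this only shows the mechanics, not a recommended analysis.

```r
# Made-up log2(FPKM + 0.1) values for one gene in six samples.
expr <- data.frame(
  log_fpkm = c(2.1, 2.3, 2.0, 4.0, 4.2, 3.9),
  group = factor(rep(c("control", "treated"), each = 3))
)

# Linear model of expression on group; "control" is the reference level.
fit <- lm(log_fpkm ~ group, data = expr)

# Row of the coefficient table for the treated-vs-control difference.
group_effect <- summary(fit)$coefficients["grouptreated", ]
log2_difference <- unname(group_effect["Estimate"])
p_value <- unname(group_effect["Pr(>|t|)"])
```

With only two groups this is equivalent to a two-sample t-test with equal variances; aov() on the same formula gives the same p-value.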

Additionally, the 'edgeR' package has functions to calculate rpkm() and cpm(), which you could then log-transform to create your own figures (for a QC plot, this can be done without running the edgeR or edgeR-robust differential expression step that calculates p-values).
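
For reference, the calculation itself can be sketched in base R without any package. The example also matches gene lengths to count rows by name, which is one way to deal with a length table that has fewer rows than the count table or a different order (as in the question). The gene IDs, counts, and the gene_size table are made up, and the library size here is simply the column sum of all counts, which is an assumption rather than the only reasonable choice.

```r
# Toy counts: rows are genes, columns are samples (made-up numbers).
counts <- matrix(
  c(500, 1000,
    250,  500,
    250,  500),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("NM_0001", "NR_0002", "NM_0003"), c("ctrl1", "ctrl2"))
)

# Hypothetical length table: lacks the NR_ entry and is in a different order.
gene_size <- data.frame(
  gene = c("NM_0003", "NM_0001"),
  length = c(2000L, 1000L)
)

# Align lengths to the count rows by name; genes without a length become NA.
lengths_bp <- gene_size$length[match(rownames(counts), gene_size$gene)]
names(lengths_bp) <- rownames(counts)

# FPKM = count / (gene length in kb) / (library size in millions).
library_millions <- colSums(counts) / 1e6
fpkm <- sweep(counts / (lengths_bp / 1e3), 2, library_millions, "/")

# Log-transform with a small offset for plotting.
log_fpkm <- log2(fpkm + 0.1)
```

In the DESeq2 setting, the same match() step against rownames(dds) gives a plain numeric vector with one entry per row, which is the shape mcols(dds)$basepairs expects (a whole data frame with fewer rows is what triggers the replacement error). Rows with NA lengths would need to be dropped or given a length before computing FPKM.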

ADD COMMENTlink written 2.1 years ago by Charles Warden7.8k