Question: Differences between counts and FPKMs
Question by Ron (United States), 3.6 years ago:

Hi all,

I have come across a gene that shows a 7-fold increase in Treatment (3 replicates) vs. Control (3 replicates) when looking at FPKMs. (I took the mean of Treatment / mean of Control to get a fold change.)

But when doing differential expression (using counts from htseq-count), I get a log2 fold change of -0.5, which means the Treatment has lower expression than the Controls.
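To put the two numbers on the same scale, here is a minimal sketch of the fold change described above (ratio of group means) next to the conversion of a log2 fold change back to a linear ratio. The FPKM values are hypothetical, chosen only to reproduce a 7-fold ratio; they are not the poster's data.

```python
import math
from statistics import mean

def ratio_of_means(treatment, control):
    """Fold change as described in the question: mean(treatment) / mean(control)."""
    return mean(treatment) / mean(control)

# Hypothetical FPKM values for one gene (3 replicates each); not real data.
treatment_fpkm = [30.0, 35.0, 40.0]
control_fpkm = [4.0, 5.0, 6.0]

fc = ratio_of_means(treatment_fpkm, control_fpkm)
print(f"FPKM ratio of means: {fc:.2f} (log2 = {math.log2(fc):.2f})")

# The DE result is reported on the log2 scale; convert it back to a linear ratio.
deseq_log2fc = -0.5
print(f"log2FC {deseq_log2fc} -> linear fold change {2 ** deseq_log2fc:.2f}")
```

A 7-fold increase is log2(7) ≈ 2.8, while a log2 fold change of -0.5 is a ratio of about 0.71, so the two estimates disagree in direction, not just in magnitude.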

There are a couple more such instances that show a discordance between the differential expression results and the FPKMs.

Any suggestions?

Thanks,

Ron

Answer by Rob (United States), 3.6 years ago:

Hi Ron,

These are completely different measures. As mforde84 points out, raw read counts don't account for (1) the fact that the number of reads a transcript / gene produces depends on the gene's length, and (2) the fact that, in different samples, a particular read count can mean different things (e.g., imagine a 20M-read experiment vs. a 40M-read experiment). I recently wrote a blog post covering these metrics, what they mean, and some of the differences between them. Harold Pimentel has a great blog post on this topic as well.
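The two effects above can be illustrated with a few hypothetical numbers (the gene names, lengths, and counts are made up for this sketch): dividing by length in kb removes the length effect, and dividing by library size in millions removes the depth effect.

```python
# Hypothetical numbers illustrating the two effects raw counts do not correct for.

# (1) Length: two genes at the same abundance, one twice as long,
# so the longer one yields about twice as many fragments.
gene_lengths_kb = {"geneA": 1.0, "geneB": 2.0}
counts = {"geneA": 500, "geneB": 1000}
per_kb = {g: counts[g] / gene_lengths_kb[g] for g in counts}
print(per_kb)  # {'geneA': 500.0, 'geneB': 500.0}

# (2) Depth: the same gene sequenced in a 20M-read and a 40M-read library.
library_sizes = {"sample_20M": 20_000_000, "sample_40M": 40_000_000}
gene_counts = {"sample_20M": 1000, "sample_40M": 2000}
per_million = {s: gene_counts[s] / (library_sizes[s] / 1e6) for s in gene_counts}
print(per_million)  # {'sample_20M': 50.0, 'sample_40M': 50.0}
```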

Answer by mforde84, 3.6 years ago:

Raw counts are normalized for neither sequencing depth nor gene feature length; FPKMs are.

http://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/
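A minimal sketch of the two formulas discussed in the linked post. One assumption to flag: FPKM is normally divided by the total number of mapped fragments, which may include fragments not assigned to any feature; here the sum of the feature counts stands in for that total. The counts and lengths are hypothetical.

```python
def fpkm(counts, lengths_bp):
    """FPKM = count / (feature length in kb * total mapped fragments in millions).
    Assumption: the sum of feature counts is used as the total mapped fragments."""
    total_millions = sum(counts) / 1e6
    return [c / ((l / 1e3) * total_millions) for c, l in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """TPM: divide by length first, then rescale so the values sum to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [r / scale * 1e6 for r in rates]

counts = [100, 200, 700]       # hypothetical fragment counts
lengths = [1000, 2000, 3500]   # hypothetical feature lengths (bp)
print([round(x) for x in fpkm(counts, lengths)])
print([round(x) for x in tpm(counts, lengths)], round(sum(tpm(counts, lengths))))
```

The order of operations is the key difference: FPKM normalizes for depth first and length second, so the FPKM values in a sample have no fixed total, whereas TPM values always sum to one million in every sample.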


I'll add that FPKMs are sometimes normalized for sequencing depth in incredibly problematic and non-robust ways. As a general rule, trust DESeq2/edgeR/limma over anything FPKMs say.

(reply by Devon Ryan, 3.6 years ago)
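One way total-count depth normalization (the kind FPKM uses) can go wrong: if a single highly expressed gene goes up, every other gene looks down. The sketch below compares total-count scaling with a median-of-ratios size factor in the style described for DESeq. It is an illustration under simplified assumptions (two samples, made-up counts), not DESeq2's actual implementation.

```python
import math
from statistics import median

def total_count_factors(samples):
    """Scale each sample by its total count, relative to the first sample."""
    totals = [sum(s) for s in samples]
    return [t / totals[0] for t in totals]

def median_of_ratios_factors(samples):
    """DESeq-style size factors: for each sample, the median over genes of
    count / geometric mean of that gene across samples.
    Genes with a zero count in any sample are skipped."""
    log_geo_means = []
    for g in range(len(samples[0])):
        values = [s[g] for s in samples]
        if min(values) > 0:
            log_geo_means.append((g, sum(math.log(v) for v in values) / len(values)))
    return [
        math.exp(median([math.log(s[g]) - lgm for g, lgm in log_geo_means]))
        for s in samples
    ]

# Hypothetical counts: only the last gene changes between the two samples.
sample_a = [100, 200, 300, 400, 1000]
sample_b = [100, 200, 300, 400, 9000]
samples = [sample_a, sample_b]

for name, factors in [("total count", total_count_factors(samples)),
                      ("median of ratios", median_of_ratios_factors(samples))]:
    gene1 = [s[0] / f for s, f in zip(samples, factors)]
    # total count: gene1 appears ~5-fold lower in sample B; median of ratios: unchanged
    print(f"{name}: factors={factors}, gene1 normalized={gene1}")
```

With total-count scaling, an unchanged gene appears about 5-fold down in sample B; the median-of-ratios factors ignore the single outlying gene, so unchanged genes stay unchanged.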

So, which result should I go with in this case: the fold change from FPKMs or the fold change from DESeq?

(reply by Ron, 3.6 years ago)

Don't use fold changes from FPKMs for any type of quantitative analysis. They are not properly normalized for comparisons between samples or replicates. Instead, look at fold changes computed after normalizing for these differences. In a "standard" analysis, DESeq / DESeq2 / edgeR / limma will perform such a normalization.

(reply by Rob, 3.6 years ago)

Agreed.

At the moment, best practices are:

- DESeq: library size correction and VST
- limma: voom CPM calculation with some sort of normalization method (e.g., quantile)
- edgeR: not entirely sure, I'm not too familiar with it.

I tend to use TPM a lot as well, as it doesn't have the same issues as FPKM and RPKM.

(reply by mforde84, 3.6 years ago)
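For reference, the log-CPM values that voom starts from are, as described by its authors, log2((count + 0.5) / (library size + 1) × 10^6). A minimal sketch of just that transform follows; the counts and library sizes are hypothetical, and the downstream steps (between-sample normalization such as quantile, and the mean-variance weights) are not shown.

```python
import math

def voom_log_cpm(counts, library_size):
    """log2 counts per million with the offsets described for limma's voom:
    log2((count + 0.5) / (library_size + 1) * 1e6)."""
    return [math.log2((c + 0.5) / (library_size + 1) * 1e6) for c in counts]

# Hypothetical counts for the same three genes in a 20M- and a 40M-read library.
shallow = voom_log_cpm([0, 1000, 50000], 20_000_000)
deep = voom_log_cpm([0, 2000, 100000], 40_000_000)
for s, d in zip(shallow, deep):
    print(f"{s:7.3f} {d:7.3f}")
```

For expressed genes the depth difference cancels out, but a zero count maps to different log-CPM values in the two libraries because of the +0.5 offset, which is one reason low-count genes are usually filtered first.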

Agreed with the agreement ;P. I'd only offer the caveat that, while TPM is universally preferable to F/RPKM, it is still a purely relative abundance metric. For this reason alone, it's not really appropriate to use as-is for quantitative comparisons between samples / conditions.

(reply by Rob, 3.6 years ago)
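To see why TPM is purely relative: because the values are forced to sum to one million, a gene whose absolute abundance does not change can still drop in TPM when another gene goes up. The sketch assumes hypothetical "true" molecule counts and equal sequencing depth, which real RNA-seq does not give you directly.

```python
def tpm(counts, lengths_bp):
    """TPM: length-normalize, then rescale so the values sum to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

lengths = [1000, 1000, 1000]        # hypothetical, equal lengths
control_molecules = [100, 100, 800]
treatment_molecules = [100, 100, 1800]  # only gene 3 changes

tpm_control = tpm(control_molecules, lengths)
tpm_treatment = tpm(treatment_molecules, lengths)
print([round(x) for x in tpm_control])
print([round(x) for x in tpm_treatment])
```

Genes 1 and 2 are unchanged in absolute terms, yet their TPM halves in the treatment sample.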

Eh, technically RNA-seq is only relative quantification without ERCC spike-ins, and even then it's somewhat iffy. The FDA SEQC project has some good work on this issue. Also, both TPM and CPM can be used for between-sample comparisons because they are scaled per million, so that should be fine. But I think both are still biased towards larger transcripts. In the past I've found the 0.4 quantile to be a good cut-off for DEG analysis; alternatively, use normalized counts from similarly sized intergenic regions as an indication of sequencing noise.

(reply by mforde84, 3.6 years ago)
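The post does not spell out how the "quantile 0.4" cut-off is applied, so the following is an assumption: one reading is to drop genes whose mean normalized expression falls below the 40th percentile of all genes' means before running DE analysis. The helper name and data are hypothetical.

```python
from statistics import mean, quantiles

def quantile_filter(expression, q=0.4):
    """Hypothetical helper: keep genes whose mean expression across samples is
    at or above the q-th quantile of all genes' means (0.01 <= q <= 0.99).
    expression: gene -> list of normalized values, one per sample."""
    gene_means = {g: mean(v) for g, v in expression.items()}
    # quantiles(..., n=100) returns the 1st..99th percentile cut points.
    cutoff = quantiles(gene_means.values(), n=100)[round(q * 100) - 1]
    kept = {g: v for g, v in expression.items() if gene_means[g] >= cutoff}
    return kept, cutoff

# Hypothetical normalized values for 10 genes in 2 samples (gene means 1..10).
expression = {f"gene{i}": [i - 0.5, i + 0.5] for i in range(1, 11)}
kept, cutoff = quantile_filter(expression)
print(f"cutoff={cutoff:.2f}, kept {len(kept)} of {len(expression)} genes")
```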

True, but spike-ins can exhibit crazy variability / imprecision. In theory they are great, but their practical utility depends to a large degree on the skill of the person preparing the samples.

(reply by Rob, 3.6 years ago)

Yep, completely agree. Massively parallel sequencing is unfortunately very noisy, especially for small and low-abundance features. I also think this problem scales linearly with sequencing depth. In my opinion, it's still a very "experimental" technology.

(reply by mforde84, 3.6 years ago)

Okay, another thing I wanted to know: is it normal to see such differences? (Have you come across such differences in the past?)

(reply by Ron, 3.6 years ago)

I've never seen differential expression ... ever ... :D. Just kidding. It depends. If your gene annotation has, say, 15,000 genes and 14,000 are differentially expressed, then yeah, there's probably a problem. What you want are sanity checks: do you see something you would expect to see, i.e., a positive control?

(reply by mforde84, 3.6 years ago)
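The two sanity checks mentioned above (an implausibly large fraction of DE genes, and positive controls behaving as expected) could be scripted roughly as follows. Everything here is hypothetical: the helper name, the input layout (gene -> (log2 fold change, adjusted p-value)), and the 0.05 and 50% thresholds are illustrative choices, not values from the thread or from any DE package.

```python
def sanity_check(results, positive_controls, padj_cutoff=0.05, max_de_fraction=0.5):
    """Hypothetical helper.
    results: gene -> (log2_fold_change, adjusted_p_value or None)
    positive_controls: gene -> expected direction (+1 for up, -1 for down)
    Returns the fraction of genes called DE and a list of warnings."""
    de_genes = {g for g, (_, padj) in results.items()
                if padj is not None and padj < padj_cutoff}
    de_fraction = len(de_genes) / len(results)
    warnings = []
    if de_fraction > max_de_fraction:
        warnings.append(f"{de_fraction:.0%} of genes called DE; check normalization/design")
    for gene, expected_sign in positive_controls.items():
        if gene not in results:
            warnings.append(f"{gene}: missing from results")
            continue
        lfc, _ = results[gene]
        if gene not in de_genes or lfc * expected_sign <= 0:
            direction = "up" if expected_sign > 0 else "down"
            warnings.append(f"{gene}: expected {direction}, got log2FC={lfc}")
    return de_fraction, warnings

# Hypothetical DE results; geneD is a positive control expected to go up.
results = {"geneA": (2.1, 0.001), "geneB": (-0.5, 0.3),
           "geneC": (0.1, 0.9), "geneD": (-1.8, 0.01)}
frac, warnings = sanity_check(results, {"geneA": +1, "geneD": +1})
print(frac, warnings)
```

With 14,000 of 15,000 genes called DE, the fraction check alone (about 93%) would flag the run.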