Question: Problem in selecting threshold value for FPKM
14 months ago by
BIOTECH.DEEPTI9110 wrote:

Hey Folks,

I am analyzing RNA-seq data from a cell line. I am comparing two treatment conditions with the control and with each other. I found very high log2 fold change values for differentially expressed genes because of very low FPKM values in the control sample. I have consulted some papers in which the authors set a threshold, for example FPKM > 1, for differential expression. How should this threshold be set? What are the basic criteria for selecting the threshold value? How are density plots helpful in this regard? Looking forward to the best possible answers.

Thank You

rna-seq • 806 views
ADD COMMENTlink modified 14 months ago • written 14 months ago by BIOTECH.DEEPTI9110

Hello BIOTECH.DEEPTI911!

Questions similar to yours can already be found at:

We have closed your question to allow us to keep similar content in the same thread.

If you disagree with this please tell us why in a reply below. We'll be happy to talk about it.

Cheers!

ADD REPLYlink modified 14 months ago • written 14 months ago by lakhujanivijay4.3k

Dear Vijay,

Yes, I completely agree that similar questions have been asked by others, but the details they provided are different from my question. I need an explanation of density plots, which is why I described my work plan. It would be great if you could reopen this conversation.

Regards,

Deepti Mittal Ph.D. Research Scholar C/O Dr. Gautam Kaul Division of Biochemistry NDRI, Karnal.

ADD REPLYlink written 14 months ago by BIOTECH.DEEPTI9110

I have reopened the question.

ADD REPLYlink written 14 months ago by lakhujanivijay4.3k

Thank you Kevin for your comment. It is very helpful, but I want to know: what if I take RPKM values instead of FPKM for the DEG analysis, would that approach be fine? I also want to ask whether using only the topmost differentially expressed genes is fine. I do not have much exposure to bioinformatics, so I am a little afraid of redoing the normalization and all those steps. I outsourced the RNA-seq, and the company provided me with FPKM values. I am in big trouble, as my guide is expecting a lot from me and I am stuck at this point :(

ADD REPLYlink modified 14 months ago • written 14 months ago by BIOTECH.DEEPTI9110

No, RPKM is just as bad as FPKM for this purpose. You should tell the outsourcing company that FPKM is not suitable for differential expression comparisons. This is becoming more and more documented, even in publications. That said, it is not the end of the world that FPKM was used - you just need to be aware of its limitations. For example, on the FPKM / RPKM scale, a value of 100 in one sample is not the same as 100 in another sample. In the other sample, 100 may be equivalent to 125, etc., which means that statistical tests will make incorrect assumptions.
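To make that concrete, here is a minimal Python sketch of how FPKM is computed and why the same value can mean different things in two samples. The gene names, lengths and fragment counts are invented purely for illustration. Genes g1 to g3 have identical fragment counts in both samples; only g4 is strongly induced in sample B, which inflates that sample's library total.

```python
def fpkm(fragments, length_bp, total_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_fragments)

# Hypothetical toy data: gene -> (length in bp, fragments in A, fragments in B).
# g1-g3 are unchanged between samples; g4 is strongly induced in sample B.
genes = {
    "g1": (1000, 100, 100),
    "g2": (2000, 499_900, 499_900),
    "g3": (1500, 300_000, 300_000),
    "g4": (1000, 200_000, 1_200_000),
}

total_a = sum(a for _, a, _ in genes.values())  # 1,000,000 fragments
total_b = sum(b for _, _, b in genes.values())  # 2,000,000 fragments

fpkm_a = {g: fpkm(a, length, total_a) for g, (length, a, _) in genes.items()}
fpkm_b = {g: fpkm(b, length, total_b) for g, (length, _, b) in genes.items()}

for g in genes:
    print(g, fpkm_a[g], fpkm_b[g])
# g1 has the same fragment count in both samples, yet its FPKM is
# 100.0 in A and 50.0 in B, because g4 dominates the total in B.
```

In this toy case an unchanged gene appears two-fold down in sample B, simply because the per-million scaling uses the library total, which a single highly expressed gene can shift.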

Yes, the general rule is to take the top differentially expressed genes. You should set your p-value cut-off very low. The fold change issue with FPKM data is also expected, as there is no adjustment for low counts when calculating the fold changes (in other programs, this is referred to as 'fold change shrinkage').
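As a rough illustration of the low-value problem, the sketch below computes log2 fold changes on invented FPKM values, with and without a pseudocount, and then applies an expression filter. The pseudocount is only a crude stand-in and is not the shrinkage estimator that programs such as DESeq2 use. The cut-off of 1 FPKM is the example figure from the question, not a recommended value.

```python
import math

def log2_fold_change(treated, control, pseudocount=0.0):
    """log2 ratio of treated over control, optionally damped by a pseudocount."""
    return math.log2((treated + pseudocount) / (control + pseudocount))

# Hypothetical FPKM values: gene -> (control, treated)
genes = {
    "low_gene": (0.01, 0.5),     # near-zero expression in the control
    "high_gene": (50.0, 100.0),  # well expressed in both conditions
}

raw = {g: log2_fold_change(t, c) for g, (c, t) in genes.items()}
damped = {g: log2_fold_change(t, c, pseudocount=1.0) for g, (c, t) in genes.items()}

# Assumed cut-off: keep a gene only if it reaches 1 FPKM in at least one condition
min_fpkm = 1.0
kept = [g for g, (c, t) in genes.items() if max(c, t) >= min_fpkm]

for g in genes:
    print(f"{g}: raw={raw[g]:.2f} damped={damped[g]:.2f}")
print("kept after filter:", kept)
```

Here low_gene gets a raw log2 fold change above 5 even though both of its values are close to zero, while the pseudocount pulls it below 1 and the filter drops it. high_gene stays close to a log2 fold change of 1 either way.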

ADD REPLYlink written 14 months ago by Kevin Blighe46k

OK, I should not use FPKM, as that would be a bad decision. I will use DESeq2 for differential gene expression for a better analysis. Thank you so much for enlightening me. Your valuable suggestions helped me a lot :)

ADD REPLYlink written 14 months ago by BIOTECH.DEEPTI9110

Okay, but, wait. For DESeq2, you will require the raw counts, i.e., the counts prior to normalisation. Do you have them?

ADD REPLYlink written 14 months ago by Kevin Blighe46k

Yes Kevin, I do have the raw files, i.e. the FASTA files for my data, but I don't know how to analyse them :(

ADD REPLYlink written 14 months ago by BIOTECH.DEEPTI9110

If it is a preliminary study, then do not worry about it. What are you hoping to do with the results?

ADD REPLYlink written 14 months ago by Kevin Blighe46k

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.

ADD REPLYlink written 14 months ago by WouterDeCoster40k

Yes Kevin, I am just interested in finding the differentially expressed genes and the major pathways involved in my treatment condition. Many authors have done preliminary studies (and those with microarrays) on this particular treatment, but I am studying it in terms of global cellular outcomes by utilizing the power of RNA-seq. That is why I am mainly focusing on that part, but I am also concerned about the bioinformatics involved because I want to learn each and every part of transcriptomics.

ADD REPLYlink written 14 months ago by BIOTECH.DEEPTI9110

Please use ADD COMMENT or ADD REPLY to respond to earlier comments, so that the thread remains logically structured and easy to follow. I have now moved your reaction but, as you can see, the result is not optimal. Adding an answer should only be used for providing a solution to the question asked.

ADD REPLYlink written 14 months ago by WouterDeCoster40k

Okay, sounds interesting, and I imagine that the third-party company only provided data on protein coding RNAs? With RNA-seq, one has the ability to do a wide range of analyses:

  • coding and non-coding RNA analyses
  • detection of novel RNAs, including novel splice isoforms
  • detection of circular RNAs
  • search for evidence of fusion genes
ADD REPLYlink written 14 months ago by Kevin Blighe46k

Yes Kevin, exactly. As I have the raw files with me, I can do wonders with this data. There is a whole lot of information in it. I just need to dig deeper, and that is why I am keen to look at every single part of it :)

ADD REPLYlink written 14 months ago by BIOTECH.DEEPTI9110
14 months ago by
Kevin Blighe46k wrote:

You should not be conducting differential expression analysis with FPKM data. Within-sample comparisons are fine - cross-sample comparisons are not fine because FPKM does not adequately adjust for differences in library size across samples.

Use an updated program that does better normalisation of RNA-seq data, like edgeR or DESeq2.
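For a feel of what between-sample normalisation does, here is a minimal Python sketch of the median-of-ratios idea behind DESeq2's size factors. It is a simplified illustration on invented raw counts, not the DESeq2 implementation. The function name `size_factors` and the toy data are assumptions made for this example.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping gene -> list of raw counts, one entry per sample.
    """
    n_samples = len(next(iter(counts.values())))
    # Use only genes with a non-zero count in every sample
    usable = [row for row in counts.values() if all(c > 0 for c in row)]
    # Log of each gene's geometric mean across samples
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Hypothetical raw counts for samples [A, B]: B is sequenced twice as deeply,
# and g4 is additionally induced in B; g5 is skipped because of its zero count.
counts = {
    "g1": [100, 200],
    "g2": [400, 800],
    "g3": [300, 600],
    "g4": [200, 2400],
    "g5": [0, 5],
}

sf = size_factors(counts)
normalised = {g: [c / f for c, f in zip(row, sf)] for g, row in counts.items()}
print("size factors:", sf)
print("g1 normalised:", normalised["g1"])
print("g4 normalised:", normalised["g4"])
```

Because the median ignores the one strongly induced gene, the two size factors differ by the two-fold depth difference, so the unchanged genes come out equal after normalisation while g4 still shows its induction. Scaling by the total count would instead let g4 distort every other gene.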

Kevin

ADD COMMENTlink written 14 months ago by Kevin Blighe46k

An update (6th October 2018):

You should abandon RPKM / FPKM. They are not ideal where cross-sample differential expression analysis is your aim; indeed, they render samples incomparable via differential expression analysis:

Please read this: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis

The Total Count and RPKM [FPKM] normalization methods, both of which are still widely in use, are ineffective and should be definitively abandoned in the context of differential analysis.

Also, by Harold Pimentel: What the FPKM? A review of RNA-Seq expression units

The first thing one should remember is that without between sample normalization (a topic for a later post), NONE of these units are comparable across experiments. This is a result of RNA-Seq being a relative measurement, not an absolute one.

ADD REPLYlink modified 10 months ago • written 12 months ago by Kevin Blighe46k