Question

Problem selecting a threshold value for FPKM

Asked 5.9 years ago by BIOTECH.DEEPTI911

Hey Folks,

I am analyzing RNA-seq data from a cell line. I am comparing two treatment conditions with the control and with each other. I found very high log2 fold-change values for differentially expressed genes because of very low FPKM values in the control sample. I have consulted some papers in which a threshold is set for differential expression, for example >1 FPKM. How should this threshold be set? What are the basic criteria for selecting the threshold value? How are density plots helpful in this regard? Looking forward to the best possible answers.

Thank you
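A minimal sketch of the effect described in the question, using invented FPKM values (the gene names and numbers are hypothetical). A near-zero control value inflates log2(treatment/control); adding a pseudocount, or requiring a minimum FPKM (here >1 in at least one condition, which is only one assumed variant of the cut-off the question mentions), damps or removes such genes. This illustrates the arithmetic only; it does not make FPKM suitable for cross-sample testing.

```python
import math

# Hypothetical FPKM values: gene -> (control, treatment). Invented for illustration.
fpkm = {
    "geneA": (0.01, 2.0),    # barely detected in the control
    "geneB": (50.0, 100.0),  # well expressed in both conditions
    "geneC": (0.2, 0.6),     # low in both conditions
}


def log2_fold_change(control, treatment, pseudocount=0.0):
    """log2(treatment / control), with an optional pseudocount added to both values."""
    return math.log2((treatment + pseudocount) / (control + pseudocount))


def passes_threshold(control, treatment, min_fpkm=1.0):
    """Keep a gene only if it exceeds min_fpkm in at least one condition (assumed rule)."""
    return max(control, treatment) > min_fpkm


for gene, (ctrl, trt) in fpkm.items():
    raw = log2_fold_change(ctrl, trt)
    damped = log2_fold_change(ctrl, trt, pseudocount=1.0)
    keep = passes_threshold(ctrl, trt)
    print(f"{gene}: log2FC={raw:.2f}, with pseudocount={damped:.2f}, keep={keep}")
```

Here geneA's log2 fold change drops from about 7.6 to about 1.6 once a pseudocount of 1 is added, and geneC is removed by the threshold because it never exceeds 1 FPKM.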

Tag: RNA-Seq

Hello BIOTECH.DEEPTI911!

Questions similar to yours can already be found at:

How to choose a FPKM cut-off

We have closed your question to allow us to keep similar content in the same thread.

If you disagree with this please tell us why in a reply below. We'll be happy to talk about it.

Cheers!

Reply by lakhujanivijay, 5.9 years ago

Dear Vijay,

Yes, I completely agree that similar questions have been asked by others, but the details they provided are different from my question. I need an explanation of density plots, which is why I described my work plan. It would be great if you could reopen this conversation.

Regards,

Deepti Mittal, Ph.D. Research Scholar, c/o Dr. Gautam Kaul, Division of Biochemistry, NDRI, Karnal.

Reply by BIOTECH.DEEPTI911, 5.9 years ago

I have reopened the question.

Reply by lakhujanivijay, 5.9 years ago

Thank you, Kevin, for your comment. It has been very helpful for increasing my knowledge, but I want to know: if I use RPKM values instead of FPKM for the DEG analysis, would that approach be fine? I also want to ask whether using only the topmost differentially expressed genes is a fine approach. I do not have much exposure to bioinformatics, so I am a little afraid of redoing the normalization and all those steps. I had the RNA-seq done through outsourcing, and the company provided me with FPKM values. I am in big trouble, as my guide is expecting a lot from me and I am stuck at this point :(

Reply by BIOTECH.DEEPTI911, 5.9 years ago

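No, RPKM is just as bad as FPKM for this purpose: FPKM is simply the paired-end counterpart of RPKM (it counts fragments rather than reads), and both apply the same total-count scaling. You should tell the outsourcing company that FPKM is not suitable for differential expression comparisons.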

Reply by Kevin Blighe, 3.7 years ago

OK, I should not use FPKM, as that would be a bad decision. I will use DESeq2 for the differential gene expression analysis instead. Thank you so much for enlightening me. Your valuable suggestions helped me a lot :)

Reply by BIOTECH.DEEPTI911, 5.9 years ago

Okay, but wait. For DESeq2, you will require the raw counts, i.e., the counts prior to normalisation. Do you have them?

Reply by Kevin Blighe, 5.9 years ago
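DESeq2's raw-count requirement can be checked with a quick heuristic before loading a matrix: DESeq2 expects non-negative integer counts, whereas FPKM/RPKM values are usually fractional. The function name and the example matrices below are hypothetical, and integer-valued data are not guaranteed to be raw counts.

```python
def looks_like_raw_counts(matrix):
    """Heuristic: True if every value is a non-negative whole number.

    FPKM/RPKM values are usually fractional, so they fail this check.
    Passing it does not prove the data are raw counts.
    """
    for row in matrix:
        for value in row:
            # float(...).is_integer() also rejects NaN and infinity
            if value < 0 or not float(value).is_integer():
                return False
    return True


# Hypothetical genes x samples matrices.
raw_counts = [[0, 15, 230], [4, 0, 198]]
fpkm_values = [[0.0, 1.37, 20.5], [0.42, 0.0, 18.1]]

print(looks_like_raw_counts(raw_counts))   # True
print(looks_like_raw_counts(fpkm_values))  # False
```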

Yes, Kevin, I do have the raw files (FASTQ files) for my data, but I don't know how to analyse them :(

Reply by BIOTECH.DEEPTI911, 5.9 years ago

If it is a preliminary study, then do not worry about it. What are you hoping to do with the results?

Reply by Kevin Blighe, 5.9 years ago

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

Reply by WouterDeCoster, 5.9 years ago

Yes, Kevin, I am just interested in finding the differentially expressed genes and the major pathways involved in my treatment condition. Many authors have done preliminary studies on that particular treatment (and those with microarrays), but I am studying it in terms of global cellular outcomes by utilizing the power of RNA-seq. That is why I am mainly focusing on that part, but I am also concerned about the underlying bioinformatics, because I want to learn every part of transcriptomics.

Reply by BIOTECH.DEEPTI911, 5.9 years ago

Please use ADD COMMENT or ADD REPLY to respond to previous posts, so that this thread remains logically structured and easy to follow. I have now moved your post, but as you can see, it's not optimal. Adding an answer should only be used to provide a solution to the question asked.

Reply by WouterDeCoster, 5.9 years ago

Okay, sounds interesting, and I imagine that the third-party company only provided data on protein-coding RNAs? With RNA-seq, one has the ability to do a wide range of analyses:

coding and non-coding RNA analyses
detection of novel RNAs, including novel splice isoforms
detection of circular RNAs
search for evidence of fusion genes

Reply by Kevin Blighe, 5.9 years ago

Yes, Kevin, exactly. As I have the raw files, I can do wonders with this data; there is a whole lot of information in it. I just need to gain deeper insights, and that's why I am keen to look at every single part of it :)

Reply by BIOTECH.DEEPTI911, 5.9 years ago

Answer (2018-06-10)

Answered 5.9 years ago by Kevin Blighe

You should not be conducting differential expression analysis with FPKM data. Within-sample comparisons are fine; cross-sample comparisons are not, because FPKM does not adequately adjust for differences in library size and composition across samples.

Use a dedicated program that performs better normalisation of RNA-seq count data, such as edgeR or DESeq2.

Kevin
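To illustrate the library-size and composition point in the answer above, here is a from-scratch Python sketch (not DESeq2's own code or API) comparing plain total-count scaling, which is the per-sample scaling FPKM relies on, with the median-of-ratios size factors described for DESeq2. The counts are invented: genes g1 to g4 are unchanged, and g5 is strongly induced in sample B. Gene length is left out because, for the same gene, it cancels in a cross-sample ratio.

```python
import math
import statistics

# Hypothetical raw counts: gene -> [sample_A, sample_B].
# g1-g4 are unchanged; g5 is strongly induced in sample B.
counts = {
    "g1": [100, 100],
    "g2": [200, 200],
    "g3": [300, 300],
    "g4": [400, 400],
    "g5": [1000, 9000],
}


def total_count_factors(counts):
    """Scale each sample by its total count, relative to the mean total."""
    n = len(next(iter(counts.values())))
    totals = [sum(row[j] for row in counts.values()) for j in range(n)]
    mean_total = statistics.mean(totals)
    return [t / mean_total for t in totals]


def median_of_ratios_factors(counts):
    """Size factor per sample: median over genes of count / that gene's geometric mean."""
    n = len(next(iter(counts.values())))
    ratios = [[] for _ in range(n)]
    for row in counts.values():
        if min(row) == 0:
            continue  # the geometric mean would be 0, so such genes are skipped
        geo_mean = math.exp(statistics.mean(math.log(x) for x in row))
        for j, x in enumerate(row):
            ratios[j].append(x / geo_mean)
    return [statistics.median(r) for r in ratios]


for label, factors in [("total count", total_count_factors(counts)),
                       ("median of ratios", median_of_ratios_factors(counts))]:
    g1_a = counts["g1"][0] / factors[0]
    g1_b = counts["g1"][1] / factors[1]
    print(f"{label}: factors={[round(f, 2) for f in factors]}, "
          f"g1 B/A after normalisation={g1_b / g1_a:.2f}")
```

With total-count scaling, the unchanged gene g1 appears five-fold lower in sample B, because g5 inflates that library's total; the median-of-ratios factors stay at 1 and leave g1 unchanged.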

An update (6th October 2018):

You should abandon RPKM / FPKM. They are not suitable when cross-sample differential expression analysis is your aim; indeed, they render samples incomparable in a differential expression analysis.

Please read this: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis

The Total Count and RPKM [FPKM] normalization methods, both of which are still widely in use, are ineffective and should be definitively abandoned in the context of differential analysis.

Also, by Harold Pimentel: What the FPKM? A review of RNA-Seq expression units

The first thing one should remember is that without between sample normalization (a topic for a later post), NONE of these units are comparable across experiments. This is a result of RNA-Seq being a relative measurement, not an absolute one.

Reply by Kevin Blighe, 5.6 years ago
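The point that these units are relative can be sketched numerically. The following Python example computes FPKM and TPM from invented counts, using plain transcript length rather than the effective length used in the review quoted above (a simplifying assumption). TPM always sums to one million within a sample, yet transcript 1 has identical counts in both samples and still receives lower FPKM and TPM in sample B, because another transcript is strongly induced there.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalise first, then scale to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r * 1e6 / total_rate for r in rates]


lengths = [1000, 2000, 500]   # hypothetical transcript lengths (bp)
sample_a = [100, 400, 500]    # hypothetical fragment counts
sample_b = [100, 400, 5000]   # transcript 3 strongly induced

for name, c in [("A", sample_a), ("B", sample_b)]:
    f_vals = [round(x) for x in fpkm(c, lengths)]
    t_vals = [round(x) for x in tpm(c, lengths)]
    print(f"sample {name}: FPKM={f_vals}, TPM={t_vals}")
```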