Question

Dealing with Cuffdiff2 outputs: adding pseudocounts or using cutoff values?

8.9 years ago

a_j_d_a_n • 0

Dear experts,

Although (or precisely because) I have read a lot of manuals and papers on RNA-Seq analysis of differentially expressed genes, I'm unfortunately not really sure which pipeline and parameters I should use. So I really hope that some of you can give me some advice.

My experimental setup in short: 2 WT samples and 2 patient samples (same family, same unknown mutation), 150 bp paired-end. I'm searching for genes that are differentially expressed. In particular, I'm searching for a gene that is expressed in the WT but not in the patients (perhaps because of an exon deletion or something similar). I merged the WT and patient samples and analyzed the files with TopHat2 and Cuffdiff2. As I learned in some forum discussions, I can't use any statistics for filtering, because I don't have biological replicates. So I've just calculated the TPMs and ranked the genes by their log2 fold changes. But I have some problems: many of the genes have FPKM/TPM = 0 in one condition, and I don't know how to proceed with them. Should I add a pseudocount? Or should I eliminate them? Given my question, I don't actually want to cut them off. Can anyone help me with this topic?

Thanks a lot in advance!!

RNA-Seq • 2.3k views

updated 7.1 years ago by Biostar 20 • written 8.9 years ago by a_j_d_a_n • 0

score 0 · Answer 1 · 2016-08-31

In my opinion, FPKM=0 means that no reads were aligned to that gene. Because this is RNA-Seq, I would suggest that this happens when the gene is not expressed. This information is valuable. If you see a nonzero FPKM for the same gene in the other sample, you can say that the gene is expressed in sample 1 and not in sample 2. When one of the two FPKM values is 0, the log2 fold change is inf/-inf. In the cuffdiff output you can see that some genes have different FPKM values between the samples (which means that one may be overexpressed) and some have FPKM = 0. In addition, there is information about the p-value and the test statistic. The status column shows NOTEST when there are not enough alignments to test the gene (for example, when both samples have zero coverage), and OK when the test could be performed.
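To make the on/off reasoning concrete, here is a minimal Python sketch. It assumes you have already pulled one FPKM value per condition out of the cuffdiff output; the gene names and numbers are made up for illustration, and `log2_fold_change` is a hypothetical helper, not part of Cuffdiff or cummeRbund.

```python
import math

def log2_fold_change(fpkm_wt, fpkm_patient):
    """Return log2(patient / WT).

    Gives +/-inf when exactly one value is 0 and None when both are 0.
    """
    if fpkm_wt == 0 and fpkm_patient == 0:
        return None  # not expressed in either condition, nothing to compare
    if fpkm_wt == 0:
        return math.inf  # expressed only in the patients
    if fpkm_patient == 0:
        return -math.inf  # expressed only in the WT
    return math.log2(fpkm_patient / fpkm_wt)

# Hypothetical FPKM values as (WT, patient), for illustration only
genes = {
    "geneA": (12.5, 0.0),
    "geneB": (0.0, 3.0),
    "geneC": (4.0, 8.0),
    "geneD": (0.0, 0.0),
}

# Genes expressed in the WT but absent in the patients
wt_only = [name for name, (wt, pat) in genes.items() if wt > 0 and pat == 0]
print("WT only:", wt_only)

for name, (wt, pat) in genes.items():
    print(name, log2_fold_change(wt, pat))
```

Keeping the "WT only" genes as their own list means they are not lost when infinite fold changes are filtered out or capped later.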

Hope it helps,

Best, Agata

score 0 · Answer 2 · 2016-08-31

8.9 years ago

a_j_d_a_n • 0

Thanks for your answer, Agata. I'll eliminate the genes that have FPKM=0 in both samples anyway, because they don't tell me anything about differential expression. But the log2fc=+/-inf values are nevertheless a problem, because it is not really necessary to show these values within a heatmap. For this reason, I thought about adding pseudocounts?
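The pseudocount idea could be sketched like this in Python. This is only an illustration under assumptions: the default pseudocount of 1.0 is an arbitrary choice, `log2fc_with_pseudocount` is a hypothetical helper, and the FPKM values are invented.

```python
import math

def log2fc_with_pseudocount(fpkm_wt, fpkm_patient, pseudocount=1.0):
    """log2 fold change (patient / WT) after adding the same pseudocount to both values."""
    return math.log2((fpkm_patient + pseudocount) / (fpkm_wt + pseudocount))

# Hypothetical gene: expressed in the WT, FPKM = 0 in the patients
fpkm_wt, fpkm_patient = 12.5, 0.0

# The result is always finite, but it depends strongly on the chosen pseudocount
for pc in (1.0, 0.1, 0.01):
    print(pc, round(log2fc_with_pseudocount(fpkm_wt, fpkm_patient, pc), 2))
```

The smaller the pseudocount, the more extreme the fold change for genes with FPKM = 0, so the ranking of these genes depends on a value that is chosen by hand.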


Did you try making a heatmap with the cummeRbund library in R? This library is designed to read the diff_out files produced by the cuffdiff program.

Here is a link to the tutorial: http://compbio.mit.edu/cummeRbund/manual_2_0.html

Best, Agata

8.9 years ago by agata88 ▴ 870

I don't think I understand correctly... how can showing the differential expression (log2fc=+/-inf) not be necessary in a heatmap? What would you like to have on this heatmap?

8.9 years ago by agata88 ▴ 870

score 0 · Answer 3 · 2016-08-31

8.9 years ago

a_j_d_a_n • 0

Oh sorry, a little typing error: necessary = possible. But I think I'll try to make the heatmap with the ranked TPM/FPKM values. Then the log2fc=+/-inf values will not be a problem.
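A rough Python sketch of preparing such a heatmap input: the expression values themselves are log-transformed and the genes are then ranked, so no infinite values appear. The log2(FPKM + 1) transform is an assumption on my side (a common convention, not something Cuffdiff prescribes), and the gene names and values are made up.

```python
import math

# Hypothetical FPKM values as (WT, patient), for illustration only
fpkm = {
    "geneA": (12.5, 0.0),
    "geneB": (0.0, 3.0),
    "geneC": (4.0, 8.0),
}

def log_transform(value, offset=1.0):
    """log2(value + offset); an FPKM of 0 maps to 0 when offset is 1."""
    return math.log2(value + offset)

# One heatmap row per gene: (name, transformed WT value, transformed patient value)
rows = [(name, log_transform(wt), log_transform(pat)) for name, (wt, pat) in fpkm.items()]

# Rank by the difference patient - WT, so genes lost in the patients come first
rows.sort(key=lambda row: row[2] - row[1])

for name, wt, pat in rows:
    print(f"{name}\t{wt:.2f}\t{pat:.2f}")
```

The sorted rows can then be passed to whatever plotting tool you use for the heatmap.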
