I have analysed differential gene expression in patient versus normal conditions using Cuffdiff and edgeR. I want to know why there is such a big difference in the number of differentially expressed genes reported by Cuffdiff and by edgeR. Here are the details of the analysis. Our aims are:
- to find genes that are differentially expressed in total RNA (RZ) abundance
- to find genes that are differentially expressed in poly(A) RNA (PA) abundance
Samples: Patient (3 replicates), Normal1 (2 replicates), Normal2 (2 replicates), sequenced on the Illumina HiSeq 2000 platform. Sequencing was done in two ways (7 samples in total for each way):
- using total RNA (to find differential expression in total RNA abundance of the genes)
- using poly(A)-selected RNA (to find differential expression in poly(A) RNA abundance of the genes)
Analysis: I got approx. 26 million reads (paired-end).
- I started the analysis by checking the quality of the reads and then mapped them to the human reference genome (GRCh.p11, Ensembl) using TopHat (2.0.8b).
- I used the BAM file from each replicate to test for differential expression with Cuffdiff (Cufflinks 2.1.1), treating Normal1 (2 replicates) and Normal2 (2 replicates) together as 4 replicates of normal vs 3 replicates of patient.
- The resulting Cuffdiff output file has 433 genes in RZ and 485 genes in PA that are significantly differentially expressed in normal vs patient (q-value < 0.05).
Then I wanted to evaluate this result using HTSeq and edgeR. For this:
- I used the same BAM files as input to HTSeq and tested for differentially expressed genes in normal vs patient with edgeR.
- The edgeR results have 1169 genes in RZ and 938 genes in PA that are significantly differentially expressed in normal vs patient (FDR < 0.05).
Comparing these two results, 329 genes in RZ and 374 genes in PA were common to both tools.
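To put the overlap in proportion, here is a minimal Python sketch that turns the counts quoted above into overlap fractions. The function name `overlap_summary` and the dictionary keys are only illustrative choices, and the Jaccard index is added here as one common way to summarise agreement between two lists; it was not part of the original analysis.

```python
# Counts quoted above: (Cuffdiff DE genes, edgeR DE genes, genes common to both)
counts = {
    "RZ": (433, 1169, 329),
    "PA": (485, 938, 374),
}


def overlap_summary(n_cuffdiff, n_edger, n_common):
    """Return shared fractions and the Jaccard index for two gene lists."""
    # Size of the union of the two lists (inclusion-exclusion)
    union = n_cuffdiff + n_edger - n_common
    return {
        "cuffdiff_in_edger": n_common / n_cuffdiff,  # share of Cuffdiff calls also made by edgeR
        "edger_in_cuffdiff": n_common / n_edger,     # share of edgeR calls also made by Cuffdiff
        "jaccard": n_common / union,                 # intersection over union
    }


summaries = {lib: overlap_summary(*c) for lib, c in counts.items()}

for lib, s in summaries.items():
    print(
        f"{lib}: {s['cuffdiff_in_edger']:.1%} of Cuffdiff genes also in edgeR, "
        f"{s['edger_in_cuffdiff']:.1%} of edgeR genes also in Cuffdiff, "
        f"Jaccard = {s['jaccard']:.2f}"
    )
```

With these numbers, roughly three quarters of the Cuffdiff genes are also called by edgeR in both RZ and PA, while well under half of the edgeR genes are also called by Cuffdiff. So most of the disagreement comes from genes that only edgeR reports.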
Could anyone clarify why these two tools behave so differently? Which set of results should I use for my further studies?