I have an interesting discussion point. I have developed my own pipeline that takes NGS sequencing data through to functional enrichment. I tested its performance on publicly available datasets, comparing the previously reported results against those generated by my pipeline. For a fair comparison I used the same reference genome and annotation. However, the published pipeline differed from mine in several respects, including the parameters and the statistics used to call differentially expressed (DE) genes.
I therefore compared the two pipelines at the level of DE genes. The original study reported 14 DE genes at FDR < 0.1 (range: 0.03-0.09; p-value < 0.05; no logFC reported). I recovered the same genes the authors reported, but with a different FDR range (0.007-0.5; p-value < 0.05; |logFC| > 1.5). In addition, my pipeline found 29 more DE genes that were not in the original report but have been validated in other work on the same biological condition (cancer).
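One simple way to quantify the agreement between the two gene lists is set overlap plus a Jaccard index. A minimal sketch, assuming both lists are available as gene symbols (the names below are placeholders, not the actual study's genes):

```python
# Sketch: quantify agreement between two pipelines' DE gene lists.
# Gene symbols are hypothetical placeholders.
published = {"GENE1", "GENE2", "GENE3", "GENE4"}
mine = {"GENE2", "GENE3", "GENE4", "GENE5", "GENE6"}

shared = published & mine                      # genes called DE by both pipelines
only_published = published - mine              # missed by my pipeline
only_mine = mine - published                   # extra calls from my pipeline
jaccard = len(shared) / len(published | mine)  # overlap relative to the union

print(f"shared: {sorted(shared)}")
print(f"Jaccard index: {jaccard:.2f}")
```

Reporting shared, pipeline-exclusive, and Jaccard numbers for each dataset gives reviewers a concrete picture of concordance.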
Observing the different FDRs, my guess is that the cause is the statistics (filtering, normalization, and the model used to compute DE genes) differing between the two pipelines. Is that right?
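Filtering alone is enough to shift FDR values: a Benjamini-Hochberg adjusted p-value is roughly p * m / rank, so the same raw p-value gets a different FDR when the number of tested genes m changes. A minimal sketch with made-up p-values (a hand-rolled BH implementation, not any specific package's):

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up, with monotonicity enforced)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adj = [0.0] * m
    prev = 1.0
    # Walk from the largest p-value down, taking running minimums of p * m / rank.
    for k, i in enumerate(reversed(order)):
        rank = m - k
        prev = min(prev, pvals[i] * m / rank)
        adj[i] = prev
    return adj

# The same three "interesting" raw p-values...
strict = [0.001, 0.004, 0.010]              # ...after heavy filtering: 3 tests
loose = strict + [0.2, 0.4, 0.6, 0.8, 0.9]  # ...after light filtering: 8 tests

print(bh_adjust(strict)[:3])  # smaller FDRs
print(bh_adjust(loose)[:3])   # larger FDRs for the identical raw p-values
```

So even before considering normalization or the DE model, a different filtering step changes every gene's FDR.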
It is important to note that the authors did not validate all 14 genes; only 5 of them were validated by PCR. Of those, I found 2 genes (FDR < 0.02; p-value < 0.05; |logFC| > 2).
Similarly, I tested another dataset and found that most of the genes were replicated in my analysis, except for a few that were exclusive to the original study. (Here, all genes shared by both pipelines had been validated by PCR in the original study.)
Finally, testing the pipeline on our in-house data gave us many DE genes, and the selected genes passed experimental validation.
So my queries are:
1) What fair comparison is possible between the two pipelines' results?
2) Is it worthwhile to compare DE genes at the same cut-off when the statistics applied in the two pipelines differ?
3) It is also important to note that, starting from the raw data, my pipeline discarded about 50% of the data (because of low-quality reads) before mapping and DE analysis. Can this also affect quantification and, downstream, the DE analysis?
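On query 2: a shared cut-off means different things under different statistics, so a threshold-free comparison, such as the overlap of the top-N genes ranked by p-value for several values of N, is often fairer. A minimal sketch with hypothetical gene names and p-values:

```python
# Sketch: threshold-free comparison of two DE rankings.
# Gene names and p-values are placeholders, not real results.
pipeline_a = {"GENE1": 1e-5, "GENE2": 3e-4, "GENE3": 2e-3, "GENE4": 0.04, "GENE5": 0.3}
pipeline_b = {"GENE2": 5e-6, "GENE1": 1e-3, "GENE4": 4e-3, "GENE6": 0.02, "GENE3": 0.6}

def top_n(pvals, n):
    """Return the n genes with the smallest p-values."""
    return set(sorted(pvals, key=pvals.get)[:n])

# How the agreement changes with list depth N:
for n in (2, 3, 4):
    overlap = len(top_n(pipeline_a, n) & top_n(pipeline_b, n))
    print(f"top-{n} overlap: {overlap}/{n}")
```

If the overlap stays high across a range of N, the pipelines agree on the biology even though their absolute FDR values differ.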