Question: What is a fair comparison between results generated by one pipeline and those from another?
3.3 years ago, unique379 wrote:

Dear all,

I have an interesting topic for discussion. I have developed my own pipeline, which analyzes NGS data from sequencing through to functional enrichment. I tested its performance on publicly available data sets and compared the previously reported results with the results generated by my pipeline. For a fair comparison, I used the same reference genome and annotation. However, the published pipeline differed from mine in its parameters and in the statistics used to find DE genes.

First observation

Therefore, the results of the two pipelines were compared at the level of differentially expressed (DE) genes. The original study reported 14 DE genes at FDR < 0.1 (range: 0.03-0.09; P-value < 0.05; no logFC reported). I found the same genes that the authors reported in their paper, but with a different FDR range (range: 0.007-0.5; P-value < 0.05; logFC |1.5|). Additionally, my pipeline found more DE genes (29 genes), which have been validated in other work related to the same biological condition, such as cancer.

I guess the FDR values differ because the statistics (filtering, normalization, and the model used to compute DE genes) are different in the two pipelines. Is that right?
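To illustrate one way the filtering step alone can shift FDR values, here is a minimal sketch of the Benjamini-Hochberg adjustment in plain Python. The p-values and gene counts are made up for illustration, and neither pipeline in this thread is known to use exactly this procedure; the point is only that the adjusted value depends on how many genes are tested.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for three genes of interest.
genes_of_interest = [0.001, 0.004, 0.01]

# Assumption: one pipeline keeps 10 genes after filtering, the other keeps 100.
# The extra genes get an unremarkable p-value of 0.5 for illustration.
fdr_strict_filter = bh_adjust(genes_of_interest + [0.5] * 7)
fdr_loose_filter = bh_adjust(genes_of_interest + [0.5] * 97)

print("10 genes tested: ", fdr_strict_filter[:3])
print("100 genes tested:", fdr_loose_filter[:3])
```

In this toy case the same three raw p-values receive adjusted values ten times larger when 100 genes are tested instead of 10, so identical genes can fall on different sides of an FDR cut-off.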

It is important to note that the authors did not validate all 14 genes; only 5 of them were validated by PCR. Of those, I found 2 genes (FDR < 0.02; P-value < 0.05; logFC |2|).

Second observation

In the same way, I tested another data set and found that most of the genes were replicated in my analysis, except for a few that were present only in the original study. (Here, all genes found by both pipelines had been validated by PCR in the original study.)
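An overlap of this kind can be summarized with simple set operations. The sketch below uses placeholder gene names (GENE_A and so on are hypothetical, not genes from either study) and reports the shared genes, the genes exclusive to each list, and the Jaccard index.

```python
# Hypothetical gene identifiers; replace with the real DE gene lists.
original_study = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
my_pipeline = {"GENE_A", "GENE_B", "GENE_C", "GENE_E", "GENE_F"}

shared = original_study & my_pipeline          # found by both
only_original = original_study - my_pipeline   # exclusive to the original study
only_mine = my_pipeline - original_study       # exclusive to my pipeline

# Jaccard index: size of the intersection over size of the union.
jaccard = len(shared) / len(original_study | my_pipeline)

print("shared:", sorted(shared))
print("only in original study:", sorted(only_original))
print("only in my pipeline:", sorted(only_mine))
print("Jaccard index:", jaccard)
```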

Third observation

Continuing, testing the pipeline with in-house data gave us many DE genes. Selected genes from that list successfully passed experimental validation.

So my queries are:

1) What fair comparison is possible between the results of two pipelines?

2) Is it worth comparing DE genes at the same cut-off when the statistics applied in the two pipelines are different?

3) It is also important to note that, starting from the raw data passed through my pipeline, about 50% of the data sets were discarded from the analysis (mapping and then DE analysis) because of low-quality reads. Can this also affect quantification and, further, the DE analysis?


rna-seq next-gen R gene

A "fair" comparison is to check which pipeline returns the best result in terms of externally validated, and so likely correctly identified, DE genes. You want to use either the "recommended settings" for both pipelines (it doesn't matter if they differ between pipelines), or you may want to test a range of reasonable parameters for both and report the best results.
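As a rough sketch of that idea, the snippet below counts how many externally validated genes each pipeline calls DE over a small range of FDR cut-offs. The gene names, FDR values, pipeline labels, and the `validated_hits` helper are all hypothetical and only illustrate the bookkeeping, not either real pipeline.

```python
def validated_hits(fdr_by_gene, validated, cutoff):
    """Return (validated genes called DE, total genes called DE) at a cut-off."""
    called = {gene for gene, fdr in fdr_by_gene.items() if fdr < cutoff}
    return len(called & validated), len(called)


# Hypothetical externally validated genes and per-pipeline FDR values.
validated = {"GENE_A", "GENE_B", "GENE_C"}
pipeline_a = {"GENE_A": 0.03, "GENE_B": 0.09, "GENE_D": 0.04}
pipeline_b = {"GENE_A": 0.007, "GENE_B": 0.02, "GENE_C": 0.08, "GENE_E": 0.2}

summary = {}
for cutoff in (0.05, 0.1):
    for name, results in (("A", pipeline_a), ("B", pipeline_b)):
        hits, total = validated_hits(results, validated, cutoff)
        summary[(name, cutoff)] = (hits, total)
        print(f"pipeline {name}, FDR < {cutoff}: "
              f"{hits} validated of {total} called")
```

Reporting both numbers matters: a pipeline that calls many more genes will recover more validated ones by chance, so the validated count is easier to interpret next to the total number of calls.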

written 3.2 years ago by h.mon