How reproducible is the DEG analysis?
3.6 years ago
asumani ▴ 70

Hi,

Currently I am replicating a few DEG analyses for publicly available datasets. However, my results are not that close to those in the original publications. For example, I found genes related to immune response, just like in the original paper, but the individual genes are not the same. Since I found similar pathways, I wondered whether DEG analysis is simply not that reproducible. Thoughts, experiences?

DEG Analysis • 931 views

When you say you are "replicating" the DEG analysis do you mean:

  1. Using their data and their methods in an attempt to get the same results as them?

  2. Using their data but your own methods?

  3. Generating a new dataset (making new RNA libraries etc.) that should be a replicate of theirs?


Ah, I forgot it. Thanks! I mean 2.

3.6 years ago

The extent to which two different pipelines produce the same results will depend on how different the two pipelines are, but in general I would expect two pipelines run on the same data to give more or less the same results.

For example, a recent study on alignment methods found that the majority of DE genes were the same whether Bowtie, STAR, quasi-alignment or selective alignment was used. However, each difference will cause more differences in the results. This will be particularly the case if the study is "on the edge": if it only finds a small number of genes that are hovering close to the boundary of significance, then the results are likely to vary more between pipelines.
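One way to put a number on how much two pipelines agree is to compare their lists of significant genes directly. This is a minimal sketch, assuming each pipeline's output has already been reduced to a list of gene identifiers; the function name `compare_gene_sets` and the gene lists are made up for illustration and do not come from any of the studies mentioned.

```python
def compare_gene_sets(genes_a, genes_b):
    """Summarise the overlap between two lists of significant genes."""
    a, b = set(genes_a), set(genes_b)
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # Jaccard index: shared genes divided by all genes seen in either list.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical DE gene lists from two pipelines run on the same data.
pipeline_a = ["IL6", "TNF", "CXCL8", "STAT1"]
pipeline_b = ["IL6", "TNF", "IRF7", "STAT1", "ISG15"]

result = compare_gene_sets(pipeline_a, pipeline_b)
print(result["shared"], result["jaccard"])
```

A low overlap at the gene level combined with similar enriched pathways would be consistent with the two pipelines picking different genes near the significance boundary.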


"However, each difference will cause more differences in the results."

I use different tools from theirs for each of the alignment, assembly and DE steps. Maybe that's why I find only the pathways in common and not the genes. I would like to hear more about this point if possible.

if the study is "on the edge"

Also, on your last point, I checked one reference paper, which states that there is "substantial differential expression" at more than a thousand genes with FDR <= 0.05. However, the supplementary file does not give the fold change of the genes. I am left to assume that there is at least a 1.5-fold change, which is the threshold I use in my analysis. So, do you think this could also make the study "on the edge"? This is a highly reputable paper, so I am not sure how to make an inference on this point.


No, 1000 genes at FDR < 0.05 does not sound like it's on the edge. Also, I wouldn't assume that they used a 1.5-fold change threshold unless they say so. Do you get a similar number of genes?

You might be interested in this paper, which looks at lots of different combinations of pipeline steps: https://www.nature.com/articles/s41467-017-00050-4


I don't think the number of genes is similar. They reported 1412 significant, "substantially expressed" genes. I got 1170 (FDR < 0.05). However, after filtering on abs(logFC) > 0.58 (roughly a 1.5-fold change, assuming logFC is on a log2 scale), I ended up with 515 genes.
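To make the effect of the two cutoffs explicit, the filtering can be sketched as below. This is only an illustration under stated assumptions: the row fields `gene`, `log2fc` and `fdr`, the function name `filter_de`, and all the values are hypothetical, and logFC is assumed to be on a log2 scale so that a 1.5-fold change corresponds to log2(1.5), about 0.585.

```python
import math

FDR_CUTOFF = 0.05
LOG2FC_CUTOFF = math.log2(1.5)  # about 0.585, i.e. a 1.5-fold change


def filter_de(rows, fdr_cutoff=FDR_CUTOFF, log2fc_cutoff=None):
    """Return gene names passing the FDR cutoff and, optionally, a fold-change cutoff."""
    kept = []
    for row in rows:
        if row["fdr"] >= fdr_cutoff:
            continue  # not significant
        if log2fc_cutoff is not None and abs(row["log2fc"]) <= log2fc_cutoff:
            continue  # significant, but the effect size is below the threshold
        kept.append(row["gene"])
    return kept


# Invented example rows standing in for a DE results table.
rows = [
    {"gene": "geneA", "log2fc": 1.2, "fdr": 0.001},
    {"gene": "geneB", "log2fc": 0.3, "fdr": 0.01},
    {"gene": "geneC", "log2fc": -0.9, "fdr": 0.04},
    {"gene": "geneD", "log2fc": 2.0, "fdr": 0.2},
]

fdr_only = filter_de(rows)
fdr_and_fc = filter_de(rows, log2fc_cutoff=LOG2FC_CUTOFF)
print(len(fdr_only), len(fdr_and_fc))
```

Comparing the FDR-only list against the paper's list is the like-for-like comparison if the paper did not apply a fold-change cutoff.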
