4.1 years ago
hougiotaejut ▴ 20

Hi,

Suppose you have a count table with two dependent conditions (C1 and C2) and three replicates each (R1 to R3):

Gene   R1-C1   R2-C1   R3-C1   R1-C2   R2-C2   R3-C2
X1        43      52      38     120     131     115
X2       250     273     260      26      35      42
X3       112     100     120     205     200     150


To simulate data in a simple way, is it correct to build an artificial count table for DE analysis by copying the first condition into the second condition, like this?

Gene   R1-C1   R2-C1   R3-C1   R1-C2   R2-C2   R3-C2
X1        43      52      38      43      52      38
X2       250     273     260     250     273     260
X3       112     100     120     112     100     120


This way there are no DE genes. Then, to add some DE genes, multiply the counts of some randomly chosen genes in one condition by specified fold changes (FCs).
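A minimal Python sketch of the procedure described above, assuming the counts are held in a plain dict that maps each gene to its C1 replicate counts. The function name `make_mock_table`, its parameters (`de_fraction`, `fold_changes`, `seed`), and the choice to round scaled counts back to integers are illustrative assumptions, not taken from any paper.

```python
import random

def make_mock_table(counts, de_fraction=0.1, fold_changes=(0.25, 0.5, 2.0, 4.0), seed=0):
    """Build a mock two-condition table from condition-1 counts (hypothetical helper).

    counts: dict mapping gene -> list of C1 replicate counts.
    Returns (table, de_genes): table maps gene -> C1 counts followed by C2 counts;
    de_genes is the set of genes whose C2 counts were scaled by a fold change.
    """
    rng = random.Random(seed)
    genes = list(counts)
    # Pick at least one gene to carry a spiked-in fold change.
    n_de = max(1, round(de_fraction * len(genes)))
    de_genes = set(rng.sample(genes, n_de))
    table = {}
    for gene in genes:
        c1 = list(counts[gene])
        c2 = list(c1)  # copy C1 into C2: no difference by construction
        if gene in de_genes:
            fc = rng.choice(fold_changes)
            c2 = [round(x * fc) for x in c1]  # scale, keeping integer counts
        table[gene] = c1 + c2
    return table, de_genes


counts = {"X1": [43, 52, 38], "X2": [250, 273, 260], "X3": [112, 100, 120]}
table, de_genes = make_mock_table(counts, seed=1)
for gene, row in table.items():
    print(gene, row, "DE" if gene in de_genes else "")
```

The known set `de_genes` is what a DE tool's calls would be compared against.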

Is that correct and acceptable?

In a paper, I read this: "To assess how the different software packages and pipelines can control false positive rates, we utilized the multiple replicates within the sample groups by constructing artificial two-group comparisons. No significant detections were expected in such mock comparisons."

I assumed they had copied replicates the way I illustrated above, which is why I'm asking.


I would probably just shuffle the data n times.
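One possible reading of the shuffling suggestion is to randomly permute the condition labels across the six samples and rerun the DE analysis on each permuted grouping. A minimal sketch, assuming the sample names from the table in the question; `permuted_groupings` is a hypothetical helper that only produces the groupings, not the DE test itself.

```python
import random

def permuted_groupings(samples, labels, n, seed=0):
    """Return n random reassignments of condition labels to samples.

    Each item is (group1, group2): group1 holds the samples that received
    the first label, group2 holds the rest.
    """
    rng = random.Random(seed)
    first = labels[0]
    groupings = []
    for _ in range(n):
        perm = list(labels)
        rng.shuffle(perm)  # shuffle labels; group sizes stay fixed
        group1 = [s for s, lab in zip(samples, perm) if lab == first]
        group2 = [s for s, lab in zip(samples, perm) if lab != first]
        groupings.append((group1, group2))
    return groupings


samples = ["R1-C1", "R2-C1", "R3-C1", "R1-C2", "R2-C2", "R3-C2"]
labels = ["C1"] * 3 + ["C2"] * 3
for group1, group2 in permuted_groupings(samples, labels, n=5):
    print(group1, "vs", group2)
```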


I wanted to know whether I understood that paper correctly; that's why I asked here. Simply copying and replacing the replicates seemed very strange to me.


The quote you used is not from the paper you linked; it is from Comparison of software packages for detecting differential expression in RNA-seq studies.

4.1 years ago
h.mon 34k

No, that was not the approach used in the paper. What they did was to artificially split samples from one treatment into two groups, and then compare these two "groups". So, for example, the mouse RNA-seq data had 10 samples of the C57BL/6J strain, and 11 samples of the DBA/2J strain. They randomly split the 10 C57BL/6J samples into two groups of 5 samples, and tested for differential expression between these groups.
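The random split described above can be sketched as follows. This only illustrates the grouping step and is not the paper's own code; the sample names are hypothetical placeholders for the 10 C57BL/6J samples. With 10 samples there are C(10,5)/2 = 126 distinct 5-vs-5 splits (dividing by 2 because swapping the two groups gives the same comparison), so all of them can also be enumerated.

```python
import random
from itertools import combinations
from math import comb

def random_mock_split(samples, seed=0):
    """Randomly split samples from one group into two equal-sized mock groups."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])


def all_mock_splits(samples):
    """Enumerate every distinct equal-sized split (assumes an even sample count)."""
    samples = list(samples)
    assert len(samples) % 2 == 0, "expects an even number of samples"
    half = len(samples) // 2
    splits = []
    for group1 in combinations(samples, half):
        # Keep only splits with the first sample in group1, so each
        # split's mirror image (groups swapped) is skipped.
        if samples[0] not in group1:
            continue
        group2 = [s for s in samples if s not in group1]
        splits.append((list(group1), group2))
    return splits


b6_samples = [f"B6_{i:02d}" for i in range(1, 11)]  # hypothetical sample names
group1, group2 = random_mock_split(b6_samples, seed=42)
print(group1, "vs", group2)
print(len(all_mock_splits(b6_samples)), comb(10, 5) // 2)
```

Each resulting pair of groups is then tested for DE exactly like a real comparison; any significant calls count as false positives.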