Question: Simulating Time Course RNA-Seq Data With Polyester
11 months ago by ecaa2017 wrote:

Hi everyone!

I'm new here, but thought I'd ask for some advice. I'm a college student working on a bioinformatics project that involves time course (TC) differential expression (DE) analysis of an RNA-seq dataset. By simulating TC RNA-seq data while controlling the number of differentially expressed genes (DEGs), the direction of DE, and so on, I hope to benchmark combinations of normalization methods and TC DE tools before applying them to our actual dataset.

I'm using Polyester to simulate the read count data. However, given the design of our experiment (10 time points (TPs), 3 biological replicates per TP), I can't use the standard simulate_experiment() function. Instead I used simulate_experiment_countmat(), which gives greater flexibility and lets me specify DE at individual time points.
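To make the design concrete, here is a minimal sketch of how a transcripts-by-samples matrix of expected counts for 10 TPs with 3 replicates each could be laid out, with fold changes applied only at chosen time points. It is written in plain Python for illustration; the function name, the baseline values, and the fold changes are all hypothetical, and the resulting matrix would still need to be exported and handed to Polyester in R as the readmat argument.

```python
def build_mean_matrix(baseline, fold_changes, n_timepoints=10, n_reps=3):
    """Build a genes-by-samples matrix of expected read counts.

    baseline:     list of baseline counts, one per gene (hypothetical values)
    fold_changes: dict mapping gene index -> {time point index: fold change};
                  time points not listed keep a fold change of 1
    Columns are ordered TP1_rep1, TP1_rep2, TP1_rep3, TP2_rep1, ...
    """
    matrix = []
    for gene, base in enumerate(baseline):
        gene_fc = fold_changes.get(gene, {})
        row = []
        for tp in range(n_timepoints):
            mean = base * gene_fc.get(tp, 1.0)
            # Every replicate of a time point gets the same expected count here,
            # so there is no replicate-to-replicate noise at this stage.
            row.extend([round(mean)] * n_reps)
        matrix.append(row)
    return matrix


# Illustrative input: gene 0 is non-DE, gene 1 is up at TP4 and TP5,
# gene 2 is down at TP8 (time point indices are 0-based).
baseline = [100, 200, 50]
fold_changes = {1: {3: 2.0, 4: 4.0}, 2: {7: 0.5}}
mean_matrix = build_mean_matrix(baseline, fold_changes)
```

A matrix built this way is noise-free: a non-DE gene has exactly the same count in all 30 samples.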

Here is where I run into trouble: the resulting count matrix poorly reflects real RNA-seq data, because non-DE genes have constant read counts over time. I want to add noise to the read counts but am unsure how much to add. Should I obtain gene-wise dispersion estimates (using DESeq2) and then calculate each gene's variance from the negative binomial (NB) variance formula, variance = mean + dispersion × mean²? Or is there a better way to go about this?
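For concreteness, the NB idea described above could be sketched as follows: treat each entry of the noise-free matrix as a mean μ and draw a count from an NB distribution with variance μ + αμ², where α is a gene-wise dispersion. This is only an illustration in plain Python, not Polyester's or DESeq2's own code; the function names and the dispersion values are hypothetical, and the NB draw is implemented as a gamma-Poisson mixture.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method.

    Large rates are split into chunks of at most 500 so exp(-rate) does not
    underflow; a sum of independent Poisson draws is Poisson with the summed rate.
    """
    total = 0
    while lam > 0:
        step = min(lam, 500.0)
        threshold = math.exp(-step)
        k, p = 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        total += k
        lam -= step
    return total


def sample_nb(mu, dispersion, rng):
    """Draw one NB count with mean mu and variance mu + dispersion * mu**2."""
    if mu <= 0:
        return 0
    if dispersion <= 0:
        return sample_poisson(mu, rng)  # zero dispersion reduces to Poisson
    # Gamma-Poisson mixture: the rate has mean mu and variance dispersion * mu**2.
    rate = rng.gammavariate(1.0 / dispersion, mu * dispersion)
    return sample_poisson(rate, rng)


def add_nb_noise(mean_matrix, dispersions, seed=None):
    """Replace each expected count with an NB draw, using one dispersion per gene."""
    rng = random.Random(seed)
    return [
        [sample_nb(mu, alpha, rng) for mu in row]
        for row, alpha in zip(mean_matrix, dispersions)
    ]


# Illustrative input: two non-DE genes across 30 samples, with made-up
# dispersions standing in for gene-wise estimates from real data.
noisy = add_nb_noise([[100] * 30, [200] * 30], [0.1, 0.05], seed=42)
```

With a dispersion of 0.1 and a mean of 100, the formula gives a variance of 100 + 0.1 × 100² = 1100, so replicate counts for a non-DE gene would scatter with a standard deviation of roughly 33 instead of staying fixed.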

I would greatly appreciate any feedback from those with experience in Polyester or TC RNA-Seq analysis!

Many thanks, Ethan
