Question: RNA-Seq time series analysis using ImpulseDE2 software
stu111538 wrote:

Hi,

I want to use the ImpulseDE2 software to identify genes that are differentially expressed over time (I have samples from 12 patients at 4 time points). I have two questions; unfortunately, my attempts to contact the developers were unsuccessful.

1) Does anybody know whether ImpulseDE2 can deal with missing data? For some patients, one time point is missing. The software runs without an error message, but I am not sure it handles this case correctly. This is not obvious to me from the published benchmark article (https://www.ncbi.nlm.nih.gov/pubmed/30102402).

2) I do not understand the ImpulseDE2 result table. If I run the analysis with boolIdentifyTransients = TRUE, I get some genes with is.transient=TRUE and some more with is.monotonous=TRUE, but some genes with padj << 0.05 have FALSE in both fields. I thought the algorithm tests three possibilities: monotonous, transient and constant. According to the result table, however, some genes are none of the three. How should I interpret those genes? Are they false positives with small p-values for which neither a transient nor a monotonous course could be fitted?
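
To make the second question concrete, here is a minimal sketch of how the result table could be tallied into the four groups described above. The data frame `res` is a made-up toy table, and the column names `padj`, `isTransient` and `isMonotonous` are assumptions; they should be replaced by whatever the actual result table uses.

```r
# Toy result table (hypothetical values, not real ImpulseDE2 output)
res <- data.frame(
  Gene = paste0("gene", 1:6),
  padj = c(0.001, 0.0002, 0.03, 0.2, 0.004, 0.8),
  isTransient = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
  isMonotonous = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Significance call at the 5% level (real tables may contain NA padj values,
# which would need separate handling)
sig <- res$padj < 0.05

# Assign each gene to exactly one group
res$class <- ifelse(
  !sig, "not significant",
  ifelse(res$isTransient, "transient",
         ifelse(res$isMonotonous, "monotonous", "significant, unclassified"))
)

# Count genes per group and list the ones the question is about
class_counts <- table(res$class)
print(class_counts)
unclassified <- res$Gene[res$class == "significant, unclassified"]
print(unclassified)
```

This only counts the genes in each group; it says nothing about why a significant gene ends up flagged as neither transient nor monotonous.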

I appreciate every tip. Thanks in advance!

written 5 months ago by stu111538

Yes, it seems that ImpulseDE2 can handle missing data. According to its manual: "matCountData in runImpulseDE2(): includes read count data, unobserved entries are NA."

But I have a question for you: how did you normalize your data? As far as I know, this model does not implement any normalization method.

modified 5 months ago • written 5 months ago by statfa450

OK, that is good to know. However, what I meant by missing data is that a whole time point is missing for one or more patients. I think the extract from the manual refers to count data that are missing for particular genes.
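
The difference between the two kinds of missing data can be illustrated with a small base R sketch. Everything here is a toy assumption: the patients, the time points, the sample names and the columns `Patient`, `Time` and `Sample` are hypothetical and are not taken from the ImpulseDE2 manual.

```r
# Full design: 3 hypothetical patients at 4 time points
design <- expand.grid(
  Patient = paste0("P", 1:3),
  Time = c(0, 1, 2, 3),
  stringsAsFactors = FALSE
)

# Case A: a whole time point is missing for one patient,
# so that sample simply does not exist in the annotation
annot <- design[!(design$Patient == "P2" & design$Time == 2), ]
annot$Sample <- paste0(annot$Patient, "_t", annot$Time)

# List the patient/time combinations that have no sample
observed_keys <- paste(annot$Patient, annot$Time)
missing_design <- design[!(paste(design$Patient, design$Time) %in% observed_keys), ]
print(missing_design)

# Case B: the sample exists, but the count of one gene is unobserved (NA)
counts <- matrix(
  seq_len(2 * nrow(annot)),
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), annot$Sample)
)
counts["geneA", "P1_t0"] <- NA

print(ncol(counts))        # number of samples actually present
print(sum(is.na(counts)))  # number of unobserved entries
```

In case A the count matrix just has one column fewer and contains no NA at all, which is why the sentence quoted from the manual does not obviously cover it.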

To your question: as I understand it from the recent article by Fischer et al. (https://www.ncbi.nlm.nih.gov/pubmed/30102402), ImpulseDE2 does include a normalization:

"Reference methods. We used ImpulseDE2, DESeq2 and DESeq2splines on rounded expected count matrices (Supplementary Notes Section S5). We used DESeq2 in the log-likelihood ratio test mode in all cases. We used ImpulseDE, edge and limma on scaled data, where the scaling factor is determined as the DESeq2 size factor (2). Therefore, the same library size normalization was used for ImpulseDE2, DESeq2, ImpulseDE, edge and limma...."

What do you think? If you are uncertain about the normalization, I would recommend doing it with DESeq2 like this:

    library(DESeq2)
    # In my case I import HTSeq count data (sampleTable and directory are defined beforehand)
    ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design = ~ 1)
    # Drop genes with at most one read in total across all samples
    ddsHTSeq <- ddsHTSeq[rowSums(counts(ddsHTSeq)) > 1, ]
    # Estimate size factors; this is the normalization that is also meant in the article
    ddsHTSeq <- estimateSizeFactors(ddsHTSeq)
    # Make the count matrix for ImpulseDE2 from this
    counts.sf_normalized <- counts(ddsHTSeq, normalized = TRUE)
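
For anyone wondering what the size factor step does, here is a minimal base R sketch of the median-of-ratios idea behind DESeq2 size factors. The count matrix is a made-up toy example, and this is a simplified illustration under that assumption, not the DESeq2 implementation itself.

```r
# Toy count matrix: 4 genes x 3 samples (hypothetical values);
# sample s2 is sequenced twice as deep as s1, and s3 four times as deep
counts <- matrix(
  c(10, 20, 40,
    100, 200, 400,
    5, 10, 20,
    50, 100, 200),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:3))
)

# Log geometric mean of each gene across samples (the pseudo-reference)
log_geo_means <- rowMeans(log(counts))

# Size factor per sample: median ratio of its counts to the pseudo-reference,
# using only genes with a finite reference and a positive count
size_factors <- apply(counts, 2, function(sample_counts) {
  usable <- is.finite(log_geo_means) & sample_counts > 0
  exp(median((log(sample_counts) - log_geo_means)[usable]))
})

# Divide every column by its size factor
normalized <- sweep(counts, 2, size_factors, "/")

print(size_factors)
print(normalized)
```

After the division, the three samples carry identical values for every gene, which is the intended effect of library size normalization. Note that the passage quoted above says ImpulseDE2 was run on rounded count matrices while the scaled data were used for ImpulseDE, edge and limma, so it may be worth checking the package documentation on whether ImpulseDE2 expects raw rather than pre-normalized counts.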

written 5 months ago by stu111538