Question: RNA-Seq time-series analysis among disease samples - analysis strategy advice
2.1 years ago by
lu.ne70
lu.ne70 wrote:

Hi,

I've been trying to perform some time-series analysis (identification of genes with non-constant expression over time, clustering of genes given their trend over time...) of RNA-Seq data (counts obtained from featureCounts) for a group of 200 patients with arthritis, all sampled at 5 different time points (from diagnosis and up to 2 years after that).

I've come across many packages/tools (I focused on R and Python solutions) but most of them seem focused on differential expression analysis between two conditions or more, which is not what I'm looking for. I was wondering if anyone came across the same problem and what worked to address it?

I tried several R packages, on all genes (>50,000) or on subsets, especially EBSeqHMM and maSigPro, as they seemed able to deal with this, but I have failed to obtain results (there seem to be too many replicates for EBSeqHMM, and I don't get any significant results with maSigPro). I also considered fitting a linear model to each gene (something like gene ~ time + patient_id) and clustering the genes based on the fitted models, but I am unsure whether this is a good way to go.

Recommendations would be greatly appreciated.

Thank you

lu.ne

rna-seq time-series • 1.3k views
modified 2.1 years ago by kristoffer.vittingseerup3.5k • written 2.1 years ago by lu.ne70

Hi,

I have the same issue with EBSeqHMM. Did you find any solution to fix the problem related to the number of replicates?

written 2.0 years ago by Akos20

Hi Akos, I have not found anything, I'm afraid (probably because the tool was not intended to be used in that kind of situation).

written 2.0 years ago by lu.ne70

Hi, thank you for the quick response. I tried EBSeqHMM with different inputs. It works with 5 time points and triplicates per time point. It does not work with 5 time points where the first time point has 18 replicates and the others have 30. It did work with 4 time points and 32 replicates per time point.

written 2.0 years ago by Akos20
2.1 years ago by
European Union
kristoffer.vittingseerup3.5k wrote:

I usually, as you suggest, build a linear model (~time + patient_id + batch_factor) for each gene, making sure that time point 0 (t0) is the reference level, so that it is absorbed into the intercept. I then use an F-test (ANOVA-style) to extract genes with a significant change at any of the time points (vs t0). Such an approach is easy to implement with the R package limma; remember to use voom when you prepare the data. limma is extremely efficient, so running this number of samples is easy, and the F-test on many time points is described in section 9.6.2 of the vignette. Afterwards I usually, as you suggest, cluster the log2FC vs t0 (typically via PAM clustering) or use Mfuzz. Mfuzz can also be used without running the DE analysis first.
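To make the idea of a per-gene F-test for a time effect concrete, here is a minimal sketch in plain Python. It is not limma: there is no voom weighting and no empirical Bayes moderation, it has no batch term, and it assumes a balanced design in which every patient is measured at every time point. The function name `time_effect_f` and the toy values are made up for illustration.

```python
def time_effect_f(values):
    """F statistic for a time effect in the model  y ~ time + patient.

    values[p][t] is the log2 expression of ONE gene for patient p at
    time point t. Assumes a balanced design (no missing samples) and
    some residual variation (otherwise the division below fails).
    """
    n_pat = len(values)
    n_time = len(values[0])

    grand = sum(sum(row) for row in values) / (n_pat * n_time)
    time_means = [sum(row[t] for row in values) / n_pat for t in range(n_time)]
    pat_means = [sum(row) / n_time for row in values]

    # Sums of squares for a two-way layout without interaction.
    ss_time = n_pat * sum((m - grand) ** 2 for m in time_means)
    ss_pat = n_time * sum((m - grand) ** 2 for m in pat_means)
    ss_total = sum((y - grand) ** 2 for row in values for y in row)
    ss_resid = ss_total - ss_time - ss_pat

    df_time = n_time - 1
    df_resid = (n_time - 1) * (n_pat - 1)
    f_stat = (ss_time / df_time) / (ss_resid / df_resid)
    return f_stat, df_time, df_resid


# Hypothetical toy gene: 3 patients (rows) x 3 time points (columns).
toy_gene = [
    [1, 2, 3],
    [2, 3, 5],
    [3, 4, 4],
]
f_stat, df_time, df_resid = time_effect_f(toy_gene)
print(round(f_stat, 3), df_time, df_resid)  # 9.0 2 4
```

In a real analysis the F statistic would be compared against an F distribution with the returned degrees of freedom to get a p-value per gene, followed by multiple-testing correction across genes.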

Hope this helps. Kristoffer

written 2.1 years ago by kristoffer.vittingseerup3.5k

Thanks a lot for the input, that's helpful.

written 2.1 years ago by lu.ne70
2.1 years ago by
enxxx23240
European Union
enxxx23240 wrote:

It would be great to have more info here, for example:

- number of replicates,
- info about the time points (e.g. are there time points before and after treatment?),
- type of disease: cancer or non-cancer,
- tissue of origin for the samples (are they all from the same tissue?),
- whether both healthy and diseased tissue samples are available,
- etc.

I would say that the biggest challenge here is the biological variation (for example, patient 1 is very different from patient 2 and patient 3 even if they have the same disease; patient 1 might be a male with blue eyes and blood type O, and patient 2 might be a female with brown eyes and blood type AB), which would drown out the signal you are looking for. So I am not surprised that nothing showed up in your results.
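As a small illustration of why between-patient differences matter in a repeated-measures design, the sketch below expresses each patient's values relative to their own t0 sample (the log2FC vs t0 idea mentioned elsewhere in this thread). The data are made up and deliberately idealised: the patients differ only by a constant offset, which real patients will not. The helper name `baseline_centered` is hypothetical.

```python
from statistics import pvariance


def baseline_centered(values):
    """Subtract each patient's first (t0) value from all of that patient's values."""
    return [[y - row[0] for y in row] for row in values]


# Hypothetical log2 expression of one gene: 3 patients (rows) x 3 time points.
# Overall levels differ a lot between patients, but the trend is the same.
log_expr = [
    [2.0, 3.0, 4.0],
    [8.0, 9.0, 10.0],
    [5.0, 6.0, 7.0],
]

centered = baseline_centered(log_expr)

# Spread across patients at the last time point, before and after centering.
raw_spread = pvariance([row[-1] for row in log_expr])
centered_spread = pvariance([row[-1] for row in centered])
print(raw_spread, centered_spread)  # 6.0 0.0
```

In this toy case the shared trend only becomes visible once each patient is compared with their own baseline; with real data the reduction in spread would be partial at best.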

modified 2.1 years ago • written 2.1 years ago by enxxx23240

Sure, sorry if this was not clear, I'll edit accordingly. The replicates are actually the 200 patients. The samples are whole blood collected from patients with arthritis (at the time they were 'diagnosed' and then four more times, with 6 months between consecutive samples). There are healthy controls available, but only a single time point is available for them, so I did not use those.

I assumed the lack of results could have been because of the way I was running the analyses but what you say does make perfect sense, I suppose I should have expected that.

Thanks for your input.

modified 2.1 years ago • written 2.1 years ago by lu.ne70