Question: Inter-RNAseq dataset operations?
6 weeks ago by
aaragak10 wrote:

Hello all,

I have two RNAseq datasets that came from separate experiments/studies, but I'm not entirely sure where I can find the best practices on comparing/merging one with another (or if I even should). In an ideal scenario I would like to do hierarchical clustering and/or GSVA with the two datasets merged - is this a feasible thing to do? And if so, what resources should I turn to for more information about this?

Thank you for your time.

rna-seq • 107 views
modified 6 weeks ago by kristoffer.vittingseerup3.2k • written 6 weeks ago by aaragak10
6 weeks ago by
European Union
kristoffer.vittingseerup3.2k wrote:

Since the datasets are from different sources, there will be a batch effect between them. Unfortunately, this means you cannot compare the datasets directly, because you do not know which differences are due to the conditions and which are due to the batch effect.

If there are samples which should be identical/similar (e.g. the same untreated cell line used as a control in both datasets), you can use them to estimate the batch effect, e.g. by including batch as a term in a differential expression model.
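To make the shared-control idea concrete, here is a minimal sketch in plain Python. It assumes log2-scale expression values and treats the batch effect as a simple per-gene offset between the shared controls of the two studies. The gene names, the numbers and both function names are made up for illustration. A real analysis would normally put batch into the model of a dedicated tool (for instance a DESeq2 design such as `~ batch + condition`) rather than subtracting offsets by hand.

```python
from statistics import mean


def estimate_batch_offsets(controls_a, controls_b):
    """Per-gene offset (study B minus study A) estimated from shared controls.

    Both arguments map gene -> list of log2 expression values, one per
    control sample. Only genes present in both studies are used.
    """
    shared_genes = sorted(controls_a.keys() & controls_b.keys())
    return {
        gene: mean(controls_b[gene]) - mean(controls_a[gene])
        for gene in shared_genes
    }


def remove_offsets(samples_b, offsets):
    """Shift study B samples onto study A's scale, gene by gene."""
    return {
        gene: [value - offsets[gene] for value in values]
        for gene, values in samples_b.items()
        if gene in offsets
    }


# Hypothetical log2 expression values (gene -> one value per sample).
controls_a = {"GENE1": [5.0, 5.2], "GENE2": [8.0, 8.4]}
controls_b = {"GENE1": [6.0, 6.2], "GENE2": [7.0, 7.4]}
treated_b = {"GENE1": [9.1, 8.9], "GENE2": [7.2, 7.0]}

offsets = estimate_batch_offsets(controls_a, controls_b)
adjusted_b = remove_offsets(treated_b, offsets)

for gene in sorted(offsets):
    print(gene, round(offsets[gene], 3), [round(v, 3) for v in adjusted_b[gene]])
```

This only works when the controls really are comparable across studies. If they are not, the "offset" mixes biology with the batch effect.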

If there is no overlap (e.g. one study contains normal healthy samples and the other contains the diseased states), the only way to analyse the data is to do an intra-study analysis of each dataset and compare the results afterwards. Such a comparison will always have to be based on ranks. Examples could be:

- Rank genes based on average expression within each dataset and look for large rank changes.
- Use fGSEA on the expression in each dataset and compare the gene-set ranks.
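The first rank-based option can be sketched in a few lines of plain Python. This is only an illustration: the two toy datasets, the gene names and the function names are invented, and breaking ties by gene name is an arbitrary choice. Each dataset is ranked on its own, so the very different expression scales of the two studies never get compared directly.

```python
from statistics import mean


def rank_by_mean_expression(dataset, genes):
    """Map gene -> rank within one dataset (1 = highest mean expression).

    Ties are broken by gene name so the result is deterministic.
    """
    ordered = sorted(genes, key=lambda gene: (-mean(dataset[gene]), gene))
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}


def rank_changes(dataset_a, dataset_b):
    """Rank shared genes inside each dataset, then list the rank shifts.

    Returns (gene, rank_a, rank_b, rank_b - rank_a) tuples, largest
    absolute shift first.
    """
    shared = sorted(dataset_a.keys() & dataset_b.keys())
    ranks_a = rank_by_mean_expression(dataset_a, shared)
    ranks_b = rank_by_mean_expression(dataset_b, shared)
    rows = [(g, ranks_a[g], ranks_b[g], ranks_b[g] - ranks_a[g]) for g in shared]
    return sorted(rows, key=lambda row: (-abs(row[3]), row[0]))


# Hypothetical expression values on different scales in the two studies.
healthy = {"GENE1": [10, 12], "GENE2": [5, 6], "GENE3": [1, 2], "GENE4": [8, 8]}
disease = {"GENE1": [100, 120], "GENE2": [500, 450], "GENE3": [10, 30], "GENE5": [3, 3]}

changes = rank_changes(healthy, disease)
for gene, rank_a, rank_b, shift in changes:
    print(gene, rank_a, rank_b, shift)
```

With real data you would inspect the genes at the top of this list, keeping in mind that a rank shift can still reflect technical differences between the studies.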

Hope this gets you started

Cheers Kristoffer

written 6 weeks ago by kristoffer.vittingseerup3.2k

Beautiful! Thank you very much!

written 6 weeks ago by aaragak10
6 weeks ago by
United States
Friederike5.6k wrote:

The terms you are looking for are "differential gene expression" and "bulk RNA-seq" (I assume that it is bulk, not single-cell).

You can do your own search, but some places to get you started may be this paper by Anders et al., this Bioconductor workflow, this review by Kang & Lau, and the hitchhiker's guide to expression analysis.

written 6 weeks ago by Friederike5.6k

Thank you for your answer.

I don't know if this falls quite into the realm of differential gene expression as I know it, since I'm doing inter-dataset rather than intra-dataset operations. I'm curious about what steps I might need to take to make sure that these two datasets, which may have come from different sequencing platforms, are compatible with one another before I merge them.

written 6 weeks ago by aaragak10

Ah, you should have made that point clearer. Maybe edit your question and describe precisely the types of data sets that you have and the question you are trying to address. If the technical batch is completely confounded with your biological source of variation of interest, I'm not sure how useful any analysis can be.
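A quick way to see whether batch and biology are completely confounded is to check if any condition was measured in more than one batch. The sketch below does that in plain Python. The sample sheet, the study and condition labels and the function name are hypothetical, and the check is deliberately crude: it only flags the all-or-nothing case.

```python
from collections import defaultdict


def confounding_report(samples):
    """Summarise how conditions are spread over batches.

    samples: iterable of (sample_id, batch, condition) tuples.
    A design is reported as fully confounded when there are several
    batches and no condition appears in more than one of them.
    """
    batches_per_condition = defaultdict(set)
    all_batches = set()
    for _sample_id, batch, condition in samples:
        batches_per_condition[condition].add(batch)
        all_batches.add(batch)
    shared = sorted(
        condition
        for condition, batches in batches_per_condition.items()
        if len(batches) > 1
    )
    return {
        "shared_conditions": shared,
        "fully_confounded": len(all_batches) > 1 and not shared,
    }


# Hypothetical sample sheets.
no_overlap = [
    ("s1", "studyA", "healthy"),
    ("s2", "studyA", "healthy"),
    ("s3", "studyB", "disease"),
]
with_shared_control = no_overlap + [("s4", "studyB", "healthy")]

print(confounding_report(no_overlap))
print(confounding_report(with_shared_control))
```

If the first kind of report comes back, no model can separate the batch effect from the condition effect, and only within-study analyses remain.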

written 6 weeks ago by Friederike5.6k