Question

Inter-RNAseq dataset operations?

5.3 years ago

aaragak1 ▴ 40

Hello all,

I have two RNAseq datasets that came from separate experiments/studies, but I'm not entirely sure where I can find the best practices on comparing/merging one with another (or if I even should). In an ideal scenario I would like to do hierarchical clustering and/or GSVA with the two datasets merged - is this a feasible thing to do? And if so, what resources should I turn to for more information about this?

Thank you for your time.

RNA-Seq • 1.5k views

Updated 5.3 years ago by Kristoffer Vitting-Seerup ★ 4.2k • written 5.3 years ago by aaragak1 ▴ 40

5.3 years ago

Friederike 9.0k

The terms you are looking for are "differential gene expression" and "bulk RNA-seq" (I assume that it is bulk, not single-cell).

You can do your own search, but some places to get you started may be this paper by Anders et al., this Bioconductor workflow, this review by Kang & Lau and the hitchhiker's guide to expression analysis.

Thank you for your answer.

I don't know if this falls quite into the realm of differential gene expression as I know it, since rather than doing intradataset operations I'm doing interdataset operations. I'm curious about what steps I might need to take in order to make sure that these two datasets - which may have come off of different sequencing platforms - are compatible with one another before I merge them.

Reply • 5.3 years ago by aaragak1 ▴ 40

Ah, you should have made that point clearer. Maybe edit your question and describe precisely the types of data sets you have and the question you are trying to address. If the technical batch is completely confounded with the biological source of variation you are interested in, I'm not sure how useful any analysis can be.
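To make the confounding point concrete, here is a minimal Python sketch (assuming NumPy; the sample labels are hypothetical). It builds a design matrix with an intercept, a batch indicator and a condition indicator. If every sample of one condition comes from one study, the batch and condition columns are identical, the matrix loses rank, and no model can separate the two effects. Having some control samples in the second study is enough to restore full rank.

```python
import numpy as np

def design_matrix(batch, condition):
    """Intercept + batch + condition design matrix using 0/1 indicator coding."""
    batch = np.asarray(batch, dtype=float)
    condition = np.asarray(condition, dtype=float)
    return np.column_stack([np.ones_like(batch), batch, condition])

# Fully confounded: every study-B sample (batch=1) is also a "disease" sample (condition=1).
confounded = design_matrix(batch=[0, 0, 0, 1, 1, 1], condition=[0, 0, 0, 1, 1, 1])

# Partial overlap: study B also contains healthy controls.
overlapping = design_matrix(batch=[0, 0, 0, 1, 1, 1], condition=[0, 0, 0, 0, 1, 1])

print(np.linalg.matrix_rank(confounded))   # 2: batch and condition columns are identical
print(np.linalg.matrix_rank(overlapping))  # 3: both effects are estimable
```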

Reply • 5.3 years ago by Friederike 9.0k

score 4 · Accepted Answer · 2020-04-08

Since the datasets are from different sources, there will be a batch effect between them! This unfortunately means you cannot directly compare the datasets, as you do not know which changes are due to the condition and which are due to the batch effect.

If there are samples which should be identical/similar (e.g. the same untreated cell line used as a control in both datasets), you can use them to estimate the batch effect by incorporating the batch into e.g. a differential expression model.
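A minimal sketch of that idea, assuming NumPy and expression that has already been normalized and log-transformed (e.g. log2 CPM). The hypothetical helper `fit_batch_and_condition` fits `log_expr ~ batch + condition` per gene by ordinary least squares. Because the untreated controls appear in both studies, the batch coefficient can be estimated separately from the condition coefficient. This is plain least squares for illustration only, not what count-based tools do internally; in practice you would give a package such as DESeq2 or limma a design like `~ batch + condition`. The toy data are synthetic and noise-free, so the fitted coefficients should match the ones used to generate them.

```python
import numpy as np

def fit_batch_and_condition(log_expr, batch, condition):
    """Per-gene ordinary least squares fit of log_expr ~ batch + condition.

    log_expr:  genes x samples array of normalized log-scale expression.
    batch:     0/1 study indicator per sample.
    condition: 0/1 condition indicator per sample (0 = shared control).
    Returns a genes x 3 array: intercept, batch effect, condition effect.
    """
    X = np.column_stack([np.ones(len(batch)), batch, condition]).astype(float)
    # Solve X @ B = log_expr.T for all genes at once (one column of B per gene).
    coef, *_ = np.linalg.lstsq(X, np.asarray(log_expr, dtype=float).T, rcond=None)
    return coef.T

# Synthetic example: both studies contain untreated controls (condition == 0).
batch = np.array([0, 0, 0, 1, 1, 1])
condition = np.array([0, 0, 1, 0, 0, 1])
true_coef = np.array([[5.0, 2.0, 1.0],    # gene 1: +2 batch shift, +1 treatment effect
                      [3.0, 0.0, -1.0]])  # gene 2: no batch shift, -1 treatment effect
design = np.column_stack([np.ones(6), batch, condition])
log_expr = true_coef @ design.T  # genes x samples, no noise added

coef = fit_batch_and_condition(log_expr, batch, condition)
print(np.round(coef, 6))
```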

If there is no overlap (e.g. one study contains normal healthy samples and the other contains the diseased states), the only way to analyse the data is to do an intra-study analysis and compare the results afterwards. Such an analysis will always have to be based on ranks. Examples could be:

- Rank genes based on average expression and look for large rank changes.
- Use fGSEA on the expression in each dataset and compare the gene-set ranks.
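The first suggestion could be sketched in plain Python as below (no third-party packages). The helper names and gene names are hypothetical, and the sketch assumes each study has already been normalized on its own (e.g. TPM or CPM). Genes measured in both studies are ranked by mean expression within each study, and then sorted by the size of their rank shift. Because only within-study ranks are compared, overall scale differences between the studies drop out, although gene-specific technical biases between platforms can still appear as rank changes.

```python
from statistics import mean

def rank_by_mean_expression(expr):
    """Rank genes by mean expression within one study (1 = highest mean).

    expr: dict mapping gene name -> list of normalized expression values.
    Ties keep the input order (Python's sort is stable).
    """
    means = {gene: mean(values) for gene, values in expr.items()}
    ordered = sorted(means, key=means.get, reverse=True)
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}

def rank_changes(expr_a, expr_b):
    """Rank shift for genes present in both studies, largest absolute shift first.

    Positive values mean the gene moved up (towards rank 1) in study B.
    """
    shared = sorted(expr_a.keys() & expr_b.keys())
    ranks_a = rank_by_mean_expression({g: expr_a[g] for g in shared})
    ranks_b = rank_by_mean_expression({g: expr_b[g] for g in shared})
    changes = {g: ranks_a[g] - ranks_b[g] for g in shared}
    return dict(sorted(changes.items(), key=lambda kv: abs(kv[1]), reverse=True))

# Hypothetical toy data; GENE4 is only measured in study B and is ignored.
study_a = {"GENE1": [10.0, 12.0], "GENE2": [5.0, 7.0], "GENE3": [1.0, 3.0]}
study_b = {"GENE1": [2.0, 2.0], "GENE2": [6.0, 6.0], "GENE3": [9.0, 11.0], "GENE4": [4.0, 4.0]}

changes = rank_changes(study_a, study_b)
print(changes)
```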

Hope this gets you started.

Cheers, Kristoffer