How to combine datasets from different sequencing platforms?
6 months ago
peter pfand ▴ 110


I have two datasets:

  • A commercial targeted RNASeq dataset (HTG EdgeSeq) covering ~1.5k coding genes, with a single 75 bp probe sequence per RNA transcript. It contains the samples considered "treated".
  • A public whole-transcriptome RNASeq dataset with > 60k genes. It contains the samples considered "control".

I'd like to test for differential expression between them, but there are obviously several issues I'd have to deal with. I already have raw counts for each dataset, so that would be the starting point.

I thought I could do the following:

  1. Subset the "big" dataset, selecting only genes present in the targeted RNASeq.
  2. Use the EDASeq package to correct for length effects with the function withinLaneNormalization, applied to each dataset independently. I assume this would normalize counts by length once the gene lengths are set separately for each dataset (the real gene lengths for the whole RNASeq, and 75 bp for every gene in the targeted dataset).
  3. Create a DGEList object for each dataset from the data generated with EDASeq.
  4. Then cbind these two DGEList objects to combine them into a single one.
  5. Perform differential expression analysis with edgeR as usual.
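
To make steps 1, 2 and 4 concrete, here is a toy sketch in plain Python. It is not EDASeq or edgeR code: the gene names, counts and lengths are invented, and the simple "counts per kilobase" scaling is only an assumed stand-in for what I expect the length correction to do (withinLaneNormalization itself uses other methods). Note also that edgeR is designed to work on raw counts, so feeding it length-scaled values like these is itself one of the questionable points.

```python
# Hypothetical toy data: gene -> raw count, one sample per dataset.
targeted_counts = {"GENE_A": 300, "GENE_B": 50}                    # "treated", targeted panel
whole_counts = {"GENE_A": 1200, "GENE_B": 400, "GENE_C": 9000}     # "control", whole RNASeq
whole_lengths_bp = {"GENE_A": 2000, "GENE_B": 1000, "GENE_C": 3000}
PROBE_LENGTH_BP = 75  # single probe length assumed for every targeted gene


def subset_to_shared(counts, genes):
    """Step 1: keep only the genes that are also on the targeted panel."""
    return {g: counts[g] for g in genes if g in counts}


def counts_per_kb(counts, lengths_bp):
    """Step 2 (simplified): scale each count by its feature length in kb."""
    return {g: c * 1000 / lengths_bp[g] for g, c in counts.items()}


shared = sorted(set(targeted_counts) & set(whole_counts))
whole_subset = subset_to_shared(whole_counts, shared)

targeted_lengths_bp = {g: PROBE_LENGTH_BP for g in targeted_counts}
targeted_per_kb = counts_per_kb(targeted_counts, targeted_lengths_bp)
whole_per_kb = counts_per_kb(whole_subset, whole_lengths_bp)

# Step 4 (simplified): put both datasets side by side, one column per sample.
combined = {
    g: {"treated": targeted_per_kb[g], "control": whole_per_kb[g]}
    for g in shared
}

for gene, row in combined.items():
    print(gene, row)
```

Because every targeted gene gets the same 75 bp length, the scaling multiplies all targeted values by one constant, so it cannot remove any gene-specific difference between the two platforms.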

I'm a bit suspicious about the validity of this procedure at several steps. For example, at step 1, wouldn't subsetting dramatically change the distribution of the data and therefore the library sizes? At steps 3 and 4, is it valid to combine DGEList objects whose counts were corrected for gene length differently through EDASeq? And at step 5, would the results be trustworthy?
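
The step 1 concern can be illustrated with a small arithmetic sketch. The counts below are invented, and counts per million (CPM) is used only as a simple way to show how the library size enters the scaling; it is not meant to reproduce what edgeR does internally.

```python
# Hypothetical whole-RNASeq sample: gene -> raw count.
whole_sample = {"GENE_A": 1200, "GENE_B": 400, "GENE_C": 9000, "GENE_D": 29400}
panel_genes = {"GENE_A", "GENE_B"}  # genes also present on the targeted panel


def library_size(counts):
    """Total number of counts in the sample."""
    return sum(counts.values())


def cpm(counts, lib_size):
    """Counts per million, relative to the given library size."""
    return {g: c * 1_000_000 / lib_size for g, c in counts.items()}


full_lib = library_size(whole_sample)

# Step 1: subset to the panel genes, then recompute the library size.
subset = {g: c for g, c in whole_sample.items() if g in panel_genes}
subset_lib = library_size(subset)

cpm_full = cpm(subset, full_lib)      # scaled by the original library size
cpm_subset = cpm(subset, subset_lib)  # scaled by the library size after subsetting

print("library size before/after subsetting:", full_lib, subset_lib)
print("GENE_A CPM before/after subsetting:", cpm_full["GENE_A"], cpm_subset["GENE_A"])
```

In this toy case the library size drops from 40000 to 1600 after subsetting, and the CPM of the same gene changes 25-fold, so whether library sizes are computed before or after subsetting matters a lot.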

I know, of course, that mixing different sequencing technologies won't yield the best results, but these are the data I have at the moment.


(NOT A DUPLICATE OF Combining two rnaseq platforms in one)

RNASeq Expression Differential Normalization • 343 views

Combining datasets implies you have both control and treatment data from different sources

6 months ago

RNASeq is sensitive to batch effects; you can't use controls from a totally different experiment carried out at a different lab, at a different time, by different people.
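
One way to see the problem, as a minimal sketch with made-up sample layouts: if every treated sample comes from one platform and every control from the other, the "treatment" and "platform" columns of a design matrix are identical, so the two effects cannot be estimated separately. The rank helper below is a plain Gaussian elimination written for this illustration, not a function from any RNASeq package.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Columns: intercept, treated (0/1), targeted platform (0/1). One row per sample.
confounded = [  # all controls from whole RNASeq, all treated from the targeted panel
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 1, 1],
]
crossed = [  # hypothetical layout where each platform has both groups
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
]

rank_confounded = matrix_rank(confounded)
rank_crossed = matrix_rank(crossed)
print("confounded design rank:", rank_confounded, "of 3 columns")
print("crossed design rank:", rank_crossed, "of 3 columns")
```

The confounded layout has rank 2 with 3 columns, meaning any difference between groups could equally be a platform or lab effect; only a layout with both groups on each platform gives a full-rank design.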

