Question

Comparison and cross-sample normalization for RNAseq from different experiments

5.6 years ago

urjaswita ▴ 100

Hi All,

I want to analyze RNA-seq datasets generated from different experiments, and compare the expression of selected genes across those samples. One experiment has 2 conditions (WT1, Drug1), and the other has 4 (WT2, Drug2, Drug3, Drug4). The WT1 and WT2 samples are similar but not identical. What I want to do is compare the expression of selected gene sets in the Drug1 and Drug2 samples.

I used kallisto to quantify all 6 samples together, and used either DESeq2 or sleuth for cross-sample normalization. I could extract the cross-sample normalized TPM values from sleuth or DESeq2. Now I want to look at the normalized TPM in the Drug1 and Drug2 samples to check whether one of them is higher or lower. But I have a few questions:

1. Does this approach sound reasonable, or are there better ways to do it?
2. Because the datasets are from different labs/experiments, how do I know that the cross-sample normalization worked?
3. I plotted a boxplot of the TPMs extracted from DESeq2, and some of the medians are slightly different. Does that mean I need a different normalization approach?

Any suggestions and help are greatly appreciated.

Thank you! Urja

Tags: RNA-Seq, DESeq2, Kallisto, Sleuth, Normalization

score 2 · Answer 1 · 2018-09-05

5.6 years ago

Devon Ryan 104k

1. It makes the most sense to add the experiment (experiment1 vs. experiment2) to your design so you can at least partially model the batch effect (WT1 and WT2 would then both be relabeled WT). Then you can directly compare Drug1 and Drug2. You may additionally have a look at the SVA package to try to get somewhat better control over the batch effect.
2. See above.
3. You don't actually care about TPMs; what you care about is "Does drug1 have a bigger/smaller effect than drug2 on genes X/Y/Z?".
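
The last point above can be made concrete with a small sketch. It assumes hypothetical normalized expression values for a single gene (the numbers and the helper `log2_fold_change` are invented for illustration and do not come from this thread). Each drug is compared with the WT of its own experiment, and the two log2 fold changes are then compared with each other. This is roughly the quantity a design with an experiment term targets; it is plain Python, not the DESeq2 API.

```python
import math
from statistics import mean

def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of mean treated to mean control expression.

    The pseudocount (an arbitrary choice here) avoids taking log of zero.
    """
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))

# Hypothetical normalized values for one gene, two replicates per group.
experiment1 = {"WT": [99.0, 99.0], "Drug1": [399.0, 399.0]}
experiment2 = {"WT": [49.0, 49.0], "Drug2": [99.0, 99.0]}

# Each drug is measured against the WT from the same experiment,
# so a constant per-experiment offset cancels out.
lfc_drug1 = log2_fold_change(experiment1["Drug1"], experiment1["WT"])
lfc_drug2 = log2_fold_change(experiment2["Drug2"], experiment2["WT"])

# Positive means Drug1 has the larger effect on this gene.
difference = lfc_drug1 - lfc_drug2
print(f"Drug1: {lfc_drug1:.2f}  Drug2: {lfc_drug2:.2f}  difference: {difference:.2f}")
```

In a real analysis the fold changes and their uncertainty would come from the fitted model rather than from raw means.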

Thanks a lot Devon. A few clarifications if you could please:

When you say, "add experiment1 and experiment2 to your design", do you mean that I should just relabel WT1 AND WT2 samples to WT, so that WT is shared across the two experiments (with 6 replicates now)? Or something more complicated like covariates etc?

And about the cross-sample normalization: I thought that for a proper normalization, the medians should be well aligned if I plot a TPM boxplot. But in my case the medians of the Drug1 samples are slightly higher than the rest. So I am worried that if Drug1 has higher TPM overall, the differences could be just because the normalization did not do a good job. Or maybe it doesn't matter?

Thanks again for your help. Urja

(reply posted 5.6 years ago by urjaswita ▴ 100)

"do you mean that I should just relabel WT1 AND WT2 samples to WT, so that WT is shared across the two experiments"

Yes, exactly. You can then add an experiment variable to the design with values 1 and 2.
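
As a rough illustration of what such a design looks like, the sketch below builds a dummy-coded design matrix from a hypothetical sample sheet (one row per condition for brevity; a real analysis would list every replicate). The function `design_matrix` and the column names are made up for this example. In R, the equivalent would be a design formula along the lines of `~ experiment + condition`.

```python
# Hypothetical sample sheet: (sample name, condition, experiment).
# WT1 and WT2 are both labeled "WT"; the experiment column carries the batch.
samples = [
    ("WT1", "WT", 1),
    ("Drug1", "Drug1", 1),
    ("WT2", "WT", 2),
    ("Drug2", "Drug2", 2),
    ("Drug3", "Drug3", 2),
    ("Drug4", "Drug4", 2),
]

def design_matrix(samples, reference="WT"):
    """Dummy-code experiment and condition.

    Experiment 1 and the `reference` condition are the baselines,
    so they are absorbed into the intercept column.
    """
    conditions = sorted({c for _, c, _ in samples if c != reference})
    columns = ["intercept", "experiment2"] + conditions
    rows = []
    for _, condition, experiment in samples:
        row = [1, 1 if experiment == 2 else 0]
        row += [1 if condition == c else 0 for c in conditions]
        rows.append(row)
    return columns, rows

columns, rows = design_matrix(samples)
print(columns)
for (name, _, _), row in zip(samples, rows):
    print(name, row)
```

With this coding, the Drug1 and Drug2 columns are each effects relative to WT after the experiment offset has been accounted for, which is what makes a direct Drug1 vs. Drug2 contrast possible.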

It's not so much the TPMs, but the medians that should be quite similar across samples. How different are the values you're seeing? Can you post a plot?
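
For reference, here is a minimal sketch of the median-of-ratios idea behind DESeq2's size factors, written in plain Python on made-up counts. The function name `size_factors` and the toy data are assumptions for illustration only; this is not DESeq2's code. It shows why medians of normalized counts are expected to line up when most genes are unchanged.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    `counts` maps sample name -> list of per-gene counts (same gene order).
    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero.
    """
    names = list(counts)
    n_genes = len(counts[names[0]])
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in names]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)
    factors = {}
    for s in names:
        log_ratios = [
            math.log(counts[s][g]) - m
            for g, m in enumerate(log_geo_means)
            if m is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors

# Toy data: sample B was sequenced about twice as deeply as sample A.
counts = {"A": [10, 20, 30, 0, 100], "B": [20, 40, 60, 5, 200]}
factors = size_factors(counts)
normalized = {s: [c / factors[s] for c in counts[s]] for s in counts}
medians = {s: median(v) for s, v in normalized.items()}
print(factors)
print(medians)
```

Here the two samples differ only in depth, so their normalized medians agree. Real data will never match this cleanly, which is why small differences in a boxplot are not alarming on their own.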

(reply posted 5.6 years ago by Devon Ryan 104k)

Thanks Devon. Please see the plot below. This is the normalized TPM that I extracted from sleuth (which I think uses DESeq2's cross-sample normalization). WT and Drug1 have slightly higher medians than the rest of the data.

How much difference in the medians is acceptable before going ahead with the DE analysis?

[Figure: Normalized TPM boxplot]

(reply posted 5.6 years ago by urjaswita ▴ 100)

I have to admit to not being overly familiar with sleuth. What happens if you use DESeq2 with tximport? Do you have similar issues?

(reply posted 5.6 years ago by Devon Ryan 104k)

I tried that and got very similar results. Please let me know your thoughts in that case. Thank you.

(reply posted 5.6 years ago by urjaswita ▴ 100)

I suspect it's fine then, and that the difference comes from plotting TPMs rather than counts. I expect there's some isoform switching going on due to drugs 2-4.
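
To see why TPMs can shift even when counts do not, consider this toy sketch. The counts and effective lengths are invented, and whether isoform switching is actually happening in these samples is only a suspicion. TPM divides each count by the effective length and rescales to a fixed total, so a change in length for one gene moves the TPMs of all the others.

```python
def tpm(counts, effective_lengths):
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = [c / l for c, l in zip(counts, effective_lengths)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

# Same read counts for three genes in both samples.
counts = [100, 100, 100]

# In the "treated" sample the third gene is assumed to switch to a shorter
# isoform, which halves its effective length.
control = tpm(counts, [1000, 1000, 1000])
treated = tpm(counts, [1000, 1000, 500])

print([round(x) for x in control])
print([round(x) for x in treated])
```

The first two genes have identical counts in both samples, yet their TPMs drop in the treated sample because the third gene takes up a larger share of the fixed total.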

(reply posted 5.6 years ago by Devon Ryan 104k)