Question: Gene-Level Analysis Of RNA-Seq Matched Pairs Of Samples?
Ryan Thompson (TSRI, La Jolla, CA) wrote 8.0 years ago:

I am analyzing some RNA-seq data in which we have pairs of samples from the same individuals, before and after treatment. For example, we might have 4 samples:

  • Individual 1, before treatment
  • Individual 1, after treatment
  • Individual 2, before treatment
  • Individual 2, after treatment

Unfortunately, as far as I can tell most standard RNA-seq tools will treat my data as simply a set of 2 pre-treatment samples and another set of 2 post-treatment samples, with no regard for the fact that they are matched pairs of samples from the same individuals. That is, the statistical test being performed is essentially testing for differences between two (or more) groups of unlabeled samples. In contrast, I want to test for consistent changes in response to treatment across individuals. Is there an analysis program or package for RNA-Seq data that supports matched pairs of samples like this?
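
To make the distinction concrete, here is a minimal Python sketch (standard library only) that contrasts an unpaired Welch t statistic with a paired t statistic for a single gene. The log2 expression values and the three individuals are invented for illustration, and a plain t-test is only a stand-in: it is not what count-based RNA-seq tools actually compute.

```python
import math
from statistics import mean, stdev, variance

# Hypothetical log2 expression of one gene in three individuals (made-up numbers).
before = [5.0, 9.0, 7.0]
after = [6.0, 10.1, 7.9]

def welch_t(x, y):
    """Unpaired (Welch) t statistic: ignores which samples share an individual."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(y) - mean(x)) / se

def paired_t(x, y):
    """Paired t statistic: tests whether the within-individual change is consistent."""
    diffs = [b - a for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

unpaired = welch_t(before, after)
paired = paired_t(before, after)
print(f"unpaired t = {unpaired:.2f}, paired t = {paired:.2f}")
```

With these numbers the baseline differs a lot between individuals, so the unpaired statistic is small. The paired statistic is large because every individual changes by about the same amount.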

Note that for now I am not interested in alternative splicing, but rather just testing at the gene level for differential expression.

As an example of the limitation I am looking to overcome, consider this quote from the conclusion of the baySeq paper which confirms what I have said above:

... at present these methods remain limited to comparisons involving multiple groups, and are not able to account for, for example, paired samples.

It seems that at least DESeq, edgeR, and cuffdiff share the same limitation.


So far I have simply performed DESeq in 1v1 mode for each individual, then collected the differential gene sets and looked for common entries in a post-process step. It's not statistically rigorous, and I'm going to look into matted's suggestion of edgeR. He said edgeR's documentation explains paired sample design!
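
The post-processing step described here amounts to a set intersection. A minimal sketch, assuming each per-individual 1v1 run has already been reduced to a list of (gene, direction) calls; the gene names and the `per_individual_hits` structure are hypothetical:

```python
def consistent_calls(per_individual_hits):
    """Return the (gene, direction) calls that appear in every individual's list."""
    call_sets = [set(calls) for calls in per_individual_hits.values()]
    if not call_sets:
        return set()
    return set.intersection(*call_sets)

# Made-up differential expression calls from two separate 1v1 comparisons.
per_individual_hits = {
    "individual_1": [("geneA", "up"), ("geneB", "down"), ("geneC", "up")],
    "individual_2": [("geneA", "up"), ("geneB", "up"), ("geneD", "down")],
}

shared = consistent_calls(per_individual_hits)
print(sorted(shared))
```

Keeping the direction in the key means a gene called up in one individual and down in another (geneB here) is not counted as shared. The intersection still gives no p-value for the combined result, which is why it is not statistically rigorous.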

Comment written 8.0 years ago by kstamm
matted (Boston, United States) wrote 8.0 years ago:

Several packages allow more sophisticated analyses along the lines you describe, where you can supply specific design matrices that account for multiple overlapping treatments and different samples. I am most familiar with using edgeR for this task. See the edgeR user's guide, specifically section 2.6, "More complex experiments (glm functionality)."

The "RNA-Seq of oral carcinomas vs matched normal tissue" and "RNA-Seq of pathogen inoculated arabidopsis with batch effects" examples in the guide seem closest to the experiment you describe. With this framework, you should be able to calculate treatment-specific contrasts while allowing for individual-specific variation.
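
For the four-sample example in the question, the kind of design matrix this approach needs has one column for the individual and one for the treatment. The sketch below builds it by hand in Python purely to show its shape. The function name and column labels are made up, and in practice the matrix would be built in R and passed to edgeR's glm functions.

```python
def paired_design_matrix(individuals, treatments, baseline="before"):
    """Build an additive individual + treatment design matrix (0/1 coding).

    The first individual (in sorted order) and the baseline treatment are
    absorbed into the intercept column.
    """
    levels = sorted(set(individuals))
    columns = (
        ["Intercept"]
        + [f"individual_{lv}" for lv in levels[1:]]
        + ["treatment_after"]
    )
    rows = []
    for ind, trt in zip(individuals, treatments):
        row = [1]
        row += [1 if ind == lv else 0 for lv in levels[1:]]
        row.append(0 if trt == baseline else 1)
        rows.append(row)
    return columns, rows

# The four samples from the question.
individuals = [1, 1, 2, 2]
treatments = ["before", "after", "before", "after"]

columns, rows = paired_design_matrix(individuals, treatments)
print(columns)
for row in rows:
    print(row)
```

The `treatment_after` coefficient is then the treatment effect after adjusting for each individual's own baseline, which is the paired comparison.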

raphael.poujol wrote 6.5 years ago:

What do you think about ?? I am using it but I have to read the maths ...

swbarnes2 (United States) wrote 6.5 years ago:

Doesn't the usefulness of the paired info depend on you knowing there is low biological variance in the expression of the genes you are looking at?

It does little good to say "In sample 1, expression tripled between treatment and control, while in sample 2 it only doubled" unless you know that those fold changes are well outside the range of natural variation, right?

I just worry that applying sophisticated algorithms to underpowered data is going to be a wild-goose chase.
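
A rough worked example of this concern, using the tripled and doubled fold changes above and assuming, purely for illustration, a paired t-test on log2 fold changes with only those two individuals:

```python
import math
from statistics import mean, stdev

# log2 fold changes (treatment vs control) for the two individuals in the example
log_fc = [math.log2(3), math.log2(2)]

n = len(log_fc)
t_stat = mean(log_fc) / (stdev(log_fc) / math.sqrt(n))

# Two-sided 5% critical value of Student's t with n - 1 = 1 degree of freedom
T_CRIT_DF1 = 12.706

print(f"t = {t_stat:.2f}, critical value = {T_CRIT_DF1}")
print("significant at 5%" if abs(t_stat) > T_CRIT_DF1 else "not significant at 5%")
```

With two pairs there is a single degree of freedom for estimating the variance of the change, so even a consistent two- to three-fold increase does not clear the threshold in this toy test. Count-based tools that share dispersion information across genes behave differently, so this only illustrates the power problem.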

Charles Warden (Duarte, CA) wrote 6.5 years ago:

cuffdiff has the limitation that you mention. DESeq and edgeR do not. I would personally use a 2-way ANOVA on log2(RPKM + 0.1) values.
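
A minimal sketch of what such a 2-way ANOVA could look like for one gene, with individual and treatment as the two factors and no replication. The RPKM values are invented, the function names are hypothetical, and this is not necessarily the exact procedure the poster used:

```python
import math
from statistics import mean, stdev

def log_rpkm(rpkm):
    """log2(RPKM + 0.1), the transformation suggested above."""
    return math.log2(rpkm + 0.1)

def treatment_f(table):
    """Treatment F statistic for a two-way layout without replication.

    table[i][j] is the value for individual i under treatment level j.
    Returns (F, df_treatment, df_error).
    """
    n, k = len(table), len(table[0])
    grand = mean([v for row in table for v in row])
    row_means = [mean(row) for row in table]
    col_means = [mean([row[j] for row in table]) for j in range(k)]
    ss_treat = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = sum(
        (table[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    )
    df_treat, df_error = k - 1, (n - 1) * (k - 1)
    return (ss_treat / df_treat) / (ss_error / df_error), df_treat, df_error

# Hypothetical RPKM values: one (before, after) pair per individual.
rpkm = [(10.0, 22.0), (100.0, 190.0), (40.0, 95.0)]
table = [[log_rpkm(b), log_rpkm(a)] for b, a in rpkm]

f_stat, df1, df2 = treatment_f(table)

# Paired t statistic on the same values, for comparison.
diffs = [after - before for before, after in table]
t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

print(f"F({df1}, {df2}) = {f_stat:.2f}, paired t squared = {t_stat ** 2:.2f}")
```

With only two treatment levels, the treatment F statistic from this layout equals the square of the paired t statistic on the same log values, so the individual factor is what makes the analysis paired.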

If you are curious about how the options would compare (at least in a larger patient cohort), I ran some benchmarks with paired tumor vs. normal data.

The short answer is that I think DESeq is the best out of the 3 options that you listed.
