1 vs Multiple sample expression RNA-Seq analysis
9.3 years ago

Hi,

I have a tricky problem to solve. I have multiple samples, and each sample harbors a specific mutation in a gene, with a different gene mutated in each sample. The mutation is induced by the same factor in every sample. How can I assess statistically whether the genes affected by these mutations are differentially expressed? The main difficulty is that each sample harbors a different mutation, which may affect a different gene, so I have no biological replicates for any given mutated gene. For example:

Sample    gene with mutation
SampleA    geneA
SampleB    geneB
SampleC    geneC
SampleD    geneD
SampleE    geneE

I aligned the reads with STAR and counted the reads per gene with htseq-count. I normalized the read counts with DESeq. Any advice on how to test whether genes A, B, C, D and E are disturbed by the mutation?

Thanks

RNA-Seq • 4.1k views

So, is there no control sample - a sample with just the wild type genes?

In fact, each sample is a tumor (same cell type, different individuals). I also have control samples (healthy samples of the same cell type).

Wouldn't it be possible then to compare to control expression levels? I'm sorry, I am not familiar with tumor gene models.

9.3 years ago

Realistically speaking, there's no reason to treat this any differently from a normal differential expression analysis. You either specify a contrast that compares one sample to the average of the others, or just use the Wald test on the corresponding coefficient. If you were to use DESeq2, you'd benefit from the outlier detection, which might eliminate false calls of gene B being DE when testing the gene A sample. Of course, if you're only going to look at the single mutated gene in each result set, that won't matter. You could actually get slightly better results by fitting a separate model for each sample (i.e., test the gene A sample versus the others, then the gene B sample versus the others, etc., by labeling all remaining samples as an "other" group). That might be the simplest approach.
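
A minimal sketch of the "one sample versus other" loop described above, assuming DESeq2 is installed. The objects `counts` and `mutated` and the helper `test_one_sample` are hypothetical names, and the data are simulated with `makeExampleDESeqDataSet` only so that the example runs; with real data, `counts` would be the raw (not normalized) htseq-count matrix.

```r
library(DESeq2)

# Toy data (assumption): 500 genes x 8 samples of simulated raw counts.
set.seed(1)
counts <- counts(makeExampleDESeqDataSet(n = 500, m = 8))

# Hypothetical mapping: sample name -> ID of the gene mutated in that sample.
mutated <- setNames(rownames(counts)[1:8], colnames(counts))

# Fit one model per sample: that sample ("mutant") against all the others.
test_one_sample <- function(sample_id) {
  group <- factor(ifelse(colnames(counts) == sample_id, "mutant", "other"),
                  levels = c("other", "mutant"))
  col_data <- data.frame(group = group, row.names = colnames(counts))
  dds <- DESeqDataSetFromMatrix(countData = counts, colData = col_data,
                                design = ~ group)
  dds <- DESeq(dds, quiet = TRUE)
  res <- results(dds, contrast = c("group", "mutant", "other"))
  # Keep only the row of the gene mutated in this sample.
  res[mutated[[sample_id]], c("log2FoldChange", "pvalue")]
}

# One p-value per sample, each for that sample's own mutated gene.
pval.deseq <- sapply(names(mutated), function(s) test_one_sample(s)$pvalue)
print(pval.deseq)
```

With a single sample in the "mutant" group, the dispersion estimate comes essentially from the "other" samples, so these p-values are best read as a screening result rather than as strong evidence for any single gene.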

This was also my initial idea. But how can I test globally whether genes that carry this specific mutation are disturbed? For example, with 40 samples I would test 40 genes for DE and end up with 40 p-values, some significant and some not (I hope most are significant, of course). Maybe one idea is to pick 40 genes at random, perform the same tests, and compare the initial results with the simulated results?
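
The random-picking idea could be sketched roughly as follows in base R. Everything here is an assumption for illustration: `pmat` is a hypothetical genes-by-samples matrix in which `pmat[g, s]` is the p-value of gene `g` in the "sample s versus others" test, `mut_idx` holds the row of the mutated gene for each sample, and the summary statistic (mean of -log10 p) is just one possible choice. The values are simulated so that the example runs.

```r
set.seed(1)
n_genes   <- 2000
n_samples <- 40

# Toy p-value matrix (assumption): uniform p-values everywhere ...
pmat <- matrix(runif(n_genes * n_samples), nrow = n_genes)
# ... except for the mutated gene of each sample, which gets a small p-value.
mut_idx <- sample(n_genes, n_samples)
pmat[cbind(mut_idx, seq_len(n_samples))] <- rbeta(n_samples, 0.3, 1)

# Summary statistic: mean -log10(p) over one chosen gene per sample.
stat <- function(rows) {
  mean(-log10(pmat[cbind(rows, seq_len(n_samples))]), na.rm = TRUE)
}

obs <- stat(mut_idx)

# Null distribution: one randomly chosen gene per sample, repeated many times.
n_perm <- 2000
null <- replicate(n_perm, stat(sample(n_genes, n_samples, replace = TRUE)))

# Empirical one-sided p-value, with the usual +1 correction.
emp_p <- (1 + sum(null >= obs)) / (1 + n_perm)
print(emp_p)
```

On real data it may be worth drawing the random genes from those with an expression level similar to the mutated genes, since lowly expressed genes tend to give larger p-values.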

Ah, that's an interesting question. One approach that comes immediately to mind is to rely on the fact that under the null (i.e., genes harboring mutations are not generally DE), the p-values should be uniformly distributed (more or less). One should then be able to compare the p-value distribution that you obtain to a uniform distribution to get a meaningful p-value. The Kolmogorov-Smirnov test should work for this. For 40 samples it's probably a reasonable approach.
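
As a rough illustration of this comparison against the uniform distribution, here is a base R sketch. The vector `pval.deseq` is simulated (an assumption, standing in for the 40 per-sample p-values), and the one-sided alternative is a choice made here, on the grounds that only an excess of small p-values is of interest.

```r
# Toy stand-in (assumption) for the 40 p-values of the mutated genes,
# drawn so that they are skewed toward 0.
set.seed(42)
pval.deseq <- rbeta(40, shape1 = 0.3, shape2 = 1)

# One-sample, one-sided KS test against Uniform(0, 1).
# alternative = "greater": the empirical CDF of the p-values lies above the
# uniform CDF, i.e. the p-values tend to be smaller than expected under the null.
ks_unif <- ks.test(pval.deseq, "punif", alternative = "greater")
print(ks_unif$statistic)
print(ks_unif$p.value)
```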

That's actually a good idea! Instead of the uniform distribution, I might use the DESeq2 p-value distribution (I don't think it's uniformly distributed). So in R it would be something like this:

ks.test(pval.deseq,res.deseq$pvalue,alternative="greater") # one-tailed ks test

where pval.deseq holds the 40 p-values computed before and res.deseq$pvalue holds the p-values for all genes.

That answers a slightly different (though perhaps more relevant) question. Keep in mind with the ks.test that it's good to visually ensure that the resulting p-value makes sense, since the test can sometimes give funky results.
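
One way to do such a visual check is to overlay the empirical CDFs, sketched here in base R. Both vectors are simulated stand-ins (assumptions) for `pval.deseq` and `res.deseq$pvalue`, and `pval.all` and `gap` are hypothetical names.

```r
set.seed(7)
pval.deseq <- rbeta(40, 0.3, 1)   # toy stand-in for the 40 mutated-gene p-values
pval.all   <- runif(5000)         # toy stand-in for res.deseq$pvalue

# Overlay the two empirical CDFs; the dashed diagonal is the uniform CDF.
plot(ecdf(pval.all), main = "ECDF of p-values", xlab = "p-value",
     ylab = "Cumulative fraction", col = "grey40")
plot(ecdf(pval.deseq), add = TRUE, col = "red")
abline(0, 1, lty = 2)
legend("bottomright", legend = c("all genes", "mutated genes", "uniform"),
       col = c("grey40", "red", "black"), lty = c(1, 1, 2))

# Largest amount by which the mutated-gene ECDF rises above the all-gene ECDF,
# evaluated on a grid; a sanity check against the one-sided KS statistic.
grid <- seq(0, 1, by = 0.001)
gap  <- max(ecdf(pval.deseq)(grid) - ecdf(pval.all)(grid))
print(gap)
```

If the red curve does not sit clearly above the grey one at small p-values, a significant KS result deserves a second look.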

Of course! I'll let you know when the analysis is done. Thanks for your help.
