Question

Advice on running a differential expression analysis on patients with slightly different mutations within each group

0

6.8 years ago

unawaz ▴ 60

Hi all,

I'm trying to run a differential expression analysis on RNA-seq data. I have data from controls, patients with a mutation in gene A, and patients with a mutation in gene B. Within each patient group, the patients carry slightly different mutations in the respective gene, and these are known to cause slightly different phenotypes.

My overall aim is to see how genes are differentially expressed between the groups and within the mutation groups, given that each point mutation is different. Since this is real patient data, I am unable to obtain biological replicates for each specific mutation, so I've grouped the data into controls, patients with mutations in gene A, and patients with mutations in gene B. I've accounted for condition and gender in my design.

However, MDS plots and PCA both show that the samples in the mutation groups don't cluster together (which is to be expected, given that the mutations in the genes are slightly different). I wanted to run glmLRT() from edgeR to perform a DE analysis for Control vs Gene A group, Control vs Gene B group, and Gene B vs Gene A, but I'm not sure if this is the best way to find what I'm looking for.

I would really like some advice on what differential expression pipeline would be the best for what I'm trying to do, or if glmLRT() from edgeR would suffice? I've been looking through previous posts on Biostars as well and haven't found anything. If I have missed anything, please do share the link!

TL;DR I have 3 groups: controls (7 biological replicates), group with differing mutations in gene A (3 samples, each with different mutation), group with differing mutations in gene B (4 samples, each with different mutation) and no biological replicates for the mutations. What is the best design and pipeline to perform DE analysis given that each mutation is different?
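To make the grouped design concrete, here is a minimal sketch in plain Python (not edgeR itself) of what a no-intercept design with one column per group plus a sex term looks like, together with the three comparisons written as contrast vectors. The group sizes are the ones given above; the sex labels and all names (`design_row`, `contrasts`, `sexM`) are hypothetical and only there for illustration.

```python
# Group sizes come from the post: 7 controls, 3 gene A, 4 gene B.
groups = ["control"] * 7 + ["geneA"] * 3 + ["geneB"] * 4
# Hypothetical sex labels, for illustration only (not the real data).
sex = ["F", "M"] * 7

levels = ["control", "geneA", "geneB"]
columns = levels + ["sexM"]


def design_row(group, s):
    """One row of a no-intercept design: group indicators plus a sex term."""
    return [1 if group == lv else 0 for lv in levels] + [1 if s == "M" else 0]


design = [design_row(g, s) for g, s in zip(groups, sex)]

# Contrast vectors over the columns (control, geneA, geneB, sexM).
# Each one compares two group means while the sex term is left out.
contrasts = {
    "geneA_vs_control": [-1, 1, 0, 0],
    "geneB_vs_control": [-1, 0, 1, 0],
    "geneB_vs_geneA": [0, -1, 1, 0],
}

for name, vec in contrasts.items():
    print(name, dict(zip(columns, vec)))
```

In this layout each mutation group gets a single coefficient, so the individual mutations within a group are treated as replicates of each other and their differences end up in the dispersion estimate.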

RNA-Seq differential expression edgeR deseq2 • 2.5k views

2

One important thing you can do in your analysis is to try to estimate the number and identity of covariates in your samples. The aim would be to control for other variables that influence gene expression besides the specific mutation you're interested in (for example, as you mentioned, mutations in other genes). You can do this with the R package sva.

Another important point is that this kind of analysis works well if you have high numbers of patients, say 50, while it's typically extremely underpowered if you have a handful.

6.8 years ago by Martombo ★ 3.2k

0

Thank you! Will give sva a shot!

Unfortunately, we only have data from a handful of patients (3 with mutations in gene A and 4 with mutations in gene B).

6.8 years ago by unawaz ▴ 60

1

You can check this paper: "Combining gene mutation with gene expression data improves outcome prediction in myelodysplastic syndromes", Nature Communications, 2015: https://www.nature.com/articles/ncomms6901

6.8 years ago by Nicolas Rosewick 11k

score 0 · Answer 1 · 2018-10-05

In case anyone else has a similar issue: I will be doing outlier detection. Basically, I will compute Z-scores and look for genes whose expression in a given patient does not look like that in the other patients and controls.

More info: https://bioinformatics.stackexchange.com/questions/2180/rnaseq-z-score-intensity-and-resources
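For anyone who wants to see the idea in code, here is a rough sketch of per-gene Z-score outlier detection in plain Python. It assumes the expression values are already normalised and log-transformed; the function name `outlier_genes`, the threshold of 3, the choice of reference samples, and the toy numbers are all assumptions for illustration, not the actual pipeline or data.

```python
from statistics import mean, stdev


def outlier_genes(expr, reference, patient, threshold=3.0):
    """Return {gene: z} for genes where `patient` deviates from the reference.

    expr: {gene: {sample: normalised log-expression}}
    reference: sample names used to estimate the per-gene mean and SD
    """
    hits = {}
    for gene, values in expr.items():
        ref = [values[s] for s in reference]
        sd = stdev(ref)
        if sd == 0:
            continue  # no spread in the reference, so the Z-score is undefined
        z = (values[patient] - mean(ref)) / sd
        if abs(z) >= threshold:
            hits[gene] = z
    return hits


# Toy values, invented purely for illustration.
expr = {
    "gene1": {"c1": 5.0, "c2": 5.2, "c3": 4.8, "c4": 5.1, "p1": 5.05},
    "gene2": {"c1": 3.0, "c2": 3.1, "c3": 2.9, "c4": 3.0, "p1": 6.0},
}
reference = ["c1", "c2", "c3", "c4"]
hits = outlier_genes(expr, reference, "p1")
print(hits)
```

Here the patient being tested is kept out of the reference set so it does not inflate its own SD. With only a dozen or so reference samples the per-gene SD is noisy, so the threshold is something to tune rather than a fixed rule.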