Question: Advice on running a differential expression analysis on patients with slightly different mutations within each group
unawaz (Australia) wrote 11 months ago:

Hi all,

I'm trying to do a differential expression analysis on RNA-seq data. I have data from controls, patients with a mutation in gene A, and patients with a mutation in gene B. Within each patient group, the patients carry slightly different mutations in the respective gene, and these are known to cause slightly different phenotypes.

My overall aim is to see how genes are differentially expressed between the groups and within the mutation groups, given that each point mutation is different. Since the data is real patient data, I am unable to obtain biological replicates for each specific type of mutation, so I've grouped the data into controls, patients with mutations in gene A, and patients with mutations in gene B. I've accounted for condition and gender in my design.

However, MDS plots and PCA both show that the mutation groups aren't clustering (which is to be expected, given that the mutations within each gene are slightly different). I wanted to run glmLRT() from edgeR to perform a DE analysis for control vs. gene A group, control vs. gene B group, and gene B vs. gene A, but I'm not sure if this is the best way to find what I'm looking for.

I would really like some advice on what differential expression pipeline would be the best for what I'm trying to do, or if glmLRT() from edgeR would suffice? I've been looking through previous posts on Biostars as well and haven't found anything. If I have missed anything, please do share the link!

TL;DR: I have 3 groups: controls (7 biological replicates), a group with differing mutations in gene A (3 samples, each with a different mutation), and a group with differing mutations in gene B (4 samples, each with a different mutation), with no biological replicates for any individual mutation. What is the best design and pipeline to perform DE analysis given that each mutation is different?
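For reference, the three comparisons described above can be written as contrasts on a design with one column per group plus a sex covariate. The following is only a minimal sketch in Python with NumPy, not edgeR itself. The group sizes come from the post, but the sex labels and the helper name `build_design` are hypothetical and made up purely for illustration. In edgeR, contrast vectors of this kind would typically be supplied through the `contrast` argument of glmLRT().

```python
import numpy as np

# Group sizes come from the post: 7 controls, 3 gene A, 4 gene B.
groups = ["control"] * 7 + ["geneA"] * 3 + ["geneB"] * 4
# Hypothetical sex labels, invented for illustration only.
sex = ["F", "M"] * 7

levels = ["control", "geneA", "geneB"]


def build_design(groups, sex):
    """One indicator column per group (no intercept) plus a male indicator."""
    group_cols = np.array(
        [[1.0 if g == lv else 0.0 for lv in levels] for g in groups]
    )
    sex_col = np.array([[1.0 if s == "M" else 0.0] for s in sex])
    return np.hstack([group_cols, sex_col])


design = build_design(groups, sex)

# Contrast vectors over the columns (control, geneA, geneB, sexM).
contrasts = {
    "geneA_vs_control": np.array([-1.0, 1.0, 0.0, 0.0]),
    "geneB_vs_control": np.array([-1.0, 0.0, 1.0, 0.0]),
    "geneB_vs_geneA": np.array([0.0, -1.0, 1.0, 0.0]),
}

# The design must have full column rank for the coefficients to be estimable.
rank = np.linalg.matrix_rank(design)
residual_df = design.shape[0] - rank
print("rank:", rank, "residual df:", residual_df)
```

Treating each mutation group as a single condition means that differences between individual mutations end up in the within-group variability rather than in separate coefficients, which is one reason the small groups are hard to analyse this way.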

written 11 months ago by unawaz

One important thing you can do in your analysis is to estimate the number and identity of covariates in your samples. The aim is to control for other variables that influence gene expression besides the specific mutation you're interested in (for example, as you mentioned, mutations in other genes). You can do this with the R package sva.

Another important point is that this kind of analysis works well if you have high numbers of patients, say 50, while it's typically extremely underpowered if you have a handful.

written 11 months ago by Martombo

Thank you! Will give sva a shot!

Unfortunately, we only have data from a handful of patients (3 with mutations in gene A and 4 with mutations in gene B).

written 11 months ago by unawaz

You can check this paper: "Combining gene mutation with gene expression data improves outcome prediction in myelodysplastic syndromes", Nature Communications, 2015: https://www.nature.com/articles/ncomms6901

written 11 months ago by Nicolas Rosewick
unawaz (Australia) wrote 11 months ago:

In case anyone else has a similar issue, I will be doing an outlier detection analysis. Basically, I will use Z-scores to look for genes whose expression in a given patient does not look like that in the other patients and controls.

More info: https://bioinformatics.stackexchange.com/questions/2180/rnaseq-z-score-intensity-and-resources
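A minimal sketch of what such a Z-score outlier screen could look like, in Python with NumPy. This is not the exact method from the linked thread. It assumes a genes-by-samples matrix of log-scale expression values, scores each sample against all the other samples (leave-one-out), and uses a cutoff of |Z| > 3. The function names, the cutoff, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def leave_one_out_z(log_expr):
    """Z-score of each sample against all other samples, gene by gene.

    log_expr: array of shape (genes, samples) holding log-scale expression.
    """
    n_genes, n_samples = log_expr.shape
    z = np.zeros((n_genes, n_samples), dtype=float)
    for j in range(n_samples):
        others = np.delete(log_expr, j, axis=1)
        mean = others.mean(axis=1)
        sd = others.std(axis=1, ddof=1)
        # Genes with no variation among the other samples get z = 0
        # instead of a division by zero.
        safe_sd = np.where(sd > 0, sd, np.inf)
        z[:, j] = (log_expr[:, j] - mean) / safe_sd
    return z


def outlier_genes(z, sample_index, threshold=3.0):
    """Indices of genes whose |z| exceeds the threshold in one sample."""
    return np.flatnonzero(np.abs(z[:, sample_index]) > threshold)


# Synthetic toy data: 100 genes, 14 samples (7 controls + 7 patients).
rng = np.random.default_rng(0)
log_expr = rng.normal(loc=5.0, scale=0.5, size=(100, 14))
log_expr[10, 8] += 6.0  # plant one strongly aberrant gene in sample 8

z = leave_one_out_z(log_expr)
hits = outlier_genes(z, sample_index=8)
print("candidate outlier genes in sample 8:", hits)
```

With so few samples, the per-gene standard deviation is itself noisy, so any hits from a screen like this are better treated as candidates for follow-up than as statistically established results.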

written 11 months ago by unawaz