Question: Determine Condition using LogFC values
3.3 years ago by
United States
Hi Everyone,

I am analyzing RNA-Seq data from 22 samples in 3 batches for differential gene expression. The comparison is between a defective and a normal phenotype. For one of the samples, the condition is indeterminate: from the slides it appears to be slightly defective, but it might not be. Is there any way I can determine what to label it as? Would it help to see how the logFC values change when I label it first as defective and then as normal?


If you're unsure what the sample is, it is best to exclude it (not only in your own interest but in others'). On the other hand, if you want to look at how similar the replicates are, you can use simple Pearson correlation values or plots to make the decision.
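
As a rough illustration of the correlation check suggested above, here is a minimal sketch in plain Python. The expression values, the group labels, and the helper name `mean_correlation_by_group` are all hypothetical; in practice you would use normalized log-expression values for all genes (or a filtered subset) rather than five toy numbers.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def mean_correlation_by_group(unknown, samples, labels):
    """Average correlation of the unknown sample with each labelled group."""
    by_group = {}
    for values, label in zip(samples, labels):
        by_group.setdefault(label, []).append(pearson(unknown, values))
    return {label: sum(rs) / len(rs) for label, rs in by_group.items()}

# Toy log-expression values (hypothetical, 5 genes per sample).
samples = [
    [5.0, 2.0, 8.0, 1.0, 4.0],  # normal
    [5.2, 2.1, 7.9, 1.2, 4.1],  # normal
    [2.0, 6.0, 3.0, 7.0, 4.0],  # defective
    [2.2, 5.8, 3.1, 6.9, 4.2],  # defective
]
labels = ["normal", "normal", "defective", "defective"]
unknown = [3.0, 5.0, 4.5, 5.5, 4.0]  # the indeterminate sample

result = mean_correlation_by_group(unknown, samples, labels)
for label, r in result.items():
    print(f"mean Pearson r with {label}: {r:.3f}")
```

A higher mean correlation with one group only says the sample resembles that group in this data set; it does not settle the phenotype, so excluding the sample remains the safer option.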


3.3 years ago by
WCIP | Glasgow | UK
Principal component analysis (PCA) is sometimes applied to expression levels from RNA-Seq data to spot outliers or otherwise unexpected sample behaviour. You could apply it to your case and first see whether the samples cluster neatly by condition, then see which cluster your undetermined sample best belongs to. Having said that, I would be careful about labelling this sample as one group or the other just on the basis of PCA.

If you are using edgeR, look at the function plotMDS; I think DESeq has a similar function.
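
To make the PCA idea concrete outside of edgeR or DESeq, here is a small sketch in plain Python. It computes scores on the first principal component only, using power iteration on the sample-by-sample Gram matrix so that no external library is needed. The data and the function name `pc1_scores` are hypothetical, and a real analysis would start from normalized log-counts and inspect several components, not just the first.

```python
from math import sqrt

def pc1_scores(matrix, iterations=200):
    """Scores of each sample (row) on the first principal component.

    Power iteration on the Gram matrix of the gene-centred data.
    Assumes the samples are not all identical.
    """
    n, p = len(matrix), len(matrix[0])
    # Centre each gene (column) across samples.
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in matrix]
    # Gram matrix G = Xc * Xc^T (samples x samples).
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centred]
            for r1 in centred]
    # Power iteration for the leading eigenvector of G.
    vec = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        new = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    # Rayleigh quotient gives the leading eigenvalue (vec has unit length).
    eigenvalue = sum(v * sum(g * w for g, w in zip(row, vec))
                     for v, row in zip(vec, gram))
    # PC1 scores are sqrt(eigenvalue) * eigenvector; the sign is arbitrary.
    return [sqrt(eigenvalue) * v for v in vec]

# Toy log-expression values (hypothetical, 5 genes per sample).
samples = [
    [5.0, 2.0, 8.0, 1.0, 4.0],  # normal
    [5.2, 2.1, 7.9, 1.2, 4.1],  # normal
    [2.0, 6.0, 3.0, 7.0, 4.0],  # defective
    [2.2, 5.8, 3.1, 6.9, 4.2],  # defective
]
labels = ["normal", "normal", "defective", "defective"]
unknown = [3.0, 5.0, 4.5, 5.5, 4.0]  # the indeterminate sample

scores = pc1_scores(samples + [unknown])
unknown_score = scores[-1]

# Mean PC1 score of each labelled group.
group_mean = {}
for label in set(labels):
    members = [s for s, l in zip(scores, labels) if l == label]
    group_mean[label] = sum(members) / len(members)

closest = min(group_mean, key=lambda g: abs(group_mean[g] - unknown_score))
print("PC1 group means:", group_mean)
print("unknown PC1 score:", unknown_score, "closest group:", closest)
```

In this toy data the unknown sample falls between the two groups on PC1, which is exactly the situation where a label assigned from PCA alone should be treated with caution.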

PCs on all genes will classify the unknown sample correctly, but I bet you $100 the anomalous sample lies somewhere between the two groups and is set apart on the third PC.
