Question: How to quantitatively identify outliers among technical and biological replicates
Asked 11 months ago by Grace_G:


I have two gene count matrices (tissue A and tissue B) from which I want to identify DEGs. Both include technical replicates as well as biological replicates.

To get DEGs:
If there were no technical replicates for each sample, I could compare the samples directly. So I plan to average the technical replicates to represent each sample. Is that right?

PCA visualization:
For tissue A (and likewise for tissue B):
I used the tissue A matrix to draw a PCA plot. For each sample, some of its technical replicates appear as outliers. How can I identify them quantitatively? Is there a method for this, so that I can remove the outliers before calculating the average?
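One possible way to put a number on what the PCA plot shows is sketched here in Python (the thread is about R/Bioconductor, so treat this as an illustration of the idea rather than a package recipe). It assumes a genes x libraries count matrix, a list giving the biological sample of each library, and at least three technical replicates per sample: with only two, both replicates sit equally far from their midpoint, so you cannot tell which one is odd. The log2-CPM transform, the number of PCs, the helper names and the "median + 3 x MAD" cut-off are all hypothetical choices, not an established standard.

```python
import numpy as np


def log_cpm(counts, pseudocount=1.0):
    """Log2 counts-per-million; counts is a genes x libraries array."""
    lib_sizes = counts.sum(axis=0)          # total reads per library
    cpm = counts / lib_sizes * 1e6
    return np.log2(cpm + pseudocount)


def pca_scores(expr, n_components=2):
    """PCA scores of the libraries (columns) via SVD of gene-centred data."""
    x = expr.T                              # libraries x genes
    x = x - x.mean(axis=0)                  # centre each gene across libraries
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def flag_replicate_outliers(counts, sample_ids, n_components=2, k=3.0):
    """Flag technical replicates that lie far from their own sample in PC space.

    For each library: Euclidean distance (in the first n_components PCs) to the
    coordinate-wise median of that sample's replicates. A library is flagged if
    its distance exceeds median + k * MAD of all such distances
    (a hypothetical rule of thumb).
    """
    scores = pca_scores(log_cpm(counts), n_components)
    sample_ids = np.asarray(sample_ids)
    dist = np.empty(len(sample_ids))
    for sid in np.unique(sample_ids):
        idx = np.where(sample_ids == sid)[0]
        centre = np.median(scores[idx], axis=0)   # robust to one bad replicate
        dist[idx] = np.linalg.norm(scores[idx] - centre, axis=1)
    med = np.median(dist)
    mad = np.median(np.abs(dist - med)) * 1.4826  # scaled MAD
    threshold = med + k * mad
    return dist > threshold, dist, threshold
```

Flagged libraries are candidates to inspect (for example for low depth or a failed run), not automatic deletions; the per-sample median is used as the centre so that a single bad replicate does not drag the reference point toward itself.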

Any views will be much appreciated!

Tags: rna-seq, next-gen, R

Averaging read counts is never a good idea, at least not if you want to use something like limma, edgeR or DESeq2 for DEG analysis. And mixing technical and biological replicates in a linear model with these tools (limma, etc.) is, to my knowledge, very difficult (unless you have a PhD in statistics). My advice is to read the manual of limma (etc.) very carefully and follow the examples given there. Also read carefully how they handle biological and technical replicates. I know, for instance, that with limma you can include technical replicates using duplicateCorrelation(); this is shown in chapter 18 of the limma manual (Yoruba HapMap case study).

Reply written 11 months ago by Benn

Thanks a lot! That's a very helpful and practical suggestion; I will read those parts carefully. I'm not sure what "never a good idea" mainly refers to: is it that averaging produces non-integer values, which these tools can't take as input? Actually, why not remove the outliers and then use a t-test directly? Tools like DESeq2 have many requirements that my data may not easily meet, so that is what I'm going to try. Looking forward to your comments :)
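To make the decimal-point concern concrete, here is a small Python sketch with made-up counts and a hypothetical helper `collapse_technical`. It shows that averaging technical replicates turns integer counts into non-integers, whereas summing them keeps integer counts. For reference, DESeq2's `collapseReplicates()` collapses technical replicates by summing; whether collapsing suits a given design is a separate question covered in the manuals.

```python
import numpy as np


def collapse_technical(counts, sample_ids, how="sum"):
    """Collapse technical-replicate columns into one column per biological sample.

    counts: genes x libraries integer array; sample_ids: biological sample of
    each column. how="sum" keeps integer counts, how="mean" does not.
    """
    sample_ids = np.asarray(sample_ids)
    samples = list(dict.fromkeys(sample_ids))   # unique ids, first-seen order
    reducer = np.sum if how == "sum" else np.mean
    collapsed = np.column_stack(
        [reducer(counts[:, sample_ids == s], axis=1) for s in samples]
    )
    return collapsed, samples


# Hypothetical example: 2 genes, 2 samples with 3 technical replicates each.
counts = np.array([[10, 11, 12, 5, 6, 8],
                   [0, 1, 0, 3, 3, 4]])
ids = ["S1"] * 3 + ["S2"] * 3
summed, names = collapse_technical(counts, ids, "sum")     # integer counts
averaged, _ = collapse_technical(counts, ids, "mean")      # e.g. 19/3 for S2
```

Apart from producing decimals, averaging also shrinks the values to roughly single-library size, so a count-based model would see less sequencing information than was actually collected; summing avoids that.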

Reply written 11 months ago by Grace_G

It has been shown that these tools (limma, edgeR, etc.) perform much better than a t-test. Please talk with statisticians, or ask the developers of these tools themselves on Bioconductor. But please be aware that before asking a question there (and here too), you are expected to do some research yourself first, such as searching Google on your topic (for example, the question of why limma is better than a t-test has been asked many times before), or reading the manuals of these tools thoroughly. So my advice is to research this a bit more.

Reply written 11 months ago by Benn

I see. Thanks for sharing these useful ways of thinking and studying. Best wishes!

Reply written 11 months ago by Grace_G