4 weeks ago

aUser

Hi everyone,

Can we treat outliers as a batch variable in linear modeling, e.g. in DESeq2? I know that batches are a different thing; however, can I argue that the "outlier samples were processed differently" and therefore qualify as a separate batch? I do not want to remove the outliers (defined from the PCA as PC1 > 200; the actual values are around 600 along PC1, and there are ~20 such samples). I want to include them in the DEG calculation. I looked for resources where this has been discussed, but maybe I missed them.

Thank you for your input/comment.

Can you provide more context, and also show your PCA? Were there replicates of each sample?

I'm not sure I follow your logic. Outliers in linear models are individual samples that deviate from the expected distribution. If the outliers were processed in a different manner, or all came from the same day of sampling, for example, then there could be a technical batch effect.
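To make the mechanics concrete, here is a minimal sketch of what "adding the outlier flag as a covariate" does in an ordinary linear model for a single gene. It is only an illustration on simulated data, not DESeq2 itself (the DESeq2 analogue would be a design formula such as `~ batch + condition`). The sample labels, effect sizes, and the names `flagged`, `condition`, and `fit` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
condition = np.repeat([0.0, 1.0], n // 2)   # 0 = normal, 1 = tumor (hypothetical labels)
flagged = np.zeros(n)
flagged[[5, 6, 25, 26]] = 1.0               # hypothetical outlier-flagged samples, two per group

# Simulated log-expression for one gene:
# baseline 1.0, true condition effect 2.0, shift of 5.0 in the flagged samples.
y = 1.0 + 2.0 * condition + 5.0 * flagged + rng.normal(scale=0.1, size=n)

def fit(design, response):
    """Ordinary least-squares coefficients for the given design matrix."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

# Design with the flag as an extra covariate: intercept, condition, flagged.
with_flag = np.column_stack([np.ones(n), condition, flagged])
# Design that ignores the flag: intercept, condition.
without_flag = np.column_stack([np.ones(n), condition])

coef_with = fit(with_flag, y)
coef_without = fit(without_flag, y)

print("condition effect, flag in model:", round(coef_with[1], 2))
print("condition effect, flag ignored: ", round(coef_without[1], 2))
```

Note that this only behaves well here because the flagged samples are spread across both groups. If every flagged sample were a tumor, the condition effect would be estimated from the unflagged samples only, so the flag would need a real technical explanation to be worth modeling.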

Thank you for your response, and sorry for the late reply; we were on vacation here.

I am working with the TCGA-LUAD data set, and the samples are processed/normalized using DESeq2. The steps are given below:

For PCA:

Samples with PC1 > 200 are considered outliers (as suggested by the literature).
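For readers who want to see the thresholding rule spelled out, the following is a rough sketch of flagging samples by their PC1 score. It is not the code used in this post: it runs on simulated data, computes the PCA directly via SVD, and the names `pc1_scores` and `flag_outliers` are made up for illustration. The cutoff of 200 is taken from the post, but the scale of PC1 depends entirely on how the counts were transformed, so the number is only meaningful for the poster's own data.

```python
import numpy as np

def pc1_scores(expr):
    """Return the PC1 score of each sample.

    expr: samples x genes matrix of (already transformed) expression values.
    """
    centered = expr - expr.mean(axis=0)          # center each gene
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, 0] * s[0]
    # The sign of a principal component is arbitrary. As an assumption for
    # this sketch, orient PC1 so the most extreme sample has a positive score.
    if abs(scores.min()) > abs(scores.max()):
        scores = -scores
    return scores

def flag_outliers(expr, threshold=200.0):
    """Boolean mask of samples whose PC1 score exceeds the threshold."""
    return pc1_scores(expr) > threshold

# Simulated data: 30 samples x 500 genes, with the first 3 samples shifted.
rng = np.random.default_rng(0)
expr = rng.normal(size=(30, 500))
expr[:3] += 30.0

flags = flag_outliers(expr, threshold=200.0)
print("flagged sample indices:", np.flatnonzero(flags))
```

The resulting boolean mask is what would then be used either to drop the samples or to build an indicator column in the sample table.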

The PCA figure is attached. NT are normal samples, while TP are tumor samples.