Can we treat outliers as batch variables in linear modeling?
4 weeks ago
aUser ▴ 40

Hi everyone,

Can we treat outliers as a batch variable in linear modeling, e.g. in DESeq2? I know that batches are a different thing; however, can I argue that the outlier samples were "processed differently" and therefore qualify as a separate batch? I do not want to remove the outliers (defined from the PCA as PC1 > 200; the actual values are around 600 along PC1, and there are ~20 such samples). I want to include them in the DEG calculation. I looked for resources where this has been discussed, but maybe I missed them.

Thank you for your input/comment.

R modeling Outlier linear • 325 views

Can you provide more context, and also show your PCA? Were there replicates of each sample?

I'm not sure I follow your logic. Outliers in linear models are individual samples that deviate from expected distributions. If the outliers were processed in a different manner or came from the same day of sampling, for example, then there could be a technical batch effect.
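
To make the question concrete, here is a minimal base-R sketch of what "outlier status as a batch-like covariate" would mean for the design matrix. The sample sheet is made up, and the `outlier` column is a hypothetical name, not something from your data. It only checks that the flag is not confounded with `sType`; it does not settle whether a flag derived from the PCA itself is a legitimate batch.

```r
# Hypothetical sample sheet: sType is the condition of interest,
# outlier marks samples flagged on the PCA (made-up labels, not real data).
samples <- data.frame(
  sType   = factor(c("NT", "NT", "NT", "TP", "TP", "TP", "TP", "TP")),
  outlier = factor(c("no", "no", "yes", "no", "no", "yes", "yes", "no"))
)

# Design matrix with the flag as a nuisance covariate next to sType.
design <- model.matrix(~ outlier + sType, data = samples)

# The flag can only be adjusted for if it is not confounded with sType,
# i.e. the design matrix has full column rank.
full_rank <- qr(design)$rank == ncol(design)

print(table(samples$outlier, samples$sType))
print(full_rank)
```

In DESeq2 the same formula would go into the `design` argument, e.g. `design = ~ outlier + sType`, with the variable of interest last. If all flagged samples fell into one `sType` level only, the check above is still worth running, because a fully confounded flag cannot be separated from the condition.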


Thank you for your response, and sorry for being late as we had vacations here.

I am working with the TCGA-LUAD data set; the samples are processed and normalized with DESeq2. The steps are given below:

library(DESeq2)

# build the DESeq2 object with sample type (sType) as the only design variable
ld_dds <- DESeqDataSetFromMatrix(countData = ld_dataPrep2,
                                 colData = ld_sampleTypes,
                                 design = ~ sType)
ld_dds <- DESeq(ld_dds)

# extract normalized counts for clustering and immune infiltration analysis
ld_normCount <- counts(ld_dds, normalized = TRUE)

For PCA:

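library(ggplot2)

# keep genes with a total normalized count above 100
ld_normCount <- ld_normCount[rowSums(ld_normCount) > 100, ]

# PCA on samples (rows after transposing), with genes scaled to unit variance
pca.obj <- prcomp(t(ld_normCount), scale. = TRUE)

pcr.objx <- as.data.frame(pca.obj$x)

# keep the first three components together with the sample names
dtp <- data.frame(titles = rownames(pcr.objx),
                  pcr.objx[, 1:3])

# attach sample types; by.y = 0 matches on the row names of ld_sampleTypes
dtp2 <- merge(dtp, ld_sampleTypes, by.x = "titles", by.y = 0)

#print(head(dtp2))
ggplot(data = dtp2) +
    geom_point(aes(x = PC1, y = PC2, col = sType)) + # colour by sample type
    theme_minimal() +
    labs(title = "LUAD PCA")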

Samples with PC1 > 200 are considered outliers (as suggested by the literature).
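
The threshold rule can be written as a flag on the PC scores. This is a minimal sketch with made-up scores standing in for `pca.obj$x`; the names `scores`, `pc1_cutoff`, and `outlier` are only for illustration.

```r
# Toy PC scores standing in for pca.obj$x (made-up values, not the LUAD data).
scores <- data.frame(
  PC1 = c(-50, 12, 630, -8, 590, 30),
  PC2 = c(5, -20, 14, 9, -3, 1),
  row.names = paste0("sample", 1:6)
)

# Cutoff along PC1 used to call a sample an outlier.
pc1_cutoff <- 200

# Two-level factor: "yes" for samples beyond the cutoff, "no" otherwise.
scores$outlier <- factor(ifelse(scores$PC1 > pc1_cutoff, "yes", "no"),
                         levels = c("no", "yes"))

# Names of the flagged samples.
outlier_ids <- rownames(scores)[scores$outlier == "yes"]
print(outlier_ids)
```

A flag built this way could then be added as a column of the sample table, matched by sample name.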

The PCA figure is attached (PCA_image). NT are normal samples, while TP are tumor samples.
