Question

edgeR normalization

19 months ago

rheab1230 ▴ 140

Hello everyone, I am using TMM normalization from the edgeR package to normalize my read counts. I also want to apply some filtering criteria to remove lowly expressed genes. My filtering criteria are:

>=6 reads in >=20% of samples

>=0.1 CPM in >=20% of samples

I am using this code for the task:

##filter gene reads(>=6) in >=20%samples

library(edgeR)

ns <- ncol(GTEx_Analysis_gene_reads_cortex)  ## total number of samples in the read count table
read.Cutoff <- 6
min.samples <- 0.2*ns ## consider >=20%samples
keep <- apply(GTEx_Analysis_gene_reads_cortex, 1, function(x, n = min.samples){
  sum(x >= read.Cutoff) >= n
})
new_data <- GTEx_Analysis_gene_reads_cortex[keep,]
##normalization using edgeR by TMM method:
dge <- DGEList(df_merge)                        # DGEList object created from the count data
dge2 <- calcNormFactors(dge, method = "TMM")    # TMM normalization: calculates the norm factors
pseudo_normcounts <- log2(cpm(dge2) + 1)
#filter based on cpm:
cpm.Cutoff <- 0.1
min.samples <- 0.2*ns ## consider >=20%samples
keep <- apply(new_data, 1, function(x, n = min.samples){
  sum(x >= cpm.Cutoff) >= n
})

Once this is done, my number of genes drops drastically, from 56200 to 3223. Does anyone have any idea whether what I am doing is wrong, or should I reduce the filtering steps?

I also have one more question: does pseudo_normcounts contain normalized gene counts or normalized CPM?

Finally, does anyone know how I can remove the samples that have <6 read counts and <0.1 CPM? Right now the code keeps all the samples and removes only genes. What I want is to keep the genes but remove the samples in which both conditions are met, but I also cannot combine these two conditions. Thank you.
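For reference, the two gene-level criteria above can be written without `apply`. What follows is only a minimal sketch in base R on a made-up matrix: `counts`, `cpm_mat` and the cutoff names are hypothetical, and CPM is computed from raw library sizes rather than TMM-adjusted ones (in edgeR, `cpm()` on a DGEList after `calcNormFactors` uses the normalization factors by default). It illustrates that the CPM cutoff is tested against a CPM matrix rather than the raw counts, and that the two filters combine with `&`.

```r
# Toy count matrix: 5 genes (rows) x 5 samples (columns); all values are made up.
counts <- matrix(
  c(  0,   0,   0,   0,   0,    # geneA: never detected
     10,  12,   8,  15,  20,    # geneB: expressed in every sample
      6,   9,   0,   0,   0,    # geneC: >=6 reads in 2 of 5 samples
      2,   3,   1,   4,   5,    # geneD: detected, but always below 6 reads
    500, 450, 600, 520, 480),   # geneE: highly expressed
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:5]), paste0("s", 1:5))
)

read_cutoff <- 6
cpm_cutoff  <- 0.1
min_samples <- 0.2 * ncol(counts)   # 20% of the samples

# Counts per million from raw library sizes (assumption: no TMM factors here).
# In such tiny toy libraries any nonzero count is far above 0.1 CPM.
cpm_mat <- t(t(counts) / colSums(counts)) * 1e6

# One logical value per gene for each criterion
keep_reads <- rowSums(counts  >= read_cutoff) >= min_samples
keep_cpm   <- rowSums(cpm_mat >= cpm_cutoff)  >= min_samples

# A gene is kept only if it passes both criteria
keep     <- keep_reads & keep_cpm
filtered <- counts[keep, , drop = FALSE]
```

Here geneD passes the CPM criterion but fails the read-count criterion, so it is dropped. By the same logic, `log2(cpm(dge2) + 1)` holds log2-transformed CPM values rather than counts.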

TMM normalization edgeR read_counts • 665 views

I am trying to follow the GTEx QTL normalization for my datasets:

Read counts are normalized between samples using TMM.

Genes are selected based on the following expression thresholds: ≥0.1 TPM in ≥20% of samples AND ≥6 reads (unnormalized) in ≥20% of samples.

The only difference is that I don't want to do the next step, inverse quantile normalization. Thank you.
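Note that the thresholds quoted here are in TPM, not CPM, and TPM needs gene lengths, which are not part of the code in the question. A rough sketch of the TPM-based selection in base R, assuming a hypothetical `gene_length_kb` vector and made-up counts, might look like this:

```r
# Toy counts: 4 genes x 5 samples (made-up values).
counts <- matrix(
  c(100, 120,  90, 110, 105,
      0,   0,   0,   0,   0,
      7,   9,   0,   0,   0,
      1,   2,   1,   0,   3),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:5))
)

# Hypothetical gene lengths in kilobases, one per row of `counts`
gene_length_kb <- c(gene1 = 2.0, gene2 = 1.5, gene3 = 0.5, gene4 = 3.0)

# TPM: length-normalize each gene, then scale every sample to sum to 1e6
rate <- counts / gene_length_kb            # row i is divided by the length of gene i
tpm  <- t(t(rate) / colSums(rate)) * 1e6

min_samples <- 0.2 * ncol(counts)          # 20% of the samples

keep_tpm   <- rowSums(tpm    >= 0.1) >= min_samples   # >=0.1 TPM in >=20% of samples
keep_reads <- rowSums(counts >= 6)   >= min_samples   # >=6 raw reads in >=20% of samples

# Both thresholds must hold (the AND in the description above)
selected_genes <- rownames(counts)[keep_tpm & keep_reads]
```

In this toy example gene4 clears the TPM threshold but never reaches 6 reads, so only gene1 and gene3 are selected.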

19 months ago by rheab1230 ▴ 140