edgeR normalization
19 months ago
rheab1230 ▴ 140

Hello everyone, I am using TMM normalization from the edgeR package to normalize my read counts. I also want to apply some filtering criteria to remove lowly expressed genes. My filtering criteria are:

>=6 reads in >=20% of samples
>=0.1 CPM in >=20% of samples

I am using this code for the task:

## filter genes: >= 6 reads in >= 20% of samples
library(edgeR)

ns <- ncol(GTEx_Analysis_gene_reads_cortex)     # total number of samples in the read count table
read.Cutoff <- 6
min.samples <- 0.2 * ns                         # consider >= 20% of samples
keep <- apply(GTEx_Analysis_gene_reads_cortex, 1, function(x, n = min.samples) {
  sum(x >= read.Cutoff) >= n
})
new_data <- GTEx_Analysis_gene_reads_cortex[keep, ]

## normalization using edgeR by the TMM method:
dge <- DGEList(df_merge)                        # DGEList object created from the count data
dge2 <- calcNormFactors(dge, method = "TMM")    # TMM normalization: calculates the norm factors
pseudo_normcounts <- log2(cpm(dge2) + 1)

## filter based on CPM:
cpm.Cutoff <- 0.1
min.samples <- 0.2 * ns                         # consider >= 20% of samples
keep <- apply(new_data, 1, function(x, n = min.samples) {
  sum(x >= cpm.Cutoff) >= n
})

Once I am done with this, my number of genes is drastically reduced, from 56200 to 3223. Does anyone have any idea whether what I am doing is wrong, or should I reduce the filtering steps?

I also have one more doubt: does pseudo_normcounts contain normalized gene counts or normalized CPM?

Finally, does anyone know how I can remove the samples that have <6 reads and <0.1 CPM? Right now the code keeps all the samples and removes only genes. What I want is to keep the genes but remove the samples in which both conditions are met, and I also cannot work out how to combine the two conditions. Thank you.
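On the question of what pseudo_normcounts holds, here is a minimal sketch using simulated toy counts (the matrix and its dimensions are made up for illustration). It compares edgeR's cpm() on a TMM-normalized DGEList with a manual CPM computed from the effective library sizes, i.e. lib.size multiplied by norm.factors. If the two agree, pseudo_normcounts is log2 of TMM-normalized CPM plus one, not normalized counts.

```r
library(edgeR)

# Simulated toy counts: 200 genes x 6 samples (illustrative only)
set.seed(1)
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 2), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))

dge <- calcNormFactors(DGEList(counts), method = "TMM")

# Effective library size = raw library size * TMM normalization factor
eff_lib <- dge$samples$lib.size * dge$samples$norm.factors

# Manual counts-per-million on the effective library sizes
manual_cpm <- t(t(counts) / eff_lib) * 1e6

# Compare with edgeR's own CPM values
same <- isTRUE(all.equal(unname(cpm(dge)), unname(manual_cpm)))
print(same)
```

The thresholds in the post are defined per gene across samples, so the same comparison does not by itself say anything about which samples to drop.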

TMM normalization edgeR read_counts • 665 views

I am trying to follow the GTEx QTL normalization for my datasets:

Read counts are normalized between samples using TMM.
Genes are selected based on the following expression thresholds: ≥0.1 TPM in ≥20% of samples AND ≥6 reads (unnormalized) in ≥20% of samples.

The only difference is that I don't want to do the next step of inverse quantile normalization. Thank you.
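The two thresholds described above can be combined into a single gene-level filter. The following is a rough sketch, not the GTEx pipeline itself: filter_genes is a hypothetical helper, the toy counts are made up, and plain CPM without TMM factors stands in for the expression matrix so the example needs no packages. TPM would additionally require gene lengths, which are not given in this thread.

```r
# Hypothetical helper: TRUE for genes that pass BOTH thresholds
# in at least min_frac of the samples.
filter_genes <- function(counts, expr, min_reads = 6, min_expr = 0.1,
                         min_frac = 0.2) {
  stopifnot(identical(dim(counts), dim(expr)))
  n_min <- min_frac * ncol(counts)
  pass_reads <- rowSums(counts >= min_reads) >= n_min  # raw read threshold
  pass_expr  <- rowSums(expr >= min_expr) >= n_min     # CPM (or TPM) threshold
  pass_reads & pass_expr
}

# Toy data: 4 genes x 5 samples (made-up numbers)
counts <- matrix(c(10, 12,  0,  0,  0,
                    0,  0,  0,  0,  0,
                    7,  8,  9,  6, 11,
                    5,  5,  5,  5,  5),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:5)))

# Plain CPM (no TMM factors) to keep the example dependency-free
expr <- t(t(counts) / colSums(counts)) * 1e6

keep <- filter_genes(counts, expr)
filtered <- counts[keep, , drop = FALSE]
print(rownames(filtered))
```

With edgeR loaded, the expr argument could instead be cpm(dge2), provided dge2 was built from the same unfiltered count table, so that both conditions are evaluated on the same genes and samples.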
