Hi, I am doing gene expression profiling. I would like to know what is the function to remove non-coding RNA before DEG analysis? Thank you in advance!
Hi, I am doing gene expression profiling. I would like to know what is the function to remove non-coding RNA before DEG analysis? Thank you in advance!
It's pretty straight-forward, you would filter them based on the reference GTF file for the annotations you used. Lets make an example when using mouse GTF from GENCODE.
Lets say you have the output of topTags()
from edgeR like this:
tt <-
structure(list(logFC = c(0.175414931132545, -0.0980587500797469,
-0.230824032774493, -0.0101493950393324), logCPM = c(7.83607241540638,
7.64684782555737, 7.78324186529636, 8.03200486091744), F = c(0.101233884539549,
0.0408091756477534, 0.281859084375293, 0.000512014257066939),
PValue = c(0.752940523471853, 0.841506876719988, 0.600071201841097,
0.982123050837867), FDR = c(0.990688550252264, 0.990688550252264,
0.989532336297865, 0.990688550252264)), row.names = c("Miat",
"Platr1", "Rb1cc1", "Sox17"), class = "data.frame")
logFC logCPM F PValue FDR
Miat 0.17541493 7.836072 0.1012338845 0.7529405 0.9906886
Platr1 -0.09805875 7.646848 0.0408091756 0.8415069 0.9906886
Rb1cc1 -0.23082403 7.783242 0.2818590844 0.6000712 0.9895323
Sox17 -0.01014940 8.032005 0.0005120143 0.9821231 0.9906886
You would load the GTF annotation file, here it is GENCODE vM25 into R via rtracklayer
:
gtf <- rtracklayer::import(gtf_url)
wit gtf_url
being https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M25/gencode.vM25.annotation.gtf.gz
And then simply pull the gene type you want. You can get an overview with table(gtf$gene_type)
. In this GTF the lncRNAs are called lincRNA
so filter them:
lincrna <- unique(gtf[gtf$gene_type=="lincRNA"]$gene_name)
Now subset the topTags object tt
to these genes:
tt[rownames(tt) %in% lincrna,]
logFC logCPM F PValue FDR
Miat 0.17541493 7.836072 0.10123388 0.7529405 0.9906886
Platr1 -0.09805875 7.646848 0.04080918 0.8415069 0.9906886
Just do the same based on the GTF that your analysis is based on.
Hi, thank you for your reply. I did following your comments but the result of my DEGs are still the same (2 DEGs).
groupsA <- c("others","A","others","others","A","others","A","others")
geneexpA <- DGEList(counts=counts_mat, group=groupsA)
head(geneexpA)
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("rtracklayer")
install.packages("rtracklayer")
gtf <- rtracklayer::import("gencode.v38.annotation.gtf")
table(gtf$gene_type)
lncrna <- unique(gtf[gtf$gene_type=="lncRNA"]$gene_name)
geneexpA[rownames(geneexpA) %in% lncrna,]
filterByExpr(geneexpA,remove.lncrna=TRUE)
geneexpA <- calcNormFactors(geneexpA)
geneexpA <- estimateCommonDisp(geneexpA)
geneexpA <- estimateTagwiseDisp(geneexpA)
head(geneexpA)
genediffA <- exactTest(geneexpA)
genediffA
topTags(genediffA)
genediffA$table <- cbind(genediffA$table, FDR=p.adjust(genediffA$table$PValue, method ='fdr'))
head(genediffA)
str(genediffA)
summary(genediffA)
gsignA <- genediffA$table[genediffA$table$FDR<0.03,]
gsignA <- gsignA[order(gsignA$FDR),]
dim(gsignA)
head(gsignA)
My recommendation in that other thread was to do the standard edgeR analysis, and then remove the lncRNAs before the multiple testing correction, that would be before calling topTags
. That was a suggestion, it could help to get more DEGs if low power was one reason and the reduction in multiple testing burden was beneficial. Or your study is just underpowered or there are simply do DEGs by biology. I also suggested to do some QC, but it's up to you to do that or not.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
At the time quantification (counts) from bams, you can use gtf without non-coidng RNAs.