EdgeR, How to remove non-coding RNA before DEG analysis?
1
0
Entering edit mode
2.1 years ago
TM ▴ 20

Hi, I am doing gene expression profiling. I would like to know what is the function to remove non-coding RNA before DEG analysis? Thank you in advance!

EdgeR DEGs • 864 views
ADD COMMENT
0
Entering edit mode

At the time quantification (counts) from bams, you can use gtf without non-coidng RNAs.

ADD REPLY
0
Entering edit mode
2.1 years ago
ATpoint 82k

It's pretty straight-forward, you would filter them based on the reference GTF file for the annotations you used. Lets make an example when using mouse GTF from GENCODE.

Lets say you have the output of topTags() from edgeR like this:

tt <- 
structure(list(logFC = c(0.175414931132545, -0.0980587500797469, 
-0.230824032774493, -0.0101493950393324), logCPM = c(7.83607241540638, 
7.64684782555737, 7.78324186529636, 8.03200486091744), F = c(0.101233884539549, 
0.0408091756477534, 0.281859084375293, 0.000512014257066939), 
    PValue = c(0.752940523471853, 0.841506876719988, 0.600071201841097, 
    0.982123050837867), FDR = c(0.990688550252264, 0.990688550252264, 
    0.989532336297865, 0.990688550252264)), row.names = c("Miat", 
"Platr1", "Rb1cc1", "Sox17"), class = "data.frame")

             logFC   logCPM            F    PValue       FDR
Miat    0.17541493 7.836072 0.1012338845 0.7529405 0.9906886
Platr1 -0.09805875 7.646848 0.0408091756 0.8415069 0.9906886
Rb1cc1 -0.23082403 7.783242 0.2818590844 0.6000712 0.9895323
Sox17  -0.01014940 8.032005 0.0005120143 0.9821231 0.9906886

You would load the GTF annotation file, here it is GENCODE vM25 into R via rtracklayer:

gtf <- rtracklayer::import(gtf_url)

wit gtf_url being https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M25/gencode.vM25.annotation.gtf.gz

And then simply pull the gene type you want. You can get an overview with table(gtf$gene_type). In this GTF the lncRNAs are called lincRNA so filter them:

lincrna <- unique(gtf[gtf$gene_type=="lincRNA"]$gene_name)

Now subset the topTags object tt to these genes:

tt[rownames(tt) %in% lincrna,]

             logFC   logCPM          F    PValue       FDR
Miat    0.17541493 7.836072 0.10123388 0.7529405 0.9906886
Platr1 -0.09805875 7.646848 0.04080918 0.8415069 0.9906886

Just do the same based on the GTF that your analysis is based on.

ADD COMMENT
0
Entering edit mode

Hi, thank you for your reply. I did following your comments but the result of my DEGs are still the same (2 DEGs).

  • I have 3 different structures(8 samples). I want to know the gene expression profiling of structure A. I grouped structure A and other 2 structures as others.
  • Does this mean there are no DEGs between my structures? Can you help identify if I did something wrong in my script? The following is my script:
    groupsA <- c("others","A","others","others","A","others","A","others")
    geneexpA <- DGEList(counts=counts_mat, group=groupsA)
    head(geneexpA)

    if (!requireNamespace("BiocManager", quietly = TRUE))
      install.packages("BiocManager")
    BiocManager::install("rtracklayer")
    install.packages("rtracklayer")

    gtf <- rtracklayer::import("gencode.v38.annotation.gtf")
    table(gtf$gene_type)
    lncrna <- unique(gtf[gtf$gene_type=="lncRNA"]$gene_name)
    geneexpA[rownames(geneexpA) %in% lncrna,]
    filterByExpr(geneexpA,remove.lncrna=TRUE) 
    geneexpA <- calcNormFactors(geneexpA)
    geneexpA <- estimateCommonDisp(geneexpA)
    geneexpA <- estimateTagwiseDisp(geneexpA)
    head(geneexpA)
    genediffA <- exactTest(geneexpA)
    genediffA 
    topTags(genediffA)
    genediffA$table <- cbind(genediffA$table, FDR=p.adjust(genediffA$table$PValue, method ='fdr'))
    head(genediffA)
    str(genediffA)
    summary(genediffA)
    gsignA <- genediffA$table[genediffA$table$FDR<0.03,]
    gsignA <- gsignA[order(gsignA$FDR),]
    dim(gsignA)
    head(gsignA)
ADD REPLY
0
Entering edit mode

My recommendation in that other thread was to do the standard edgeR analysis, and then remove the lncRNAs before the multiple testing correction, that would be before calling topTags. That was a suggestion, it could help to get more DEGs if low power was one reason and the reduction in multiple testing burden was beneficial. Or your study is just underpowered or there are simply do DEGs by biology. I also suggested to do some QC, but it's up to you to do that or not.

ADD REPLY

Login before adding your answer.

Traffic: 2197 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6