Question

RNA-seq with edgeR

Asked 5.8 years ago by 31jsheetal

We are running an edgeR script in classic mode, i.e. without a design matrix. We get results from edgeR, but we cannot tell which gene each FDR, logFC, etc. belongs to. The output looks like the excerpt below, taken from my CSV output file. How can we get the Geneid column into the output as well? I'm not sure what the first column is, but I assume it is the row number. Thank you in advance.

Comparison of groups:  CPeM-CPeC 
            logFC        logCPM       PValue          FDR
48178   9.2968063  8.7531497859 1.339698e-88 7.015328e-84
50934  -8.8743164  9.2802946181 4.600196e-84 1.204446e-79
45623  -9.0342436  7.9331539929 6.593547e-80 1.150904e-75
49291   8.5775628  7.0893005292 8.493058e-78 1.111848e-73
37120  -7.9915437  6.7177772577 8.642303e-74 9.051084e-70
45475  -7.7424580  7.6449239664 5.610419e-73 4.896493e-69
32316   7.0196112  7.8150257401 2.529690e-64 1.892389e-60
32293   7.0582453  6.3341181139 1.568164e-62 1.026462e-58
40924  -7.3083772  7.9254370926 7.675463e-60 4.465840e

Tags: rna-seq · last edited by h.mon


Please add the code you used to generate this result.

(comment by WouterDeCoster, 5.8 years ago)


You'll need to annotate your genes first: set the row names of your count matrix to your unique gene IDs before the edgeR fit. What does your count matrix look like? Please share more code and/or examples of your data.
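A minimal base-R sketch of that suggestion, assuming a featureCounts-style table whose first column is called `Geneid` (the column name the question mentions); the gene IDs and counts below are made up. With a real file, something like `read.delim("counts.txt", row.names = "Geneid", comment.char = "#")` (hypothetical file name) does the same in one step, although featureCounts output also carries annotation columns (Chr, Start, End, Strand, Length) that would have to be dropped before building the DGEList.

```r
# Toy stand-in for a count table read from disk: gene IDs in a "Geneid" column,
# then one integer column per sample (all values invented)
raw <- data.frame(
  Geneid = c("gene1", "gene2", "gene3"),
  A1 = c(10L, 0L, 250L),
  B1 = c(12L, 3L, 240L),
  stringsAsFactors = FALSE
)

# Keep only the count columns as a matrix and move the IDs into the row names,
# which is what edgeR later uses to label every result row
counts <- as.matrix(raw[, -1])
rownames(counts) <- raw$Geneid
counts
```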

(comment by Benn, 5.8 years ago)

Answer (2018-07-18)

edgeR uses the row names of the counts slot to identify the genes. Mock code:

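library( edgeR )
# counts: count matrix with gene IDs as row names; treatment: two-level factor of sample groups
design <- model.matrix( ~ 0 + treatment )   # one column per group
contrast <- c( -1, 1 )                      # second group minus first group
y <- DGEList( counts = counts, group = treatment )
y <- calcNormFactors( y )
y <- estimateDisp( y, design )
fit <- glmFit( y, design )
lrt <- glmLRT( fit, contrast = contrast )
tt <- topTags( lrt, sort.by = "none", n = Inf )   # n = Inf keeps all genes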
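The question uses classic mode (exact test, no design matrix), which the "Comparison of groups:" header in the posted output also suggests. Below is a sketch of that route on simulated counts, assuming edgeR is installed; the gene names, sample names and simulation parameters are invented for illustration. The mechanism is the same as in the GLM code: the row names of the count matrix become the row names of `tt$table`.

```r
library(edgeR)

# Simulated counts: 100 hypothetical genes, 3 samples per group;
# the row names play the role of gene IDs
set.seed(1)
counts <- matrix(rnbinom(600, mu = 100, size = 10), nrow = 100,
                 dimnames = list(paste0("gene", 1:100),
                                 c("A1", "A2", "A3", "B1", "B2", "B3")))
treatment <- factor(c("A", "A", "A", "B", "B", "B"))

y <- DGEList(counts = counts, group = treatment)
y <- calcNormFactors(y)
y <- estimateDisp(y)        # no design: dispersions estimated from the group factor
et <- exactTest(y)          # classic exact test, second group level vs first (B-A)
tt <- topTags(et, n = Inf)  # all genes, sorted by p-value
head(tt$table)              # row names are the gene IDs taken from counts
```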

Inspect the counts slot of your DGEList object:

head( y$counts, n = 2 )

Output:

         A1   A2   A3   B1   B2   B3
130541  437  416  455  433  380  412
128741 5290 6167 4543 6453 6016 7418

Inspect the table slot of your TopTags object:

head( tt$table )

Output:

              logFC    logCPM         LR     PValue       FDR
130541    0.08218954  4.967360 0.09360040 0.75964894 0.9786278
128741   -0.09887031  8.734469 0.35966293 0.54869349 0.9316098
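If the table is exported with `write.csv(tt$table, ...)`, R writes the row names as an unlabelled first column, which is presumably where the unnamed first column in the question's CSV comes from. A small base-R sketch that turns the row names into a named column before writing; the data frame is a stand-in for `tt$table` built from the two rows shown above, and the output goes to a temporary file.

```r
# Stand-in for tt$table: row names are the gene IDs, values copied from the output above
res <- data.frame(logFC = c(0.08218954, -0.09887031),
                  FDR = c(0.9786278, 0.9316098),
                  row.names = c("130541", "128741"))

# Put the IDs into an explicit, named column so the CSV header labels it
out <- data.frame(GeneID = rownames(res), res, stringsAsFactors = FALSE)

f <- file.path(tempdir(), "edgeR_results.csv")
write.csv(out, f, row.names = FALSE)   # no extra unnamed row-name column
head(read.csv(f))
```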

If you add a genes slot, its columns are added to the output of the topTags object:

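library( edgeR )
design <- model.matrix( ~ 0 + treatment )   # one column per group
contrast <- c( -1, 1 )                      # second group minus first group
y <- DGEList( counts = counts, group = treatment )
y$genes <- genes   # data frame with one row per row of counts, in the same order
y <- calcNormFactors( y )
y <- estimateDisp( y, design )
fit <- glmFit( y, design )
lrt <- glmLRT( fit, contrast = contrast )
tt <- topTags( lrt, sort.by = "none", n = Inf )   # n = Inf keeps all genes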

Inspect the genes data frame:

head( genes, n = 2 )

Output:

  Accession Uniprot Gene.Names
1    130541  A0AVT1       UBA6
2    128741  A0JNA3     IMPDH1
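As far as I know, the genes slot is aligned with the counts by position, not matched by ID, so the annotation rows need to be in the same order as the rows of `counts`. A base-R sketch that reorders an annotation table to match, using toy objects copied from the outputs above; the deliberately shuffled row order is an assumption for illustration. The final `stopifnot()` is a cheap guard worth keeping before `y$genes <- genes` (or the equivalent `DGEList(..., genes = genes)` argument).

```r
# Toy counts with gene IDs as row names (values from the head(y$counts) output)
counts <- matrix(c(437L, 5290L, 416L, 6167L), nrow = 2,
                 dimnames = list(c("130541", "128741"), c("A1", "A2")))

# Annotation table in a different row order than counts
genes <- data.frame(Accession = c("128741", "130541"),
                    Uniprot = c("A0JNA3", "A0AVT1"),
                    Gene.Names = c("IMPDH1", "UBA6"),
                    stringsAsFactors = FALSE)

# Reorder so that row i of genes describes row i of counts
genes <- genes[match(rownames(counts), genes$Accession), ]

# Fail loudly if any ID is missing or the order still differs
stopifnot(identical(genes$Accession, rownames(counts)))
genes
```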

The counts slot of the DGEList is unchanged:

head( y$counts, n = 2 )

Output:

         A1   A2   A3   B1   B2   B3
130541  437  416  455  433  380  412
128741 5290 6167 4543 6453 6016 7418

But now inspect the topTags object:

head( tt$table, n = 2 )

Output:

       Accession Uniprot Gene.Names       logFC   logCPM        LR    PValue       FDR
130541    130541  A0AVT1       UBA6  0.08218954 4.967360 0.0936004 0.7596489 0.9786278
128741    128741  A0JNA3     IMPDH1 -0.09887031 8.734469 0.3596629 0.5486935 0.9316098
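For results that were already produced without gene IDs, as in the question: if the count matrix had no row names, edgeR presumably labelled the rows by position, which would match the asker's guess that the first column is a row number. Under that assumption, and only if the rows were not subset or reordered before the DGEList was built, those numbers can be mapped back to the original table. A base-R sketch with an invented count table and a stand-in result table:

```r
# Hypothetical original count table, in the same row order used to build the DGEList
raw <- data.frame(Geneid = c("geneA", "geneB", "geneC", "geneD"),
                  S1 = c(5L, 80L, 0L, 12L),
                  stringsAsFactors = FALSE)

# Stand-in for an existing result table whose row names are row positions
res <- data.frame(logFC = c(1.5, -2.0), FDR = c(0.001, 0.01),
                  row.names = c("3", "1"))

# Convert the row names back to integer positions and look up the gene IDs
res$Geneid <- raw$Geneid[as.integer(rownames(res))]
res
```

Rebuilding the DGEList with proper row names, as described above, is the more robust fix; this lookup is only a shortcut for an existing output file.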