Question

Subsetting before EdgeR differential gene expression

8.2 years ago

createanotherone • 0

Hi all,

I am analysing an RNA-seq experiment with 8 treatment vs 8 control samples collected from primary tissue.

Having performed differential gene expression (DGE) analysis between these groups using the edgeR exact test, we noticed that genes specifically expressed in our cell type of interest (identified in a separate study) appear to be systematically lower in the treatment group.

We believe this is likely due to a difference in cell composition between the two groups, despite due care taken in the collection procedure. If more reads are consumed by genes specific to cells we aren't interested in, fewer reads are left for genes specific to the cells we are interested in, and this drop in coverage is then falsely called as differential expression.

I was wondering, is it sound to subset the data to just the genes we are confident are specific to the cell type we want to investigate, normalise for the coverage across this gene set (like a pseudo-library size adjustment), and perform DGE just on this gene set? Perhaps with a conservative false discovery rate adjustment using the total number of expressed genes (not the number in the subset)?
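
A minimal sketch of the mechanics described above, written in plain Python rather than edgeR so that it is self-contained. It only illustrates the two adjustments (a pseudo-library size taken over the subset, and a Benjamini-Hochberg correction that counts all expressed genes as tests); it does not perform an exact test and says nothing about whether the approach is statistically sound. The gene names, counts, and function names are all made up for illustration.

```python
def pseudo_library_sizes(counts, subset_genes):
    """Per-sample sum of counts over the subset genes only."""
    n_samples = len(counts[subset_genes[0]])
    sizes = [0] * n_samples
    for gene in subset_genes:
        for j, c in enumerate(counts[gene]):
            sizes[j] += c
    return sizes


def cpm(counts, subset_genes, lib_sizes):
    """Counts per million, scaled by the supplied library sizes."""
    return {
        g: [1e6 * c / s for c, s in zip(counts[g], lib_sizes)]
        for g in subset_genes
    }


def bh_adjust(pvalues, n_tests=None):
    """Benjamini-Hochberg adjusted p-values.

    n_tests lets the correction assume more tests than p-values supplied,
    e.g. the total number of expressed genes rather than the subset size.
    """
    m = len(pvalues)
    n = m if n_tests is None else n_tests
    if n < m:
        raise ValueError("n_tests must be at least the number of p-values")
    # Walk from the largest p-value down, keeping a running minimum.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy data: samples 1-2 are control, samples 3-4 are treatment.
# geneC stands in for a gene from an unwanted cell type that takes up more
# reads in the treatment samples.
counts = {
    "geneA": [100, 120, 50, 60],
    "geneB": [300, 280, 150, 140],
    "geneC": [600, 600, 1800, 1800],
}
subset = ["geneA", "geneB"]  # genes assumed specific to the cell type of interest

subset_sizes = pseudo_library_sizes(counts, subset)
subset_cpm = cpm(counts, subset, subset_sizes)

# Made-up p-values for the two subset genes, corrected as if 10 genes were tested.
adjusted = bh_adjust([0.01, 0.04], n_tests=10)
```

In this toy data the raw counts for geneA and geneB halve in the treatment samples, but their values in `subset_cpm` are the same in both groups, which is the effect the pseudo-library size is meant to have. In R, `p.adjust` takes an `n` argument that plays the same role as `n_tests` here, and subsetting an edgeR `DGEList` with `keep.lib.sizes=FALSE` recomputes library sizes from the retained genes.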

Any advice would be greatly appreciated!

Thanks

Scott

RNA-Seq R • 2.6k views


I was wondering the same (although using DESeq instead). Have you found an answer elsewhere?

7.4 years ago by lm687 ▴ 50