The total expressed genes in RNA-Seq data
10 months ago
Pegasus ▴ 130

Hey everyone,

When analysing RNA-Seq data with edgeR, which filter is commonly used to determine the "total expressed genes"?

While I've employed the criterion of logCPM > 1 as one of the filters to identify differentially expressed genes (DEGs), I'm uncertain whether I should apply the same filter to calculate the total expressed genes.

total <- total[total$logCPM > 1, ]

Alternatively, some discussions suggest using TPM (Transcripts Per Million) for this purpose, for example keeping genes with a non-zero value in more than one sample:

normed <- normed[rowSums(normed > 0) > 1, ]

Thanks for any insights!

RNA-SEQ • 418 views
10 months ago
ATpoint 87k

There is no robust definition of "expressed" genes; this has been asked many times before. edgeR does not care about "expressed" genes. Its filter, filterByExpr, only checks that a gene has sufficient counts for a differential analysis, and this is often misinterpreted as a definition of expression. See the edgeR user guide for the recommended filter (filterByExpr). The choice of expression unit does not change the fact that any definition of "expressed" is arbitrary without a gold standard to benchmark against.
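For reference, a minimal sketch of that filter on simulated counts. The count matrix, the group labels and the object names are made up for illustration; only DGEList, filterByExpr and calcNormFactors come from edgeR, and the defaults of filterByExpr are used as-is.

```r
library(edgeR)

set.seed(1)
# Simulated data (illustrative only): 1000 genes x 6 samples,
# with per-gene mean counts spread from below 1 to about 1000
gene_means <- 2^runif(1000, -2, 10)
counts <- matrix(rnbinom(6000, mu = gene_means, size = 2), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
group <- factor(rep(c("ctrl", "trt"), each = 3))

y <- DGEList(counts = counts, group = group)

# Keep genes with enough counts to be worth testing. This is a
# statistical filter, not a statement about biological expression.
keep <- filterByExpr(y, group = group)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)

table(keep)
```

The number of TRUE entries in keep is the number of genes retained for testing, which is not the same thing as the number of "expressed" genes.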

People sometimes rank genes based on FPKM and then take the inflexion point of the curve, or define simple cutoffs like FPKM > 1, but after all, the cell does not care about inflexion or expression units. These approaches are naive and not robust. Ask yourself if you really need "expressed" genes for your analysis, rather than just those with sufficient counts as edgeR defines them.
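To see how arbitrary a fixed cutoff is, here is a rough base-R sketch on simulated counts. The data, the cutoffs and the "at least 2 samples" rule are assumptions chosen for illustration, not a recommendation; CPM is used instead of FPKM only because the simulated data carry no gene lengths.

```r
set.seed(1)
# Simulated data (illustrative only): 20000 genes x 6 samples
gene_means <- 2^runif(20000, -4, 12)
counts <- matrix(rnbinom(120000, mu = gene_means, size = 2), nrow = 20000)

# Counts per million, computed by hand from the library sizes
cpm_mat <- t(t(counts) / colSums(counts)) * 1e6

# Number of genes called "expressed" (CPM above the cutoff in at
# least 2 samples) for a few equally defensible cutoffs
cutoffs <- c(0.5, 1, 2, 5)
n_expressed <- sapply(cutoffs, function(k) sum(rowSums(cpm_mat > k) >= 2))
setNames(n_expressed, cutoffs)
```

Each cutoff yields a different total, and nothing in the data says which of them is the right one.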
