4.0 years ago
dustin3141
I have RNA-seq data with 29 samples and 60483 genes, but I only want to focus on protein-coding genes. I filter for protein-coding genes, remove low-count genes, and then do CPM normalization. Is this method reasonable?
Without knowing more details, I would leave all genes in for normalization. Be sure to use an established normalization method such as TMM in edgeR (calcNormFactors, followed by the cpm function) rather than naive per-million scaling. After this you can filter. What is the final goal? Differential analysis?
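To make the order of operations concrete, here is a minimal sketch in plain Python. It only illustrates "normalize on all genes first, filter afterwards". It deliberately uses simple per-million scaling, not TMM, so for a real analysis edgeR's TMM is still the better choice. The gene names, counts, biotype labels, and the thresholds `min_cpm` and `min_samples` are all made up for illustration.

```python
# Toy count table: one row per gene, one column per sample (made-up values).
counts = {
    "geneA": [500, 300],
    "geneB": [0, 0],
    "geneC": [1500, 700],
    "geneD": [8000, 9000],
}

# Hypothetical biotype annotation for each gene.
biotype = {
    "geneA": "protein_coding",
    "geneB": "protein_coding",
    "geneC": "lncRNA",
    "geneD": "protein_coding",
}


def library_sizes(counts):
    """Sum counts per sample over ALL genes, before any filtering."""
    n_samples = len(next(iter(counts.values())))
    return [sum(row[j] for row in counts.values()) for j in range(n_samples)]


def cpm(counts, lib_sizes):
    """Plain counts-per-million scaling (no TMM factors applied)."""
    return {
        gene: [c * 1e6 / size for c, size in zip(row, lib_sizes)]
        for gene, row in counts.items()
    }


def filter_genes(cpm_table, biotype, min_cpm=1.0, min_samples=2):
    """Keep protein-coding genes with CPM >= min_cpm in >= min_samples samples."""
    kept = {}
    for gene, row in cpm_table.items():
        if biotype.get(gene) != "protein_coding":
            continue
        if sum(value >= min_cpm for value in row) >= min_samples:
            kept[gene] = row
    return kept


lib_sizes = library_sizes(counts)          # uses every gene, including geneC
cpm_all = cpm(counts, lib_sizes)           # normalize first
filtered = filter_genes(cpm_all, biotype)  # then filter

for gene, row in filtered.items():
    print(gene, row)
```

The point of the design is that `library_sizes` sees the non-coding gene too, so the scaling reflects the full sequencing depth of each sample. If you filtered to protein-coding genes first, the library sizes (and therefore every CPM value) would change.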