Hi everyone, I am working on a set of samples (sequenced on Illumina) to study differential gene expression between control and treated samples. I pre-processed the reads with a few tools, used a mapping tool, and did the feature counting. Now I have a single txt file with gene names and counts for all samples. I have a few questions.
1) To make tag counts comparable across samples, normalization must be performed. My supervisor has asked me to use RPKM (reads per kilobase per million mapped reads), a normalization method that is widely used in RNA-seq analysis. Do I have to first filter out all genes that are zero across all samples and then normalize? If so, can anyone tell me how to compute RPKM given just my feature count txt file as input (any Bioconductor, R, or Python package)?
2) Once I have the RPKM-normalized file, I would like to find the differentially expressed genes. How should I do that?
I know that the DESeq and edgeR packages differ in their default normalization: edgeR uses the trimmed mean of M values (TMM), whereas DESeq uses a relative log expression approach. But I am interested in RPKM (as per the requirement).
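
To make question 1 concrete, here is a minimal Python sketch of the RPKM calculation as I understand it: raw count x 10^9 / (gene length in bp x total reads in the sample). It rests on assumptions: my file only has gene names and counts, so the gene lengths would have to come from somewhere else (for example the annotation); the library size is taken as the sum of counts per sample; and the function name `rpkm` and the toy values are made up for illustration.

```python
def rpkm(counts, lengths_bp):
    """Compute RPKM from raw counts.

    counts:     dict mapping gene name -> list of raw counts, one per sample
    lengths_bp: dict mapping gene name -> gene length in base pairs
                (assumed to be available; not part of a plain count table)
    Returns a dict mapping gene name -> list of RPKM values, one per sample.
    """
    if not counts:
        return {}

    genes = list(counts)
    n_samples = len(counts[genes[0]])

    # Library size per sample, assumed here to be the column sum of counts.
    lib_sizes = [sum(counts[g][j] for g in genes) for j in range(n_samples)]

    result = {}
    for g in genes:
        values = []
        for j in range(n_samples):
            if lib_sizes[j] == 0:
                # Sample with no counted reads: RPKM is undefined, report 0.0.
                values.append(0.0)
            else:
                # RPKM = count * 1e9 / (gene length in bp * library size)
                values.append(counts[g][j] * 1e9 / (lengths_bp[g] * lib_sizes[j]))
        result[g] = values
    return result


# Hypothetical toy data: three genes, two samples.
toy_counts = {"geneA": [100, 200], "geneB": [300, 0], "geneC": [0, 0]}
toy_lengths = {"geneA": 1000, "geneB": 2000, "geneC": 500}

for gene, values in rpkm(toy_counts, toy_lengths).items():
    print(gene, values)
```

In this sketch a gene that is zero in every sample adds nothing to the library sizes, so removing it before or after normalizing gives the same RPKM values for the remaining genes. I am not sure whether that also holds for how the packages compute it.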