I have 4 KO and 4 control samples from mice and I am performing a differential expression analysis on them. My pipeline is STAR alignment to mm10 followed by RSEM. I then take the
expected counts from RSEM and normalize them using voom, because I want to perform the differential gene expression analysis with limma.
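
For context, my understanding is that voom starts from log2 counts per million with a small offset before it estimates precision weights. Below is a minimal Python sketch of just that transform, not the limma code itself and without the weights or any normalization factors. The function name `log_cpm` and the toy counts are made up for illustration.

```python
import math

def log_cpm(counts):
    """Log2 counts per million for one sample, with the offsets voom uses.

    counts: dict mapping gene name -> (expected) count for a single sample.
    The library size here is simply the column sum; no normalization
    factors are applied (an assumption of this sketch).
    """
    lib_size = sum(counts.values())
    return {
        gene: math.log2((c + 0.5) / (lib_size + 1.0) * 1e6)
        for gene, c in counts.items()
    }

# Toy sample (invented numbers): one gene takes almost the whole library.
toy_sample = {"GeneA": 10.0, "GeneB": 0.0, "mt-Co1": 989.0}
toy_logcpm = log_cpm(toy_sample)
```

Because the library size is in the denominator, a handful of very highly expressed genes lowers the log-CPM of every other gene in that sample, which is part of why I am worried about the distribution described below.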
Here are some QC plots:
Boxplot of Samples before and after normalization: https://ibb.co/d1czbF
PCA of Samples before and after normalization: https://ibb.co/bE85GF
Read distribution: https://ibb.co/j6R33v
First, my controls and KOs do not cluster as I would expect. My main concern, though, is the gene expression distribution: the read distribution plots show a few genes with extremely high expression levels, and 8 of the 20 most highly expressed genes are on chromosome M (mitochondrial). I am normalizing the expression levels using voom (limma), but I would like to know whether this distribution will affect the downstream differential expression results and, if so, how I can fix it.
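
To put a number on this, one check I could run is the fraction of each sample's counts that comes from chrM genes. This is a minimal Python sketch under my own assumptions: the function name `mito_fraction`, the gene-to-chromosome lookup, and the toy counts are all hypothetical, and in practice the lookup would come from the mm10 annotation used with RSEM.

```python
def mito_fraction(counts, gene_chrom, mito_name="chrM"):
    """Fraction of a sample's total counts assigned to mitochondrial genes.

    counts:     dict mapping gene name -> count for a single sample
    gene_chrom: dict mapping gene name -> chromosome name
    mito_name:  chromosome label for the mitochondrial genome (assumed "chrM")
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mito = sum(c for gene, c in counts.items() if gene_chrom.get(gene) == mito_name)
    return mito / total

# Toy data (invented numbers, not from my samples).
toy_counts = {"mt-Co1": 600, "mt-Nd1": 200, "Actb": 150, "Gapdh": 50}
toy_chrom = {"mt-Co1": "chrM", "mt-Nd1": "chrM", "Actb": "chr5", "Gapdh": "chr6"}
toy_fraction = mito_fraction(toy_counts, toy_chrom)
```

If this fraction is large or differs a lot between samples, that would tell me whether the chrM genes are dominating the library sizes or are only a visual feature of the plots.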