Question: RNA-seq differential expression: how to deal with a few genes with extremely high expression levels
Asked 2.1 years ago by komal.rathi (Children's Hospital of Philadelphia, Philadelphia, PA):

Hi everyone,

I have 4 KO and 4 control samples from mice, and I am running a differential expression analysis on them. My pipeline is STAR alignment to mm10 followed by RSEM. I then take the expected counts from RSEM and transform them with voom, because I want to test for differential gene expression with limma.

Here are some QC plots:

Boxplot of Samples before and after normalization: https://ibb.co/d1czbF

PCA of Samples before and after normalization: https://ibb.co/bE85GF

Reads distribution: https://ibb.co/j6R33v

First, my controls and KOs do not group as I would expect. But my main concern is the gene expression: the read distribution plots show a few genes with extremely high expression levels. Eight of the 20 most highly expressed genes are on chromosome M. I am normalizing the expression levels using voom (limma), but I would like to know whether this distribution will affect the downstream differential expression results and, if so, how I can fix it.

Thanks!

Tags: voom, rna-seq, limma, mtdna
Answer, written 2.1 years ago by Carlo Yague (Belgium):

> But my main concern is the gene expression - you can see in the read distribution plots that there are a few genes with extremely high expression levels.

This is very typical of RNA-seq experiments. If you want to see any structure in the distribution of the un-normalized counts, plot them on a log scale, for example log2(count + 1).
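To illustrate why the log scale helps, here is a minimal Python sketch. The counts are made-up toy values, not data from this thread, and the pseudocount of 1 is just one common choice to keep zero counts defined. In practice this would more likely be done in R next to the limma analysis.

```python
import math

# Toy raw counts for one sample (made-up values, NOT from the thread):
# most genes have modest counts, a few are extremely high.
counts = [0, 3, 12, 45, 150, 480, 900, 2500, 180000, 950000]

# log2(count + 1): the pseudocount of 1 keeps zero counts defined.
log_counts = [math.log2(c + 1) for c in counts]

raw_range = max(counts) - min(counts)
log_range = max(log_counts) - min(log_counts)

print(f"raw range:  {raw_range}")
print(f"log2 range: {log_range:.2f}")

# Crude text histogram: one '#' per log2 unit, so every gene stays visible.
for c, lc in zip(counts, log_counts):
    print(f"{c:>7} {lc:6.2f} {'#' * round(lc)}")
```

On the raw scale the two largest genes squash everything else against zero; after the transform all genes occupy a comparable range.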

> First my controls and KOs do not group as I would expect.

In the PCA after normalization, sample 1142 HP is a clear outlier that completely dominates PC1. On PC2, however, the control and KO samples group relatively well.
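A quick complementary check for an outlying sample, separate from the PCA itself, is to look at each sample's mean correlation with all other samples on the log scale. The sketch below is only an illustration of that idea: the sample names and values are hypothetical, and Pearson correlation is swapped in here as a simple stand-in, not the method used to make the plots above.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# Hypothetical log2 expression values (same gene order in every sample).
samples = {
    "ctrl_1": [5.0, 7.1, 2.0, 9.3, 4.2, 6.0],
    "ctrl_2": [5.2, 7.0, 2.1, 9.1, 4.0, 6.2],
    "ko_1":   [5.1, 6.8, 2.3, 9.4, 4.4, 5.9],
    "ko_2":   [8.5, 3.0, 6.0, 5.0, 7.9, 2.5],  # deliberately aberrant
}

def mean_correlation(name):
    """Average correlation of one sample with every other sample."""
    others = [s for s in samples if s != name]
    return sum(pearson(samples[name], samples[o]) for o in others) / len(others)

mean_corr = {name: mean_correlation(name) for name in samples}

# The sample that agrees least with the rest is the outlier candidate.
suspect = min(mean_corr, key=mean_corr.get)
print(suspect, round(mean_corr[suspect], 3))
```

A sample flagged this way is worth inspecting (library size, mapping rate, sample swap) before deciding whether to keep it in the model.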

> I wanted to know if this distribution will affect any downstream differential expression results and if yes, how can I fix it?

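voom works on log2-CPM values, and when the library sizes are scaled with TMM normalization factors (edgeR's calcNormFactors) beforehand, a handful of very highly expressed genes has little influence on the result, so it should be OK.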
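To make the library-size point concrete, here is a small Python sketch of the log2-CPM transform that voom applies, log2((count + 0.5) / (library size + 1) * 1e6). The counts are hypothetical and the library size is taken as the plain column sum, which is an assumption for illustration; it shows what happens without any scaling factor when one gene dominates a sample.

```python
import math

def log2_cpm(counts, lib_size):
    """log2 counts-per-million with the 0.5 / 1 offsets used by voom."""
    return [math.log2((c + 0.5) / (lib_size + 1.0) * 1e6) for c in counts]

# Hypothetical counts, same gene order in both samples. The first three
# genes are identical; only the last (dominant) gene differs.
sample_a = [100, 200, 400, 50_000]
sample_b = [100, 200, 400, 500_000]

# Assumption: raw library size = column sum, no normalization factor.
lib_a = sum(sample_a)
lib_b = sum(sample_b)

cpm_a = log2_cpm(sample_a, lib_a)
cpm_b = log2_cpm(sample_b, lib_b)

# Share of sample_b's library taken by the single dominant gene.
share_b = sample_b[-1] / lib_b

# Apparent log2 difference for gene 0, although its count is unchanged.
shift = cpm_a[0] - cpm_b[0]

print(f"dominant gene share in sample_b: {share_b:.3f}")
print(f"apparent log2 shift for an unchanged gene: {shift:.2f}")
```

With raw totals, the unchanged genes appear several log2 units lower in the second sample purely because its library size is inflated. Scaling factors that ignore extreme genes, such as TMM, are intended to remove this kind of artifact before the counts reach voom.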


Thanks for the answer, that was very helpful. The reason I was wondering whether my downstream differential expression analysis was affected is that I found only 8 differentially expressed genes, and I am used to seeing hundreds.

Reply written 2.1 years ago by komal.rathi