Question: How to identify housekeeping genes from a gene expression matrix and normalize the data
6 days ago by
bioyas0 wrote:

Hi everyone,

I have a matrix of expression counts derived from two RNA-seq experiments, which I have combined. I would like to normalize the data so that the expression values end up on the same scale and I can run a differential expression analysis.

So far I have tried the edgeR normalization method and the limma package to remove batch effects, but this failed to put the data on the same scale.

Now I would like to try normalization with respect to housekeeping genes (HKGs), which is new to me. I don't know how to find those HKGs from the expression data. Is there an R package to detect those genes, or do I need to find them manually? If so, how should I do that?

The next step after finding the list of HKGs is normalization using these genes' expression, where I need help too.
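
One data-driven way to approach both steps is to rank genes by how stable they are across all samples and then scale each sample by the expression of the most stable ones. The base R sketch below assumes a counts matrix with genes in rows and samples in columns. The function names (`find_stable_genes`, `reference_size_factors`, `normalize_by_reference`) and the thresholds are made up for illustration and do not come from any package. Ranking by stability after CPM scaling presupposes that library-size scaling is roughly adequate, and stable reference genes cannot separate a batch effect from a biological difference if the two are confounded.

```r
# Rank genes by stability: lowest coefficient of variation (CV) of log2-CPM
# across all samples, ignoring weakly expressed genes.
# `n` and `min_mean_cpm` are illustrative defaults, not recommendations.
find_stable_genes <- function(counts, n = 3, min_mean_cpm = 10) {
  cpm <- t(t(counts) / colSums(counts)) * 1e6    # counts per million
  logcpm <- log2(cpm + 1)
  expressed <- rowMeans(cpm) >= min_mean_cpm
  cv <- apply(logcpm, 1, sd) / rowMeans(logcpm)
  cv[!expressed] <- Inf                          # exclude low-expression genes
  names(sort(cv))[seq_len(min(n, sum(expressed)))]
}

# Per-sample scaling factor: geometric mean of the reference genes,
# centered so that the factors themselves have geometric mean 1.
reference_size_factors <- function(counts, ref_genes) {
  log_ref <- colMeans(log(counts[ref_genes, , drop = FALSE] + 0.5))
  exp(log_ref - mean(log_ref))
}

# Divide each sample (column) by its reference-gene scaling factor.
normalize_by_reference <- function(counts, ref_genes) {
  sf <- reference_size_factors(counts, ref_genes)
  t(t(counts) / sf)
}
```

With a real matrix this could be used as `ref <- find_stable_genes(counts, n = 20)` followed by `norm <- normalize_by_reference(counts, ref)`. It is worth checking the selected genes against known housekeeping genes before trusting them.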

Thank you in advance

written 6 days ago by bioyas0

Are these independent experiments? Maybe this answers some questions:

Basic normalization, batch correction and visualization of RNA-seq data

modified 6 days ago • written 6 days ago by ATpoint38k

Thanks for your reply. The tutorial is really helpful, and what it explains is what I have already tried. The experiments were done separately, and that's one of the reasons they are not on the same scale.

I thought that housekeeping gene normalization might be the answer to my question, but I am not familiar with it. Any insight would be appreciated.

written 6 days ago by bioyas0

Did you apply any of the strategies from the tutorial, such as using PCA to detect batch effects? You cannot just combine different experiments. You have to check whether there are batch effects and, if so, account for them.

written 6 days ago by ATpoint38k

I did not apply the PCA part. What I did was normalize using edgeR and then use removeBatchEffect() from the limma package to remove batch effects. After all of this, a heatmap with hierarchical clustering still shows the samples separated by experiment, which means that I could not bring them onto the same scale.
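
For reference, the workflow described here can be sketched roughly as follows. This is an assumption about the steps, not the exact code that was run: the wrapper name `normalize_and_remove_batch`, the filtering step, and `prior.count = 2` are illustrative choices. It requires the Bioconductor packages edgeR and limma. The output of `removeBatchEffect()` is meant for visualization (heatmaps, PCA); for differential expression testing the limma documentation recommends including the batch factor in the linear model instead.

```r
# Sketch: TMM normalization with edgeR, then batch removal on log-CPM
# values with limma. `counts` is genes x samples; `group` and `batch`
# are per-sample labels in column order.
normalize_and_remove_batch <- function(counts, group, batch) {
  group <- factor(group)
  batch <- factor(batch)

  y <- edgeR::DGEList(counts = counts, group = group)
  keep <- edgeR::filterByExpr(y, group = group)   # drop low-count genes
  y <- y[keep, , keep.lib.sizes = FALSE]
  y <- edgeR::calcNormFactors(y)                  # TMM by default

  logcpm <- edgeR::cpm(y, log = TRUE, prior.count = 2)

  # Passing the design protects the group differences while the batch
  # term is removed. This only works if groups are not confounded with batch.
  design <- model.matrix(~ group)
  limma::removeBatchEffect(logcpm, batch = batch, design = design)
}
```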

written 6 days ago by bioyas0

This is why I linked the tutorial, which explicitly mentions that you cannot simply combine independent experiments. They are most likely confounded. For batch removal you'd need replicates of each experimental group (say normal and cancer, or whatever your design is) in both batches (that is, in both datasets). If you don't have that, you cannot combine them, because you cannot remove the batch effect. I suggest you perform PCA as described.
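
Both checks can be done in a few lines of base R. The sketch below is only an illustration under assumed inputs: `group` and `batch` are per-sample label vectors, `logexpr` is a log-scale expression matrix (genes in rows, samples in columns), and the helper names `groups_not_in_all_batches` and `pca_scores` are hypothetical.

```r
# Which groups are missing from at least one batch? Those groups cannot
# be separated from the batch effect.
groups_not_in_all_batches <- function(group, batch) {
  tab <- table(group, batch)
  rownames(tab)[rowSums(tab == 0) > 0]
}

# PCA on the most variable genes. Returns the first two principal
# component scores per sample and the fraction of variance they explain.
pca_scores <- function(logexpr, n_top = 500) {
  v <- apply(logexpr, 1, var)
  top <- order(v, decreasing = TRUE)[seq_len(min(n_top, nrow(logexpr)))]
  pca <- prcomp(t(logexpr[top, , drop = FALSE]), center = TRUE, scale. = FALSE)
  var_explained <- pca$sdev^2 / sum(pca$sdev^2)
  list(scores = pca$x[, 1:2, drop = FALSE],
       var_explained = var_explained[1:2])
}
```

Plotting the scores with, for example, `plot(res$scores, col = as.integer(factor(batch)), pch = as.integer(factor(group)))` shows whether samples separate mainly by batch or by group.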

Can you give some details? What are the experiments, what are the groups per experiment, and do you have replicates of each group in both datasets?

modified 6 days ago • written 6 days ago by ATpoint38k

Thanks for your comment. I will try the PCA. Both datasets come from RNA-seq experiments. I have several groups (samples), with 3 replicates per sample in the first dataset and 4 replicates per sample in the second.

As I said before, when I normalize I get two clusters corresponding to the two datasets, which means that there is a batch effect between them.

written 4 days ago by bioyas0