Question: Calculate a scaling factor for RNA-seq data
pbigbig (United States) wrote, 9 months ago:

Hi everyone,

I have an RNA-seq expression count matrix for two contrasting conditions (~10 biological samples per condition), but the conditions are affected by a (severe) batch effect because they come from different sequencing experiments. I looked for batch-effect removal tools, but they can only correct batch effects among samples of the same condition group (between different conditions, large true biological variation may account for most of the difference attributed to batch).

I plan to choose a group of housekeeping genes to adjust for this group difference, but I am still confused about the practical steps. Could you please give me some suggestions? Here are the points I am unsure about:

  • Should I perform TPM normalization followed by TMM cross-sample normalization before looking at the expression values of these housekeeping genes?
  • These housekeeping genes probably differ widely in expression level from one another. How can I condense them into a single scaling factor per sample, and then scale the expression of all other genes by that factor?

Thank you very much.
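For the second question, one common way to turn a set of housekeeping genes into a single factor per sample is a median-of-ratios estimate (the idea behind DESeq2's size factors; DESeq2's estimateSizeFactors accepts a controlGenes argument to restrict it to chosen genes). Below is a minimal Python/NumPy sketch under stated assumptions: it takes raw counts (genes x samples, not TPM), the housekeeping row indices are hypothetical inputs, and genes with a zero count in any sample are dropped. Dividing each gene by its own geometric mean across samples cancels that gene's expression level, so highly and lowly expressed housekeeping genes contribute on the same scale. Note that this only rescales samples; it cannot separate batch from condition when the two are confounded, as the replies below point out.

```python
import numpy as np


def housekeeping_size_factors(counts, hk_rows):
    """Median-of-ratios size factors computed only on housekeeping genes.

    counts:  genes x samples array of raw counts
    hk_rows: row indices of the (hypothetical) housekeeping gene set
    """
    hk = np.asarray(counts, dtype=float)[hk_rows]
    # Drop genes with a zero in any sample: log(0) would break the geometric mean
    hk = hk[(hk > 0).all(axis=1)]
    if hk.shape[0] == 0:
        raise ValueError("no housekeeping gene is non-zero in every sample")
    log_hk = np.log(hk)
    # Pseudo-reference sample: per-gene geometric mean across samples (log space)
    log_ref = log_hk.mean(axis=1, keepdims=True)
    # Per sample: median over genes of log(count / reference), back-transformed
    return np.exp(np.median(log_hk - log_ref, axis=0))


def scale_counts(counts, size_factors):
    """Divide every gene in each sample (column) by that sample's factor."""
    return np.asarray(counts, dtype=float) / size_factors


if __name__ == "__main__":
    # Toy data: sample 2 has exactly twice the depth of sample 1 on genes 0-2
    counts = np.array([[10, 20], [100, 200], [50, 100], [7, 3]])
    sf = housekeeping_size_factors(counts, [0, 1, 2])
    print(np.round(sf, 4))  # -> [0.7071 1.4142]
    print(scale_counts(counts, sf))
```

The factors multiply to 1 across samples, so scaled values stay on roughly the original count scale; the fourth gene is not a housekeeping gene and is simply rescaled by the same factors.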

Tags: rna-seq, normalization

What separates these "batch groups"? Different library preps? Different sequencing machines?

Reply by ATpoint, 9 months ago

Yes. I obtained them from different public databases, so the two experimental protocols were completely different (although both datasets are Illumina HiSeq reads).

Reply by pbigbig, 9 months ago

Then I see little chance of using them in the same analysis, especially because you have absolutely no way of validating the results by an independent experimental approach.

Reply by ATpoint, 9 months ago
swbarnes2 (United States) wrote, 8 months ago:

If you are saying that batch is completely confounded with condition, then there likely isn't anything you can do.
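A minimal sketch of why confounding leaves nothing to correct, assuming an additive design with an intercept plus 0/1-coded condition and batch columns and the 10-samples-per-condition layout from the question (the coding and variable names are illustrative). When every sample of one condition comes from one study, the batch column is identical to the condition column, the design matrix loses a rank, and no linear model can attribute a difference to condition rather than batch.

```python
import numpy as np


def design_rank(condition, batch):
    """Rank of an additive design: intercept + condition + batch (0/1 codes)."""
    design = np.column_stack([np.ones(len(condition)), condition, batch])
    return np.linalg.matrix_rank(design)


# Hypothetical layout: 10 samples per condition
condition = np.array([0] * 10 + [1] * 10)

# Each condition came from its own study, so batch == condition
confounded_batch = condition.copy()
# Counter-example: both batches are present within each condition
balanced_batch = np.array([0, 1] * 10)

print(design_rank(condition, confounded_batch))  # -> 2 (3 columns, one redundant)
print(design_rank(condition, balanced_batch))    # -> 3 (full rank)
```

Only the balanced layout, where each condition is sequenced in more than one batch, leaves the batch effect estimable separately from the condition effect.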


Yeah, I also thought so.

However, it isn't practical to write something like this in a research proposal: "we need to collect 100 brain tissue samples from patients with the disease, and also a dozen fresh brain samples from healthy people, just to perform RNA-seq experiments under the same conditions."

Reply by pbigbig, 8 months ago

If you want meaningful results, it is. Maybe don't write 100 but something more modest, like 10 or so, to avoid scaring the reviewers with the anticipated costs.

Reply by ATpoint, 8 months ago