Which RUVr batch corrected output is better to calculate TPM?
7 weeks ago
star ▴ 330

I have downloaded several samples from 5 studies (5 batches).

Example of my count table:

         S_rep1_batch1 S_rep2_batch1 S_rep1_batch2 S_rep2_batch2 S_rep3_batch2  .  .  .
  Gene1         34          54             65            76            67
  Gene2         87          77             90            35            19
  Gene3         47          67             70            85            99

I would like to run a differential expression analysis (DEA) between samples and also compare them by gene expression profile (heatmap and clustering on a subset of genes). To remove the batch effect, I used RUVr with k=13 from the RUVSeq R package.

RUVr <- RUVSeq::RUVr(x = df, cIdx = genes, k = 13, residuals = res)

For the DEA between samples, I used counts(RUVr) to build the DGEList. For the gene expression profile and the TPM values, I used normCounts(RUVr).


  • Is normCounts(RUVr) the correct input for calculating TPM?

  • If I want to calculate the average gene expression (average TPM for each gene), can I average the same sample across different batches (e.g. S_batch1.batch2_averageTPM)? Or is it better to calculate it for each batch separately (e.g. S_batch1_averageTPM; S_batch2_averageTPM)?

  • Or is it better to take the raw count table, calculate TPM from it, and then average the TPM for each gene within each batch separately?
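
For reference, the two averaging options in the questions above can be sketched in base R. This is only a minimal illustration of the arithmetic, not a recommendation on which input matrix to use: the counts are the ones from the example table, the gene lengths are invented placeholders (TPM needs gene lengths, which the post does not give), and `counts_to_tpm` is a hypothetical helper name.

```r
# Counts copied from the example table in the question.
counts <- matrix(
  c(34, 54, 65, 76, 67,
    87, 77, 90, 35, 19,
    47, 67, 70, 85, 99),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("Gene1", "Gene2", "Gene3"),
    c("S_rep1_batch1", "S_rep2_batch1", "S_rep1_batch2",
      "S_rep2_batch2", "S_rep3_batch2")
  )
)

# ASSUMPTION: gene lengths in base pairs. These values are made up
# for illustration; real lengths come from the annotation.
gene_length_bp <- c(Gene1 = 1000, Gene2 = 2000, Gene3 = 1500)

# Hypothetical helper: counts (genes x samples) -> TPM.
counts_to_tpm <- function(counts, length_bp) {
  stopifnot(nrow(counts) == length(length_bp))
  # Reads per kilobase: each row is divided by its gene length in kb.
  rate <- counts / (length_bp / 1000)
  # Scale every sample (column) so that it sums to one million.
  t(t(rate) / colSums(rate)) * 1e6
}

tpm <- counts_to_tpm(counts, gene_length_bp)

# Batch label parsed from the column names (e.g. "batch1").
batch <- sub(".*_(batch[0-9]+)$", "\\1", colnames(tpm))

# Option A: mean TPM per gene within each batch separately.
mean_by_batch <- sapply(
  split(colnames(tpm), batch),
  function(cols) rowMeans(tpm[, cols, drop = FALSE])
)

# Option B: mean TPM per gene pooled across all batches.
mean_pooled <- rowMeans(tpm)
```

`mean_by_batch` has one column per batch and `mean_pooled` is a single vector. Note that the pooled mean weights each replicate equally, so a batch with more replicates contributes more to it.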

RUVSeq RNA-seq batch_effect TPM R

Does this setup even allow removal of any batch effect? You cannot just collect samples from GEO at random and expect them to behave as if you had produced them under the same conditions. You would need replicates of every group you're testing in each of those batches. Is that the case?
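
One quick way to check this is to cross-tabulate group against batch. The sketch below is a minimal base R example under stated assumptions: the "S" sample names follow the naming pattern in the example table, the "T" samples are a hypothetical second group added only for illustration, and the threshold of two replicates per cell is an arbitrary choice.

```r
# Sample names: "S" columns follow the example table; the "T" samples
# are HYPOTHETICAL and only illustrate a group missing from a batch.
samples <- c("S_rep1_batch1", "S_rep2_batch1", "S_rep1_batch2",
             "S_rep2_batch2", "S_rep3_batch2",
             "T_rep1_batch1", "T_rep2_batch1")

# Split "group_rep_batch" names into their three fields.
parts <- do.call(rbind, strsplit(samples, "_"))
design <- data.frame(group = parts[, 1], batch = parts[, 3])

# Replicates per group (rows) and batch (columns).
tab <- table(design$group, design$batch)
print(tab)

# ASSUMPTION: at least 2 replicates per group in every batch.
min_reps <- 2
fully_crossed <- all(tab >= min_reps)

# Group/batch combinations that fall short of the threshold.
short_cells <- which(tab < min_reps, arr.ind = TRUE)
```

If `fully_crossed` is FALSE, at least one group is absent or under-replicated in some batch, and for that group the batch effect cannot be separated from the biological effect.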

