Which RUVr batch-corrected output should be used to calculate TPM?
7 weeks ago
star ▴ 330

I have downloaded several samples from 5 studies (5 batches).

Example of my count table:

       S_rep1_batch1  S_rep2_batch1  S_rep1_batch2  S_rep2_batch2  S_rep3_batch2  ...
Gene1             34             54             65             76             67
Gene2             87             77             90             35             19
Gene3             47             67             70             85             99
...


I would like to run a differential expression analysis (DEA) between samples, and also compare them by gene expression profile (heatmap and clustering based on a subset of genes). To remove the batch effect, I used RUVr with k=13 from the RUVSeq R package.

RUVr <- RUVSeq::RUVr(df, genes, k=13, res)


For the DEA between samples, I used counts(RUVr) to build the DGEList. For the gene expression profile and the TPM values, I used normCounts(RUVr).

Questions:

• Is normCounts(RUVr) the correct input for calculating TPM?

• If I want to calculate the average gene expression (average TPM for each gene), can I average the same sample across different batches (e.g. S_batch1.batch2_averageTPM)? Or is it better to calculate it for each batch separately (e.g. S_batch1_averageTPM; S_batch2_averageTPM)?

• Or is it better to take the raw count table, calculate TPM from it, and then average the TPM for each gene within each batch separately?
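To make the last two options concrete, here is a minimal base-R sketch of the mechanics: computing TPM from a count matrix and then averaging per batch. It does not settle which matrix (raw counts or normCounts(RUVr)) is the right input. The gene lengths are hypothetical (TPM needs them, and they are not given above), the helper name `counts_to_tpm` is my own, and the batch label is assumed to be the suffix of each column name.

```r
# Toy count matrix using the values from the example table above.
counts <- matrix(
  c(34, 54, 65, 76, 67,
    87, 77, 90, 35, 19,
    47, 67, 70, 85, 99),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("Gene1", "Gene2", "Gene3"),
    c("S_rep1_batch1", "S_rep2_batch1",
      "S_rep1_batch2", "S_rep2_batch2", "S_rep3_batch2")
  )
)

# Hypothetical gene lengths in base pairs (an assumption, not from the post).
gene_length_bp <- c(Gene1 = 1000, Gene2 = 2000, Gene3 = 1500)

# Hypothetical helper: counts -> TPM.
counts_to_tpm <- function(counts, length_bp) {
  # Reads per kilobase: divide each gene (row) by its length in kb.
  rpk <- sweep(counts, 1, length_bp / 1000, "/")
  # Scale each sample (column) so that it sums to one million.
  sweep(rpk, 2, colSums(rpk), "/") * 1e6
}

tpm <- counts_to_tpm(counts, gene_length_bp[rownames(counts)])

# Assumes the batch label is the trailing "batchN" part of the column name.
batch <- sub(".*_(batch[0-9]+)$", "\\1", colnames(tpm))

# Average TPM per gene within each batch (one column per batch).
mean_tpm_by_batch <- sapply(
  split(colnames(tpm), batch),
  function(cols) rowMeans(tpm[, cols, drop = FALSE])
)

print(round(mean_tpm_by_batch, 1))
```

The same two functions could be applied to either count matrix. Keeping the per-batch averages in separate columns makes it easy to see whether the batches still disagree after correction.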

RUVSeq RNA-seq batch_effect TPM R • 206 views

Does this setup even allow removal of any batch effect? You cannot just collect arbitrary samples from GEO and expect them to behave as if you had produced them under the same conditions. You would need replicates of every group you're testing in each of those batches. Is that the case?
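One quick way to check this is to cross-tabulate group against batch. The sketch below uses base R only. The sample sheet is entirely hypothetical (group labels "S" and "T" and three batches are made up for illustration), and the threshold of two samples per cell is just one reasonable reading of "replicates".

```r
# Hypothetical sample sheet: replace with the real group and batch labels.
samples <- data.frame(
  group = c("S", "S", "T", "T", "S", "S", "T", "T", "S", "S"),
  batch = c("batch1", "batch1", "batch1", "batch1",
            "batch2", "batch2", "batch2", "batch2",
            "batch3", "batch3")
)

# Number of samples for every group/batch combination.
design_table <- table(samples$group, samples$batch)
print(design_table)

# TRUE only if every group has at least two samples in every batch.
fully_crossed <- all(design_table >= 2)

# Even if not fully crossed, group and batch effects can still be separated
# as long as the design matrix has full column rank.
mm <- model.matrix(~ batch + group, data = samples)
estimable <- qr(mm)$rank == ncol(mm)

cat("Replicates of every group in every batch:", fully_crossed, "\n")
cat("Batch and group both estimable:", estimable, "\n")
```

In this made-up sheet, group T is missing from batch3, so the design is not fully crossed, although the two effects are still separable. If each batch contained only one group, the rank check would fail and no method could separate batch from biology.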