Question

Requesting further clarification on interpreting relative gene expression strength

Abhishek, 9 months ago

What is your opinion on the following paragraph in section 6.3 of the RNA-seq workflow (http://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html) by the creators of the DESeq2 package?

Paragraph:

"The heatmap becomes more interesting if we do not look at absolute expression strength but rather at the amount by which each gene deviates in a specific sample from the gene’s average across all samples. Hence, we center each genes’ values across samples, and plot a heatmap (figure below). We provide a data.frame that instructs the pheatmap function how to label the columns."
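The centering described in the quoted paragraph amounts to subtracting each gene's mean across all samples from every value in that gene's row. A minimal sketch in Python/NumPy on a hypothetical toy matrix (the workflow itself does this step in R, on transformed counts rather than raw values, before calling pheatmap):

```python
import numpy as np

def center_rows(mat: np.ndarray) -> np.ndarray:
    """Subtract each gene's (row's) mean across samples (columns)."""
    return mat - mat.mean(axis=1, keepdims=True)

# Hypothetical toy matrix: 3 genes (rows) x 4 samples (columns).
expr = np.array([
    [10.0, 12.0, 8.0, 10.0],
    [100.0, 100.0, 100.0, 100.0],
    [5.0, 1.0, 3.0, 3.0],
])

centered = center_rows(expr)
print(centered)
# Every row of `centered` now has mean 0; the constant gene (row 2) becomes all zeros,
# so a heatmap colours only each gene's deviation from its own average.
```

Centering removes differences in absolute level between genes, which is why a highly but uniformly expressed gene no longer dominates the colour scale.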

My doubt arises mainly when I add another dataset to our own. Say we have 55 human small-cell lung cancer (SCLC) RNA-seq samples, and I have another 81 samples from a different study (https://pubmed.ncbi.nlm.nih.gov/26168399/).

Suppose I have a large matrix in which each column (j) is a sample, with the first 55 columns being our Tempus samples and the remaining columns coming from the other study, and each row (i) holds a gene's TPM values. Does it make sense to look at:

Y_ij = TPM_ij - mean_k(TPM_ik), where TPM_ij is the TPM of gene i in sample j and the mean runs over all samples k from both studies

I am not sure whether looking at the relative expression of a gene from 2 different sample sets like above makes sense.
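A small simulation may help show why the proposed Y_ij is hard to interpret across two studies. It assumes, hypothetically, one gene with identical biology in both cohorts and a purely technical 2-fold inflation in the public study (BASE_TPM, BATCH_FACTOR and the noise level are made-up values; only the sample sizes come from the question). After subtracting the pooled mean, the centered values separate by study, which is the pattern a heatmap of the combined matrix would show.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

N_OWN, N_PUBLIC = 55, 81   # sample counts from the question
BASE_TPM = 50.0            # hypothetical true expression level of one gene
BATCH_FACTOR = 2.0         # hypothetical technical inflation in the public study

# Same underlying biology in both studies: log-normal noise around BASE_TPM.
own = BASE_TPM * rng.lognormal(mean=0.0, sigma=0.3, size=N_OWN)
# The public study differs only by a technical multiplicative shift.
public = BATCH_FACTOR * BASE_TPM * rng.lognormal(mean=0.0, sigma=0.3, size=N_PUBLIC)

tpm = np.concatenate([own, public])  # one gene's row of the combined matrix
y = tpm - tpm.mean()                 # Y_ij as defined in the question (pooled mean)

print(f"mean Y, own study:    {y[:N_OWN].mean():8.2f}")
print(f"mean Y, public study: {y[N_OWN:].mean():8.2f}")
```

The own-study values end up mostly below zero and the public-study values above it, purely because of the simulated batch shift. If study also coincides with a biological variable (for example tumor stage), a genuine difference would produce the same split, so the centered values alone cannot tell the two explanations apart.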

Tags: R, normalization, TPM, RNA-seq, DESeq2

Updated 9 months ago by ATpoint; written 9 months ago by Abhishek

It makes no sense to combine expression values from independent studies. Please google what a batch effect is.

"I am not sure whether looking at the relative expression of a gene from 2 different sample sets like above makes sense."

If by "sample sets" you mean the two studies, then no, it makes no sense.

Reply by ATpoint, 9 months ago

Thank you for your reply. How would you suggest comparing, say, the expression of a gene of interest between one's own patient samples and a publicly available patient sequencing dataset?

I am studying cancer patients: my cohort consists mostly of late-stage tumors, and I am comparing it to a patient cohort with mostly early-stage tumors. I was hoping to plot a heatmap of the combined dataset, but after reading about batch effects, I am not sure I can do that anymore.

I would appreciate your feedback and/or being redirected to a relevant resource.

Thanks!

Reply by Abhishek, 9 months ago

I would never do such a comparison because of the batch effects mentioned above. RNA-seq is a relative measure.
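To illustrate what "relative measure" means for TPM specifically: TPM divides each gene's count by its length and then rescales the sample so that it sums to one million, so a large change in one gene shifts the TPM of every other gene in that sample. A minimal sketch with hypothetical counts and gene lengths:

```python
import numpy as np

def tpm(counts: np.ndarray, lengths_kb: np.ndarray) -> np.ndarray:
    """Convert raw counts for one sample to TPM."""
    rate = counts / lengths_kb        # reads per kilobase of transcript
    return rate / rate.sum() * 1e6    # rescale so the sample sums to one million

lengths_kb = np.array([1.0, 2.0, 1.0, 4.0])  # hypothetical gene lengths in kb

# Hypothetical samples: genes 1-3 have identical counts,
# only gene 4 is strongly up-regulated in sample B.
sample_a = np.array([100.0, 200.0, 300.0, 400.0])
sample_b = np.array([100.0, 200.0, 300.0, 4000.0])

tpm_a = tpm(sample_a, lengths_kb)
tpm_b = tpm(sample_b, lengths_kb)
print(tpm_a.round(1))
print(tpm_b.round(1))
```

Genes 1 to 3 have the same read counts in both samples, yet their TPM drops 2.5-fold in sample B only because gene 4 takes up a larger share of the total. This is a within-sample effect; technical differences between studies (library preparation, sequencing) come on top of it.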

Reply by ATpoint, 9 months ago