Question

Possible approach to select normal tissue samples for cancer RNA-Seq data without reference data for downstream analyses

4.1 years ago

svlachavas

Dear Community,

Based on a clinical project generating high-throughput genomics data, we have gathered a large number of RNA-Seq samples from patients with different solid tumors who had undergone conventional therapy prior to sequencing. All the data have been uniformly processed in R. The major issue is that we would like to perform differential expression analysis or apply machine learning techniques to select the most differentially expressed or most informative genes relative to some reference sample group, but unfortunately we do not have any normal or control reference samples for the whole cohort.

A naive idea would be to use external sources of normal data, such as GTEx. However, my main concern is that batch-effect correction methods such as ComBat might still not be applicable, because the two batches are completely confounded with sample type (i.e. each sample type is represented in only one of the two studies).
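To see why a ComBat-style correction cannot rescue a fully confounded design, it helps to look at the per-gene linear model such methods rely on. The sketch below (Python with NumPy; the sample labels and helper names are hypothetical) builds a design matrix with an intercept, a batch indicator and a tumor-vs-normal indicator. When every tumor comes from one study and every normal from the other, the condition column is just "intercept minus batch", the matrix loses rank, and the batch effect cannot be estimated separately from the biological effect. If the condition is left out of the model instead, the correction removes the tumor/normal difference together with the batch difference.

```python
import numpy as np

def design_matrix(batch, condition):
    """Intercept + batch indicator + condition indicator, one row per sample."""
    batch = np.asarray(batch, dtype=float)
    condition = np.asarray(condition, dtype=float)
    return np.column_stack([np.ones_like(batch), batch, condition])

def is_estimable(batch, condition):
    """True if batch and condition effects can be estimated separately
    (i.e. the design matrix has full column rank)."""
    X = design_matrix(batch, condition)
    return bool(np.linalg.matrix_rank(X) == X.shape[1])

# Hypothetical labels: batch 0 = in-house cohort, 1 = GTEx; condition 1 = tumor, 0 = normal.
# Fully confounded: all tumors are in-house, all normals come from GTEx.
confounded_batch = [0, 0, 0, 0, 1, 1, 1, 1]
confounded_cond  = [1, 1, 1, 1, 0, 0, 0, 0]

# Mixed: both batches contain tumor and normal samples.
mixed_batch = [0, 0, 0, 0, 1, 1, 1, 1]
mixed_cond  = [1, 1, 0, 0, 1, 1, 0, 0]

print(is_estimable(confounded_batch, confounded_cond))
print(is_estimable(mixed_batch, mixed_cond))
```

The mixed layout corresponds to having both sample types in each processing batch, which is what makes the two effects separable in the first place.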

Any ideas or suggestions on how this issue might be addressed?

Best,
Efstathios

R RNA-Seq DE batch-effect GTEx

updated 11 days ago by Ram • written 4.1 years ago by svlachavas

If you did not obtain any normal tissue processed with the same kits as the tumor samples, then any differences you see will most likely be caused by technical batch effects. Comparing your data with any downloaded data in the same statistical analysis is pointless. This is (sorry to say) something you should have thought about before gathering the tumor samples. The only chance I see would be to gather normal samples now and process them identically in the wet lab, together with some additional tumor samples to correct for batch differences, and then run the analysis. If this is not possible, you are limited to comparisons within your cohort, e.g. splitting samples into high/low groups based on the expression of important genes.
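A within-cohort comparison of the high/low kind mentioned above could be sketched as follows. This is a minimal Python/NumPy illustration, not a substitute for a count-based DE tool such as DESeq2 or limma-voom: it assumes a genes × samples matrix of already normalized, log-scale values, splits samples at the cohort median of one hypothetical marker gene, and ranks the remaining genes by a Welch t-statistic between the two groups. Since both groups come from the same cohort and processing, the split is not confounded with the study of origin.

```python
import numpy as np

def median_split(expr, marker_idx):
    """Boolean mask over samples: True where the marker is above the cohort median."""
    marker = expr[marker_idx]
    return marker > np.median(marker)

def welch_t(expr, high):
    """Welch t-statistic per gene (row), comparing 'high' vs 'low' samples (columns)."""
    a, b = expr[:, high], expr[:, ~high]
    mean_diff = a.mean(axis=1) - b.mean(axis=1)
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1])
    return mean_diff / se

# Toy data (hypothetical): 4 genes x 6 samples of log-scale expression.
expr = np.array([
    [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],   # gene 0: marker used for the split
    [5.0, 5.2, 4.9, 8.1, 7.9, 8.3],   # gene 1: higher in the 'high' group
    [3.0, 3.1, 2.9, 3.0, 3.2, 2.8],   # gene 2: flat
    [6.0, 6.2, 5.9, 2.1, 1.8, 2.2],   # gene 3: lower in the 'high' group
])

high = median_split(expr, marker_idx=0)
t = welch_t(np.delete(expr, 0, axis=0), high)   # test every gene except the marker
ranking = np.argsort(-np.abs(t)) + 1            # +1 maps back to original gene indices
print(high)
print(ranking, np.round(t, 2))
```

With real data, p-values and multiple-testing correction would be needed, and the choice of marker gene is the biological question itself.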

4.1 years ago by ATpoint

Dear ATpoint,

Thank you for your strong point. To be honest, I did not participate in the experimental design of the project; I became involved only after the data had been generated. Unfortunately, these are older data, and it was only when I very recently started the analysis and went through the available information that I noticed the bottleneck of having no normal samples. Given these limited options, do you think a co-expression network would also help, either to identify "important genes" or to rank the genes by some measure?
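As a rough illustration of ranking genes by a co-expression measure, the sketch below follows the idea of an unsigned, soft-thresholded network as used in WGCNA: compute Pearson correlations between all gene pairs across samples, raise their absolute values to a power beta, and rank genes by whole-network connectivity (row sums without the diagonal). It is written in Python/NumPy rather than with the R WGCNA package; beta = 6 and the toy matrix are assumptions for illustration only.

```python
import numpy as np

def connectivity(expr, beta=6):
    """Whole-network connectivity per gene from an unsigned soft-thresholded
    co-expression network (adjacency = |Pearson r| ** beta)."""
    corr = np.corrcoef(expr)            # rows are genes, so this is gene-by-gene r
    adj = np.abs(corr) ** beta          # soft thresholding emphasizes strong correlations
    np.fill_diagonal(adj, 0.0)          # ignore each gene's correlation with itself
    return adj.sum(axis=1)

# Toy data (hypothetical): 4 genes x 5 samples of normalized log expression.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.1, 3.9, 6.2, 8.0, 9.9],   # strongly correlated with gene 0
    [5.0, 4.1, 2.9, 2.0, 1.1],   # strongly anti-correlated with genes 0 and 1
    [3.0, 1.0, 4.0, 1.0, 3.0],   # unrelated to the others
])

k = connectivity(expr)
order = np.argsort(-k)           # most connected ("hub-like") genes first
print(order, np.round(k, 3))
```

Connectivity only describes structure within the tumor cohort; it does not say anything about tumor-versus-normal differences.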

4.1 years ago by svlachavas

That depends entirely on the question you want to answer. I just wanted to point out that you should not include independent datasets in the same analysis, as batch effects will dominate the results.

4.1 years ago by ATpoint