Possible approach to select normal tissue samples for cancer RNA-Seq data without reference data for downstream analyses
16 months ago
svlachavas ▴ 740

Dear Community,

Based on a clinical project involving high-throughput genomics data, we have gathered a large number of RNA-Seq samples from patients with different solid tumors who underwent conventional therapy prior to sequencing. All the data have been uniformly processed in R. The major issue is that we would like to perform differential expression analysis or apply machine learning techniques to select the most differentially expressed (DE) or most informative genes relative to some reference sample group, but unfortunately we do not have any normal or control reference samples for the whole cohort.

My naive idea was to use external normal data sources, such as GTEx. However, my main concern is that batch effect correction (for example with ComBat) might still not be applicable, because study (batch) and sample type would be totally confounded, i.e. neither study contains both sample types.
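To make the confounding concern concrete, here is a minimal sketch in plain Python (used only for illustration; the batch labels `in_house` and `GTEx` and the helper names are hypothetical). It cross-tabulates batch against sample type and flags designs in which no batch contains more than one sample type, which is the situation where a tumor-vs-normal difference cannot be separated from the study difference.

```python
from collections import defaultdict


def batch_condition_table(samples):
    """Cross-tabulate batch against condition.

    samples: iterable of (batch, condition) pairs, one per sample.
    Returns {batch: {condition: count}}.
    """
    table = defaultdict(lambda: defaultdict(int))
    for batch, condition in samples:
        table[batch][condition] += 1
    return {batch: dict(counts) for batch, counts in table.items()}


def is_fully_confounded(samples):
    """True when there are several batches and none contains more than one condition.

    In that case the condition contrast coincides with a batch contrast,
    so a batch term cannot be estimated separately from the condition term.
    """
    table = batch_condition_table(samples)
    return len(table) > 1 and all(len(counts) == 1 for counts in table.values())


# Hypothetical design: tumors only in-house, normals only from GTEx.
confounded_design = [("in_house", "tumor")] * 4 + [("GTEx", "normal")] * 4

# Same design plus two normals processed in-house (a "bridge" between batches).
bridged_design = confounded_design + [("in_house", "normal")] * 2

print(batch_condition_table(confounded_design))
print(is_fully_confounded(confounded_design))
print(is_fully_confounded(bridged_design))
```

The check only inspects the sample sheet. It does not say how strong the batch effect is, only whether the design allows it to be estimated at all.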

Any ideas or suggestions on how this issue might be addressed?

Best,

Efstathios

RNA-Seq R batch effect GTEx DE • 344 views

If you did not obtain any normal tissues processed with the same kits as the tumor samples, any differences you see will most likely be caused by technical batch effects. Comparing your data with any downloaded data in the same statistical analysis is pointless. This is (sorry to say) something you should have thought about before gathering the tumor samples. The only chance I see would be to gather normal samples now, process them identically in the wet lab together with some additional tumor samples to correct for batch differences, and then run the analysis. If this is not possible, you are limited to comparisons within your cohort, e.g. splitting samples into high/low groups based on the expression of important genes.
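As a rough illustration of the within-cohort high/low split, the sketch below divides samples at the cohort median of one marker gene. The sample IDs and expression values are made up, the function name is hypothetical, and the median is only one possible cutoff. Samples exactly at the median go to the "low" group here.

```python
from statistics import median


def split_high_low(marker_expression):
    """Assign each sample to 'high' or 'low' relative to the cohort median.

    marker_expression: {sample_id: normalized expression of one marker gene}.
    Values strictly above the median are 'high'; all others are 'low'.
    """
    cutoff = median(marker_expression.values())
    return {
        sample: "high" if value > cutoff else "low"
        for sample, value in marker_expression.items()
    }


# Made-up normalized expression values for a single marker gene.
marker = {"S1": 2.0, "S2": 8.5, "S3": 5.1, "S4": 0.7, "S5": 9.9, "S6": 4.4}

groups = split_high_low(marker)
print(groups)
```

The resulting labels can then serve as the grouping factor for a comparison that stays entirely inside the cohort, so no external dataset enters the model.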


Dear ATpoint,

Thank you for your strong point. To be honest, I did not participate in the experimental design of the project and only became involved after the data had been generated. Unfortunately, these are older data; I got into the analysis and the related information only very recently, which is when I noticed the bottleneck of the missing normal samples. In addition, given the limited options, do you think a co-expression network would help to identify "important genes", or to rank the genes based on some measure?
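For the "rank the genes based on some measure" part, one simple reference-free option is to rank genes by their variance across the cohort. The sketch below is only an assumption about what such a ranking could look like: the gene names and values are invented, the function name is hypothetical, and variance is just one of many possible measures, not a recommendation from this thread.

```python
from statistics import pvariance


def rank_genes_by_variance(expression):
    """Return gene names sorted from most to least variable across samples.

    expression: {gene: [normalized expression value per sample]}.
    Uses the population variance of each gene's values as the ranking score.
    """
    return sorted(expression, key=lambda gene: pvariance(expression[gene]), reverse=True)


# Invented, already normalized values for three genes across four samples.
toy_expression = {
    "GENE_A": [5.0, 5.1, 4.9, 5.0],
    "GENE_B": [1.0, 9.0, 2.0, 8.0],
    "GENE_C": [3.0, 4.0, 3.5, 4.5],
}

ranking = rank_genes_by_variance(toy_expression)
print(ranking)
```

Such a ranking uses only the cohort itself, so it avoids the batch problem, but high variance alone does not show that a gene is biologically important.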


That fully depends on the question you want to answer. I just wanted to point out that you should not include independent datasets in the same analysis, as batch effects will dominate the results.
