RNAseq normal matches using publicly available databases
10 months ago
geneart ▴ 50

Hi, I was wondering if anyone has advice on using publicly available normal samples as matches for our tumor samples in an RNA-seq analysis. We have liver cancer tumors but lack normal tissue for some of the cases. Can we run the analysis using normal liver RNA-seq data from TCGA together with our tumor data? That would be 25 tumors vs. 25 normals (the normals being different liver patients from TCGA).

Does this pose any problem for a pairwise analysis? The normals will obviously not be from the same patients as ours, but they are still normal tissue, and the molecular signature of normal tissue should be pretty much the same, shouldn't it? Is there anything I need to pay close attention to?

I have not done this type of analysis before and so please pardon my ignorance here!

RNA-Seq TCGA DGE normal tumor • 332 views
10 months ago
ATpoint 55k

No, you cannot. Please use the search function for previous threads on that matter. The crux here is that condition (tumor/normal) is confounded with study, so you cannot distinguish biological from technical/batch effects, and there will be plenty of the latter. For illustration, you can go through "Basic normalization, batch correction and visualization of RNA-seq data", which contains data from the exact same specimen prepared with different library prep kits. That somewhat mimics different studies. You will see that if you perform a DEG analysis within the same sample, just comparing kits, there will be hundreds of DEGs. That means your analysis would be riddled with false calls that reflect no biological differences, only technical ones.

For tumor-only data it will probably come down to defining subgroups in your data, based on either clinical metadata or clustering-based approaches, and then comparing these groups with each other.
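
To make the confounding point concrete, here is a minimal sketch (not taken from any of the linked material) that builds the design matrix for the proposed setup and checks its rank. The labels `inhouse` and `external`, the helper names, and the "balanced" layout are hypothetical and only serve as a contrast. If every tumor comes from one study and every normal from the other, the study column is a linear combination of the intercept and the condition column, so no model can estimate a tumor effect separately from a study effect.

```python
import numpy as np

def design_matrix(condition, batch):
    """Build columns: intercept, tumor indicator, external-study indicator."""
    condition = np.asarray(condition)
    batch = np.asarray(batch)
    return np.column_stack([
        np.ones(len(condition)),
        (condition == "tumor").astype(float),
        (batch == "external").astype(float),
    ])

def is_estimable(condition, batch):
    """True if condition and study effects can be separated (full column rank)."""
    X = design_matrix(condition, batch)
    return bool(np.linalg.matrix_rank(X) == X.shape[1])

# Setup from the question: 25 in-house tumors vs. 25 external normals.
confounded = is_estimable(
    ["tumor"] * 25 + ["normal"] * 25,
    ["inhouse"] * 25 + ["external"] * 25,
)

# Hypothetical contrast: both studies contribute tumors and normals.
balanced = is_estimable(
    ["tumor", "normal"] * 25,
    ["inhouse"] * 26 + ["external"] * 24,
)

print(confounded, balanced)
```

The first check fails because the design is rank-deficient, and the second passes. This is the same reason differential expression tools reject a design in which study and condition are perfectly aligned.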

Thank you! This totally makes sense! If there is a situation where one has to use normal matches from another source, can we bioinformatically normalize the variation, or at least minimize the variability?

I recently read this paper, https://www.nature.com/articles/s41467-017-01027-z, in which the authors used an R package to minimize the variability in their analysis when comparing GTEx data with TCGA data. I don't know much about this package and would like to hear from anyone about it, or about anything similar that we could use to bioinformatically iron out the variability. Any input on this is greatly appreciated!

Thank you again!