RNA-seq data for binary classification problem
4.0 years ago
botloggy ▴ 10

I have the TCGA-READ RNA-seq data obtained from GDC Portal.

The samples are of two types, "primary tumor" and "solid tissue normal", collected from individuals. A solid tissue normal sample is normal tissue taken adjacent to the primary tumor. Hence, it may not be truly normal, since it still comes from an individual who has a tumor, and it would be incorrect to label such samples as normal.

For the binary classification problem, I need tumor samples from individuals with the disease and normal samples from healthy individuals.

I may be wrong, but I think the normal samples in any TCGA dataset, whether blood-derived normal or solid tissue normal, are not collected from normal (disease-free) individuals.

Can anyone please suggest where or how I can get normal samples?

If there is a website with normal samples, how do I match its genes to the genes in my current tumor data?

Any suggestions are highly appreciated. Thanks

classification tcga RNA-Seq GEO R • 1.2k views

If you want to put them into the same statistical analysis, you are restricted to the TCGA normals. Any independent dataset is a completely different experiment, so batch effects, rather than a true biological effect, would dominate any result you generate.
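A quick way to see the problem is to check whether the class label is fully confounded with the data source. Below is a minimal sketch in plain Python; the function name `fully_confounded` and the toy labels are made up for illustration and are not from any library.

```python
def fully_confounded(labels, batches):
    """Return True if every batch contains only one class label.

    In that case a classifier cannot tell "tumor vs normal" apart from
    "dataset A vs dataset B". (Hypothetical helper for illustration only.)
    """
    labels_per_batch = {}
    for batch, label in zip(batches, labels):
        labels_per_batch.setdefault(batch, set()).add(label)
    return all(len(seen) == 1 for seen in labels_per_batch.values())


# Toy example: tumors from TCGA, normals from an independent dataset.
mixed_labels = ["tumor", "tumor", "normal", "normal"]
mixed_batches = ["TCGA", "TCGA", "external", "external"]
print(fully_confounded(mixed_labels, mixed_batches))  # True

# Toy example: tumors and adjacent normals both from TCGA.
same_labels = ["tumor", "tumor", "normal"]
same_batches = ["TCGA", "TCGA", "TCGA"]
print(fully_confounded(same_labels, same_batches))  # False
```

When this returns True, any separation between the classes could be driven entirely by the data source, which is the situation described above.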


Thanks for the suggestion.

4.0 years ago
igor 13k

There was a study that used GTEx normals because TCGA normals are not entirely normal: "Comprehensive analysis of normal adjacent to tumor transcriptomes".
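On matching genes between the tumor data and an external set of normals: one common approach is to strip the version suffix from the Ensembl gene IDs and keep only the genes present in both tables. Here is a minimal sketch in plain Python. It assumes both tables are keyed by versioned Ensembl IDs (e.g. `ENSG00000000003.15`); the function names and the toy expression values are invented for illustration, and real data would first need to be loaded from the downloaded files.

```python
def strip_version(gene_id):
    """Drop the Ensembl version suffix: 'ENSG00000000003.15' -> 'ENSG00000000003'."""
    return gene_id.split(".", 1)[0]


def align_genes(tumor, normal):
    """Restrict two expression tables to their shared genes, in the same order.

    tumor, normal: dict mapping gene ID -> list of per-sample values.
    If two IDs collapse to the same unversioned ID, the first one seen is kept.
    """
    tumor_by_id = {}
    for gene_id, values in tumor.items():
        tumor_by_id.setdefault(strip_version(gene_id), values)
    normal_by_id = {}
    for gene_id, values in normal.items():
        normal_by_id.setdefault(strip_version(gene_id), values)

    shared = sorted(tumor_by_id.keys() & normal_by_id.keys())
    tumor_rows = [tumor_by_id[g] for g in shared]
    normal_rows = [normal_by_id[g] for g in shared]
    return shared, tumor_rows, normal_rows


# Toy tables (made-up values); note the differing version suffixes.
tcga = {
    "ENSG00000000003.15": [10, 12],
    "ENSG00000000005.6": [0, 1],
    "ENSG00000000419.13": [5, 7],
}
gtex = {
    "ENSG00000000003.14": [8],
    "ENSG00000000419.12": [6],
    "ENSG00000000457.13": [3],
}

shared, tumor_rows, normal_rows = align_genes(tcga, gtex)
print(shared)  # ['ENSG00000000003', 'ENSG00000000419']
```

Matching the gene IDs only lines up the rows. It does not remove differences in processing pipelines or batch effects between the two sources.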


Thank you for this link. It is a great resource. Much appreciated.
