Which approach do you recommend for handling the small number of normal samples in TCGA?
17 months ago
Zahra

Hi all, I've analyzed the TCGA HTSeq count data and used DESeq2 for normalization. As you know, TCGA has only a small number of normal samples for each cancer type. Is it acceptable to perform differential expression analysis (DEA) with such unbalanced group sizes?

Which one do you recommend?

  1. Using all of the TCGA data as is (e.g. 533 tumor samples and 59 normal samples).
  2. Subsampling the same number of tumor and normal samples from TCGA and analyzing only those.
  3. Using normal samples from another database with more samples (e.g. GTEx).

Thanks for any help.

DESeq2 GTEx TCGA
17 months ago

There is no real problem with using an unbalanced number of replicates in DEA: as long as each condition has enough samples to estimate the variance (and 59 is definitely enough), it's fine.
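To illustrate why option 2 (throwing tumors away to balance the groups) does not help, here is a minimal sketch using the Welch-style standard error of a difference in group means. This is a simplified stand-in, not DESeq2's negative binomial model, and the per-group standard deviation of 1.0 is a hypothetical value chosen only for illustration.

```python
import math

def se_mean_difference(n_tumor: int, n_normal: int,
                       sd_tumor: float = 1.0, sd_normal: float = 1.0) -> float:
    """Standard error of the difference between two group means
    (Welch form: no equal-variance or equal-size assumption)."""
    return math.sqrt(sd_tumor ** 2 / n_tumor + sd_normal ** 2 / n_normal)

# Hypothetical per-gene SD of 1.0 (e.g. on a log scale) in both groups.
all_samples = se_mean_difference(533, 59)  # option 1: keep every tumor
subsampled = se_mean_difference(59, 59)    # option 2: drop tumors to match

print(f"533 vs 59: SE = {all_samples:.3f}")
print(f" 59 vs 59: SE = {subsampled:.3f}")
```

Under these assumptions the unbalanced design gives the smaller standard error. The 59 normals set a floor of sqrt(1/59) that no number of tumors can push below, but discarding tumors only moves you further away from that floor.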

Definitely don't do option 3: counts from one project are not comparable to counts from another project.
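One way to see why you can't simply model the project difference away in option 3: if every tumor comes from TCGA and every normal comes from GTEx, the "condition" and "project" columns of the design matrix are identical, so the matrix is not full rank and the two effects cannot be separated. The sketch below is a hypothetical illustration (the helpers `matrix_rank` and `design` are made up for this example) that builds an intercept + condition + project design and computes its rank exactly.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a small matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def design(samples):
    """Hypothetical design matrix columns: intercept, is_tumor, is_tcga."""
    return [[1, int(cond == "tumor"), int(proj == "TCGA")] for cond, proj in samples]

# Option 3: every tumor from TCGA, every normal from GTEx.
confounded = [("tumor", "TCGA")] * 3 + [("normal", "GTEx")] * 3
# Contrast: both projects contribute both conditions.
crossed = [("tumor", "TCGA"), ("normal", "TCGA"),
           ("tumor", "GTEx"), ("normal", "GTEx")]

print(matrix_rank(design(confounded)))  # 2 of 3 columns: effects not separable
print(matrix_rank(design(crossed)))     # 3 of 3 columns: full rank
```

The confounded design loses a column of rank, so any tumor vs. normal difference is indistinguishable from a TCGA vs. GTEx difference (library prep, pipeline, tissue handling, and so on).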

One thing you might need to pay attention to, with such large numbers of samples, is how homogeneous the samples are. Samples from projects like TCGA need to be treated as observational data, not experimental data. There will undoubtedly be heterogeneities in the data that can throw off the model assumptions.

See this thread from MikeLove for some suggestions:
