Question: Does TCGA data represent the whole population?
I am working on TCGA datasets and have a few questions about them. My work mainly involves identifying genes that are disease-causing, either directly or through secondary activation. I have a few doubts and would like your opinion on the following:

  1. Can results from TCGA analyses be used to predict prognosis for the Indian population as well? All samples are sourced from a particular geographic location, so I am concerned about demographic differences.
  2. What is the best sample size? How much is enough, and how much is too much? Your opinion on this is valuable, and it would be really helpful if you could point me to any literature available on this.

Someone will be along with more precise answers, but here is my general understanding.

  1. That is hard to say. I don't think there are any samples collected directly in India in the TCGA dataset. There may be samples from people of Indian origin who reside in the US.
  2. India, being one of the world's most populous countries, would likely require a very large sample size to truly represent the actual population.
Here is some work that goes over the different ethnic groups in TCGA:

I think it likely that Indians from India are grouped under 'Asian', so the numbers are probably extremely low. One could infer ancestry by taking the TCGA genotype data and plotting it against 1000 Genomes, as I do here: Produce PCA bi-plot for 1000 Genomes Phase III - Version 2

1000 Genomes has:

  • Gujarati Indian from Houston, Texas
  • Indian Telugu from the UK

However, doing that is a large project and could take weeks or months.
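To make the idea of plotting TCGA genotypes against 1000 Genomes a bit more concrete, here is a minimal sketch in Python. It is not the workflow from the linked tutorial. It assumes you already have genotype matrices coded 0/1/2 over a shared set of variants, and it uses simulated data in place of real TCGA and 1000 Genomes calls. The function names (`fit_reference_pca`, `project`, `simulate_genotypes`) and the population names `POP_A`, `POP_B`, `POP_C` are made up for illustration. The key point is that the principal axes are learned from the reference panel only, and the query samples are then projected onto those axes.

```python
import numpy as np

def fit_reference_pca(ref_genotypes, n_components=2):
    """Fit PCA on a reference genotype matrix (samples x variants, coded 0/1/2)."""
    mean = ref_genotypes.mean(axis=0)
    std = ref_genotypes.std(axis=0)
    keep = std > 0  # drop variants that do not vary in the reference panel
    scaled = (ref_genotypes[:, keep] - mean[keep]) / std[keep]
    # Rows of vt are the principal axes of the standardized reference data.
    _, _, vt = np.linalg.svd(scaled, full_matrices=False)
    return {"mean": mean[keep], "std": std[keep], "keep": keep,
            "loadings": vt[:n_components].T}

def project(model, genotypes):
    """Project samples onto the reference axes, using the reference scaling."""
    scaled = (genotypes[:, model["keep"]] - model["mean"]) / model["std"]
    return scaled @ model["loadings"]

def simulate_genotypes(rng, allele_freqs, n_samples):
    """Draw 0/1/2 genotypes from per-variant allele frequencies (toy data)."""
    size = (n_samples, allele_freqs.size)
    return rng.binomial(2, allele_freqs, size=size).astype(float)

# Toy reference panel: three simulated populations with shifted allele frequencies.
rng = np.random.default_rng(0)
n_variants = 500
base = rng.uniform(0.1, 0.9, n_variants)
pop_freqs = {
    name: np.clip(base + rng.normal(0, 0.15, n_variants), 0.05, 0.95)
    for name in ("POP_A", "POP_B", "POP_C")
}
ref_labels = np.repeat(list(pop_freqs), 50)
ref = np.vstack([simulate_genotypes(rng, f, 50) for f in pop_freqs.values()])

model = fit_reference_pca(ref)
ref_pcs = project(model, ref)

# Stand-in for the query cohort: ten samples drawn from POP_B.
query = simulate_genotypes(rng, pop_freqs["POP_B"], 10)
query_pcs = project(model, query)

for name in pop_freqs:
    print(name, ref_pcs[ref_labels == name].mean(axis=0).round(2))
print("query", query_pcs.mean(axis=0).round(2))
```

With real data, most of the effort goes into the steps this sketch skips: obtaining the genotype calls, restricting both datasets to shared variants, and quality filtering, which is why the full project takes so long.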

Thanks a lot, Kevin. One quick question: can we treat the diaspora on par with the native population? I mean, if second-generation Indians/Asians (who were born in the US) have been living in the US for a very long time, can we treat them alike?

If one were to look at it logically, how much does a genome change over two generations?

Hi Ram, thanks for the reply. Maybe not so much as to change into a new species. But there will be some changes, some mutations, that might not commonly be seen in the native population, for reasons such as environment, change in diet, etc. I might be wrong, but this thought keeps nagging at me. Thanks, RamRs

I'm trying to understand your point, David_emir, but it seems unlikely that a group of Indians who moved to the US would show significant changes in their germline DNA within a couple of generations. One way to answer this is to look at how 1000g characterized these individuals: how their lineage was verified (is an individual the child of an Indian who moved to the US and a non-Indian partner?) and which omics dimension is in question (are we looking at the genome, the epigenome, or the gut microbiome?).

But yeah, it would be an interesting exercise to see if any of the TCGA samples cluster with 1000g SAS samples in PCA.
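As a rough illustration of what "cluster with 1000g SAS samples" could mean in practice, the sketch below assigns each query sample to the nearest reference group centroid in PC space. Everything here is assumed for illustration: the PC coordinates are synthetic, the group centers are invented, and `nearest_centroid_labels` is a hypothetical helper, not part of any TCGA or 1000 Genomes tooling. A real analysis would more likely use more than two PCs and a distance cutoff for ambiguous or admixed samples.

```python
import numpy as np

def nearest_centroid_labels(ref_pcs, ref_labels, query_pcs):
    """Label each query sample with the reference group whose centroid is closest."""
    ref_labels = np.asarray(ref_labels)
    groups = np.unique(ref_labels)
    centroids = np.vstack([ref_pcs[ref_labels == g].mean(axis=0) for g in groups])
    # Pairwise Euclidean distances: (n_query, n_groups)
    dists = np.linalg.norm(query_pcs[:, None, :] - centroids[None, :, :], axis=2)
    return groups[dists.argmin(axis=1)], dists.min(axis=1)

# Synthetic reference PCs: four made-up, well-separated group centers.
rng = np.random.default_rng(1)
centers = {"AFR": (-6.0, 0.0), "EUR": (6.0, 0.0), "EAS": (0.0, -6.0), "SAS": (0.0, 4.0)}
ref_labels = np.repeat(list(centers), 40)
ref_pcs = np.vstack([rng.normal(c, 0.5, size=(40, 2)) for c in centers.values()])

# Stand-in for projected query samples (hand-picked coordinates).
query_pcs = np.array([[0.1, 3.9], [-0.3, 4.2], [5.8, 0.2]])

assigned, distance = nearest_centroid_labels(ref_pcs, ref_labels, query_pcs)
n_sas = int((assigned == "SAS").sum())
print(list(assigned), distance.round(2), "SAS-like samples:", n_sas)
```

Counting how many samples fall closest to the SAS centroid gives a first estimate of how many South Asian ancestry samples a cohort contains, which feeds back into the sample size question above.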

