Question: Why is TCGA Genomic Data Commons missing some data?
Asked 2.9 years ago by igor (United States):

Last summer, TCGA data moved from TCGA Data Portal to the Genomic Data Commons. However, for some reason, some data is missing. At first, I thought it was a temporary glitch, but this hasn't changed over time.

For example, at GDAC Firebrowse, there are 194 TCGA-LAML 450K samples (2016-01-28 batch), but at GDC, there are 140 samples. There is also the GDC Legacy Archive for the legacy data, but that also contains 140 samples.
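To see which samples are missing rather than just how many, the two sample lists can be compared directly. Below is a minimal sketch, assuming you have already exported the barcodes from each source into plain lists. The function names `sample_id` and `missing_samples` and the barcodes themselves are hypothetical, made up for illustration. They are not real TCGA-LAML identifiers and not part of any GDC or Firebrowse tool.

```python
def sample_id(barcode: str) -> str:
    """Truncate a TCGA-style barcode to its first four dash-separated fields
    (project, site, participant, sample/vial), normalising case."""
    return "-".join(barcode.strip().upper().split("-")[:4])


def missing_samples(reference, candidate):
    """Return sorted sample IDs found in `reference` but not in `candidate`."""
    ref = {sample_id(b) for b in reference}
    cand = {sample_id(b) for b in candidate}
    return sorted(ref - cand)


# Hypothetical barcodes for illustration only (not real TCGA identifiers).
firebrowse = [
    "TCGA-XX-0001-03A-01D-0001-05",
    "TCGA-XX-0002-03A-01D-0001-05",
    "TCGA-XX-0003-03A-01D-0001-05",
]
gdc = [
    "tcga-xx-0001-03a-01d-0001-05",   # same sample, different case
    "TCGA-XX-0003-03A-01D-0002-05",   # same sample, different plate field
]

print(missing_samples(firebrowse, gdc))
```

Comparing at the sample level instead of the full aliquot barcode avoids counting a sample as missing just because the two sources list different aliquots of it.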

I've also seen similar discrepancies with RNA-seq data.

How can this be? This is a relatively old dataset, so it should be stable. If there were a problem with some samples, I assume they would have been filtered out years ago, and there wouldn't be so many of them.


Have you tried emailing GDC support (support at nci-gdc.datacommons.io)?

written 2.9 years ago by genomax

I haven't yet. Was hoping this would be faster and provide an unbiased answer.

written 2.9 years ago by igor

Apparently I was wrong about that. Got a nice response quickly:

"There was a problem during the initial import of TCGA data that resulted in some of the TCGA-LAML being missed. Our development team is currently working on fixing this issue, but I'm not sure when it will be resolved."

written 2.9 years ago by igor

Btw, if you were in the loop during the old TCGA DCC era, there was an emergency redaction of TCGA LAML data that lasted a couple of months. GDC happened to pull TCGA data from DCC and CGHub during that period, which is why those samples are missing.

written 11 months ago by Zhenyu Zhang