Are TCGA data from UCSC cancer browser and TCGAbiolinks different?
2.6 years ago
wenbinm ▴ 30

Hi there,

library(TCGAbiolinks)
query <- GDCquery(project = 'TCGA-BRCA', data.category = 'Transcriptome Profiling', data.type = 'Gene Expression Quantification', workflow.type = 'HTSeq - Counts')
brca.seq <- GDCprepare(query)


And checked the expression of SOX10:

library(DESeq2)  # attaches SummarizedExperiment, which provides rowData() and assay()
r <- rowData(brca.seq)
as.numeric(assay(brca.seq[which(r$external_gene_name == 'SOX10'), ]))


It turns out its expression is zero in all patients. But in the data from the UCSC Cancer Browser (HiSeqV2), the average SOX10 expression is 6. The UCSC data can be found here: https://tcga.xenahubs.net/download/TCGA.BRCA.sampleMap/HiSeqV2.gz

Another question: TCGAbiolinks is more up to date than the UCSC Cancer Browser because it downloads data directly from TCGA, right?

Thank you!

TCGA RNA-Seq • 1.0k views
2.6 years ago
mary ▴ 20

Hello,

Can you please tell me how you are seeing an expression of 6 in UCSC Xena? When I look, it is 0 for all samples in the GDC TCGA BRCA cohort: https://xenabrowser.net/?bookmark=1c841f9f54e697573dc2d9aa5b6be22b (sorry about the red color; Xena is not sure how to color the samples when they all have the same value)

While technically the data from TCGAbiolinks will be more up-to-date than UCSC Xena, for this particular data there is unlikely to be a lag since it has been out for a long time.

Best, Mary

0
Entering edit mode

Then I took a look at the SOX10 expression data, and the first 5 numbers are 6.5221 0 8.308 6.3628 0.5819. Maybe I made a mistake here.

1
Entering edit mode

Ah, that is the legacy TCGA data, not the TCGA data from the GDC. TCGAbiolinks retrieves its data from the GDC, as far as I can tell. The GDC TCGA data on Xena is here: https://gdc.xenahubs.net/download/TCGA-BRCA/Xena_Matrices/TCGA-BRCA.htseq_fpkm-uq.tsv.gz
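
In case it helps anyone checking the same gene in that file, here is a minimal sketch of the lookup. It assumes the matrix has one row per gene, keyed by a versioned Ensembl ID in the first column, and that SOX10 is ENSG00000100146 (please double-check both). The toy table stands in for the downloaded file: its version suffixes, sample names, and values are made up, and `lookup_gene` is just a hypothetical helper name.

```r
# Toy stand-in for a Xena-style matrix: first column holds versioned Ensembl IDs,
# remaining columns are samples. Version suffixes, sample names and values
# are invented for illustration only.
toy <- "Ensembl_ID\tsampleA\tsampleB\tsampleC
ENSG00000100146.15\t0\t0\t0
ENSG00000141510.14\t12.1\t11.8\t12.5"

# For the real file you would pass gzfile("<path to the downloaded .tsv.gz>")
# to read.delim() instead of text = toy.
mat <- read.delim(text = toy, check.names = FALSE, stringsAsFactors = FALSE)

# Drop the version suffix so the lookup does not depend on the annotation release.
gene_ids <- sub("\\.[0-9]+$", "", mat[[1]])

# Hypothetical helper: return the per-sample values for exactly one gene row.
lookup_gene <- function(mat, gene_ids, ensembl_id) {
  hit <- which(gene_ids == ensembl_id)
  if (length(hit) != 1) {
    stop("expected exactly one row for ", ensembl_id, ", found ", length(hit))
  }
  unlist(mat[hit, -1])
}

sox10 <- lookup_gene(mat, gene_ids, "ENSG00000100146")
all_zero <- all(sox10 == 0)  # TRUE means the gene is zero in every sample
print(summary(sox10))
print(all_zero)
```

Failing loudly when the ID matches zero or several rows avoids silently reading an empty or wrong vector, which is easy to do with `which()` on gene symbols.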

As to why the legacy TCGA data is different from the TCGA data from the GDC, I recommend contacting the GDC: https://gdc.cancer.gov/support

0
Entering edit mode

The legacy TCGA data were generated against hg19, while the TCGA data from the GDC now use hg38, so some differences are to be expected.
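
One more thing to watch when comparing the two sources is the scale of the values. The sketch below assumes the legacy HiSeqV2 file stores log2(normalized count + 1), which is how I understand Xena describes it, whereas 'HTSeq - Counts' from the GDC are raw counts; treat both as assumptions and check the dataset descriptions. Under that assumption, the five SOX10 values quoted above can be put back on a linear scale like this (`to_linear` and `to_log2p1` are hypothetical helper names):

```r
# First five SOX10 values quoted above from the legacy HiSeqV2 file.
hiseqv2 <- c(6.5221, 0, 8.308, 6.3628, 0.5819)

# Assumption: HiSeqV2 stores log2(normalized_count + 1), so undo that transform.
to_linear <- function(x) 2^x - 1
to_log2p1 <- function(x) log2(x + 1)

norm_counts <- to_linear(hiseqv2)
print(round(norm_counts, 1))

# Sanity check: transforming back should reproduce the quoted values.
stopifnot(isTRUE(all.equal(to_log2p1(norm_counts), hiseqv2)))
```

A value of 0 stays 0 on both scales, so the all-zero result from the GDC counts cannot be explained by the transform alone.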