Question: Are TCGA data from UCSC cancer browser and TCGAbiolinks different?
13 months ago, wenbinm (USA) wrote:

Hi there,

I obtained TCGA BRCA RNA-seq data in two ways: by downloading it from the UCSC cancer browser, and by using TCGAbiolinks:

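library(TCGAbiolinks)
# Query HTSeq count data for the TCGA-BRCA project
query <- GDCquery(project = 'TCGA-BRCA',
                  data.category = 'Transcriptome Profiling',
                  data.type = 'Gene Expression Quantification',
                  workflow.type = 'HTSeq - Counts')
GDCdownload(query)             # download the files matched by the query
brca.seq <- GDCprepare(query)  # assemble them into a single object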

And checked the expression of SOX10:

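library(SummarizedExperiment)  # provides rowData() and assay()
r <- rowData(brca.seq)
# Counts for the row whose gene symbol is SOX10, one value per sample
as.numeric(assay(brca.seq[which(r$external_gene_name == 'SOX10'), ]))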

It turns out that its expression is zero in all patients. In the data from the UCSC cancer browser (HiSeqV2), however, the average SOX10 expression is 6. The UCSC data can be found here: https://tcga.xenahubs.net/download/TCGA.BRCA.sampleMap/HiSeqV2.gz

Another question: TCGAbiolinks is more up to date than the UCSC cancer browser, since it downloads data directly from TCGA, right?

Thank you!
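
Before reading an all-zero result as a biological finding, it can help to confirm that the gene lookup matched exactly one row and to summarise the values. The sketch below is only an illustration in base R: the data frame `r` and the matrix `counts` are toy stand-ins for `rowData(brca.seq)` and `assay(brca.seq)`, their identifiers and values are made up, and `summarise_gene` is a hypothetical helper, not part of TCGAbiolinks.

```r
# Toy stand-ins (hypothetical values) for rowData(brca.seq) and assay(brca.seq).
r <- data.frame(
  ensembl_gene_id    = c("ENSG_A", "ENSG_B", "ENSG_C"),  # placeholder IDs
  external_gene_name = c("SOX10", "TP53", "GATA3"),
  stringsAsFactors   = FALSE
)
counts <- matrix(
  c(  0,    0,   0,   0,
    120,   95, 210, 180,
    900, 1500,  40, 700),
  nrow = 3, byrow = TRUE,
  dimnames = list(r$ensembl_gene_id, paste0("sample", 1:4))
)

# Summarise one gene, stopping if the symbol does not match exactly one row.
summarise_gene <- function(counts, gene_names, gene) {
  idx <- which(gene_names == gene)
  if (length(idx) != 1) {
    stop("expected exactly one row for ", gene, ", found ", length(idx))
  }
  values <- as.numeric(counts[idx, ])
  list(n_samples = length(values),
       n_zero    = sum(values == 0),
       mean      = mean(values))
}

sox10 <- summarise_gene(counts, r$external_gene_name, "SOX10")
print(sox10)
```

With the real object, the same helper could be called as `summarise_gene(assay(brca.seq), rowData(brca.seq)$external_gene_name, "SOX10")`, assuming the `external_gene_name` column is present as in the snippet above.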

Tags: rna-seq, tcga

13 months ago, mary wrote:

Hello,

Can you please tell me how you are seeing an expression of 6 in UCSC Xena? I see 0 for all samples in the GDC TCGA BRCA cohort: https://xenabrowser.net/?bookmark=1c841f9f54e697573dc2d9aa5b6be22b (sorry about the red color; it appears because Xena is not sure how to color the samples when they all have the same value)

While technically the data from TCGAbiolinks will be more up-to-date than UCSC Xena, for this particular data there is unlikely to be a lag since it has been out for a long time.

Best, Mary


Thank you for your quick response! I downloaded the UCSC Xena data from here, unzipped it, and opened the file in Excel: https://tcga.xenahubs.net/download/TCGA.BRCA.sampleMap/HiSeqV2.gz

Then I looked at the SOX10 expression data, and the first 5 numbers are 6.5221, 0, 8.308, 6.3628, and 0.5819. Maybe I made a mistake somewhere.

Reply written 13 months ago by wenbinm
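
The same check can be done in R instead of Excel. The following is a minimal sketch, assuming the matrix is tab-delimited with gene identifiers in the first column and one column per sample; `read_gene_row` is a hypothetical helper, and the toy file only reuses the five SOX10 values quoted above, with made-up sample names and a made-up second gene.

```r
# Return one gene's values from a Xena-style matrix:
# tab-delimited, first column = gene identifier, remaining columns = samples.
read_gene_row <- function(path, gene) {
  mat <- read.delim(path, row.names = 1, check.names = FALSE)
  if (!gene %in% rownames(mat)) {
    stop("gene not found in matrix: ", gene)
  }
  unlist(mat[gene, ], use.names = FALSE)
}

# Toy file standing in for the downloaded matrix (sample names are hypothetical).
toy <- tempfile(fileext = ".tsv")
writeLines(c(
  "sample\tsampleA\tsampleB\tsampleC\tsampleD\tsampleE",
  "SOX10\t6.5221\t0\t8.308\t6.3628\t0.5819",
  "GENE2\t1\t2\t3\t4\t5"
), toy)

sox10 <- read_gene_row(toy, "SOX10")
print(sox10)
print(mean(sox10 == 0))  # fraction of samples with a value of exactly 0
```

For the real download, `read_gene_row("HiSeqV2.gz", "SOX10")` should work without unzipping first, since `read.delim` can read gzip-compressed files directly; this assumes the file was saved under that name in the working directory.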

Ah, that is the legacy TCGA data, not the TCGA data from the GDC. As far as I can tell, TCGAbiolinks retrieves its data from the GDC. The GDC TCGA data on Xena is here: https://gdc.xenahubs.net/download/TCGA-BRCA/Xena_Matrices/TCGA-BRCA.htseq_fpkm-uq.tsv.gz

As to why the legacy TCGA data is different from the TCGA data from the GDC, I recommend contacting the GDC: https://gdc.cancer.gov/support

Reply written 13 months ago by mary

The legacy TCGA data were based on hg19, whereas the TCGA data from the GDC now use hg38. Therefore, some differences are expected.

Reply written 11 months ago by Shixiang