Question: How to find sample type information from TCGA?
3.0 years ago by
aichen80
aichen80 wrote:

After downloading batches of gene expression files from the TCGA GDC (about 600 files), I find that each file contains only one sample. I want to know which files are tumor samples and which are normal samples, because my next step is to find differentially expressed genes from these files. However, I don't know how to find sample type information in TCGA. Can anyone help me? What I want is something like the TCGA wiki, because I didn't find a similar tool in the GDC.

I hope I can download a tab-delimited file containing this information:

study	barcode	disease	disease_name	sample_type	sample_type_name	analyte_type	library_type	center	center_name	platform	filename
TCGA	TCGA-56-7222-01A-11R-2045-07	LUSC	Lung squamous cell carcinoma	TP	01	RNA	RNA-Seq	UNC-LCCC	UNC-LCCC	ILLUMINA	UNCID_2195465.13daa1a0-a236-474e-b621-eb131be34af1.120305_UNC16-SN851_0133_AC0JB6ACXX_6_GGCTAC.tar.gz

rna-seq tcga • 7.7k views
modified 3.0 years ago by EagleEye6.4k • written 3.0 years ago by aichen80
3.0 years ago by
EagleEye6.4k
Sweden
EagleEye6.4k wrote:

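TCGA-56-7222-01A-11R-2045-07

The sample type is encoded in the first two digits of the fourth barcode field (01 in 01A here). Tumor types range from 01 to 09, normal types from 10 to 19, and control samples from 20 to 29.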
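The rule above is easy to apply in a script. Here is a minimal sketch in Python, assuming the barcode has the usual hyphen-separated layout where the fourth field (for example 01A) starts with the two-digit sample type code. The function name classify_barcode is just something I made up for illustration.

```python
def classify_barcode(barcode):
    """Return 'tumor', 'normal', 'control' or 'unknown' for a TCGA barcode."""
    fields = barcode.split("-")
    # A sample-level barcode has at least 4 fields; the 4th holds the sample code.
    if len(fields) < 4 or not fields[3][:2].isdigit():
        raise ValueError(f"no sample type code in barcode: {barcode!r}")
    code = int(fields[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "unknown"


print(classify_barcode("TCGA-56-7222-01A-11R-2045-07"))  # tumor
print(classify_barcode("TCGA-43-7657-11A-01R-2125-07"))  # normal
```

Barcodes that stop at the participant level (three fields) raise an error here rather than being silently labeled.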

https://wiki.nci.nih.gov/display/TCGA/TCGA+barcode?desktop=true&macroName=unmigrated-inline-wiki-markup

You could get tab-delimited files for all cancer types from CGHub as manifest files. Since CGHub has been discontinued, you can try the GDC portal instead:

https://gdc-portal.nci.nih.gov/search/s?filters=%7B%22op%22:%22and%22,%22content%22:%5B%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.program.name%22,%22value%22:%5B%22TCGA%22%5D%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:%5B%22TCGA-BRCA%22%5D%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.disease_type%22,%22value%22:%5B%22Breast%20Invasive%20Carcinoma%22%5D%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.primary_site%22,%22value%22:%5B%22Breast%22%5D%7D%7D%5D%7D

modified 3.0 years ago • written 3.0 years ago by EagleEye6.4k

Thanks for answering, but do you know how to get the TCGA barcode that corresponds to a given file name? For example, the file "UNCID_1840921.148d34df-aec2-42bf-8e36-b91a68959606.sorted_genome_alignments.bam" belongs to "TCGA-43-7657-11A-01R-2125-07".

written 3.0 years ago by aichen80

I used to get tab-delimited files, but I didn't find a similar filter tool in the GDC.

written 3.0 years ago by aichen80

Yes, the GDC is a bit confusing right now. I hope they will improve the documentation in the future.

Select Data -> cancer program TCGA + project TCGA-XXXX (in the Cases tab) -> data format TSV (in the Files tab)

Then Download Manifest

I hope this works.

Or try the GDC API:

https://gdc-docs.nci.nih.gov/API/PDF/API_UG.pdf#page50

Manifest endpoint
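As a rough illustration of the API route, here is a sketch in Python that only builds a request URL; it does not contact the server. Note that it targets the files endpoint rather than the manifest endpoint mentioned above, because a manifest carries no barcode column. The base URL https://api.gdc.cancer.gov/files and the field names file_name, cases.samples.submitter_id and cases.samples.sample_type are my assumptions and should be checked against the current API guide. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumed base URL of the GDC files endpoint; verify against the API guide.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_files_params(project_id, size=1000):
    """Build query parameters asking for file names plus sample information."""
    filters = {
        "op": "in",
        "content": {"field": "cases.project.project_id", "value": [project_id]},
    }
    return {
        "filters": json.dumps(filters),
        # Assumed field names: file name, sample barcode, sample type.
        "fields": "file_name,cases.samples.submitter_id,cases.samples.sample_type",
        "format": "TSV",
        "size": str(size),
    }


def build_files_url(project_id, size=1000):
    """Return the full request URL with URL-encoded parameters."""
    return GDC_FILES_ENDPOINT + "?" + urlencode(build_files_params(project_id, size))


print(build_files_url("TCGA-BRCA"))
```

If the fields are right, fetching the printed URL should return a tab-delimited table with one row per file, which is close to what the original question asked for.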

modified 3.0 years ago • written 3.0 years ago by EagleEye6.4k

Thanks for replying! But I'm sorry to say that it didn't work, because I got a file like this:

id filename md5 size state
b99b9f44-00d2-443c-93f2-b0357491ed63 isoforms.quantification.txt a20f30bc3fe55fae1433949495884514 358879 submitted
1a657f88-2a88-4c46-b7ce-4d48c6d6ba15 isoforms.quantification.txt 3042012a718a35a59acd74bd97a1d257 410661 submitted

I found that if I click Download and choose "Biospecimen" or "File metadata", I can get this information, including the TCGA barcode, the file name, etc., but the format is ".json" :(

Thank you all the same!

written 3.0 years ago by aichen80

That's great. You can try converting the JSON to CSV or TSV using a tool like this:

http://www.convertcsv.com/json-to-csv.htm

I hope it works.
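If you would rather do the conversion locally, a short script also works. The sketch below is in Python and assumes the metadata JSON is a list of file records, each with a "file_name" key and an "associated_entities" list whose items carry the barcode under "entity_submitter_id". That layout is an assumption on my part, so check it against your own download. The example record is built from the file name and barcode quoted earlier in this thread, and the helper names are hypothetical.

```python
import csv
import io
import json


def sample_class(barcode):
    """Map the two-digit sample code in a barcode to tumor/normal/control."""
    fields = barcode.split("-")
    code = fields[3][:2] if len(fields) > 3 else ""
    if not code.isdigit():
        return "unknown"
    number = int(code)
    if 1 <= number <= 9:
        return "tumor"
    if 10 <= number <= 19:
        return "normal"
    if 20 <= number <= 29:
        return "control"
    return "unknown"


def metadata_to_tsv(json_text, out):
    """Write one TSV row per (file, barcode) pair found in the metadata."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["file_name", "barcode", "sample_class"])
    for record in json.loads(json_text):
        for entity in record.get("associated_entities", []):
            barcode = entity.get("entity_submitter_id", "")
            writer.writerow([record.get("file_name", ""), barcode, sample_class(barcode)])


# Example record in the assumed layout.
example = json.dumps([
    {
        "file_name": "UNCID_1840921.148d34df-aec2-42bf-8e36-b91a68959606.sorted_genome_alignments.bam",
        "associated_entities": [
            {"entity_submitter_id": "TCGA-43-7657-11A-01R-2125-07"}
        ],
    }
])

buffer = io.StringIO()
metadata_to_tsv(example, buffer)
print(buffer.getvalue(), end="")
```

For a real download, read the JSON text from the file and pass an open output file instead of the StringIO buffer.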

modified 3.0 years ago • written 3.0 years ago by EagleEye6.4k

Great! Thanks a lot :)

written 3.0 years ago by aichen80

Broad GDAC could be an alternative way to download TCGA data:

https://gdac.broadinstitute.org/

written 3.0 years ago by Mike1.3k

Thanks, it can export pdf.

written 3.0 years ago by aichen80

Hi, did you solve the problem of converting a GDC file name to a TCGA barcode to find tumor or normal sample information?

written 2.7 years ago by Chun-Jie Liu260

Did you solve the problem? I ran into the same problem of converting a file name to a TCGA barcode to find tumor/normal information.

written 2.7 years ago by Chun-Jie Liu260