Question: TCGA Broad GDAC Firehose Parse and Match
2.6 years ago by
United States
hood821 wrote:

Hello, I have downloaded all the cancer types from Broad's GDAC Firehose and unzipped them. There are a ton of files: mage-tab, aux, and Level archives for each data type (clinical, RNA-seq, protein). I was hoping to find some established code (R or Python) that pulls only the "Level" files and reads the clinical and RNA-seq txt files into an R object for each cancer type. It should also map the sample identifiers to the clinical identifiers; there are so many TCGA IDs that it's hard to parse.

I thought this would be something that is done all the time. I can write the code, but I am slow and don't want to reinvent the wheel. I want all cancer types, with clinical variables and RNA-seq RSEM data, in an R object for each type. Oh, and I want a way to toggle whether or not the sample is "normal". I think I can pull this from the clinical file.

Any help or pointers would be great!
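
The "Level files only" filter described above could be sketched roughly as follows. This is a minimal sketch in Python, not established code: it assumes the unzipped archives are directories whose names contain "Level_", "aux", or "mage-tab" (for example a hypothetical `gdac.broadinstitute.org_BRCA.Merge_Clinical.Level_1.2016012800.0.0`), and that each archive may carry a `MANIFEST.txt` that is not data. The function names are illustrative only.

```python
from pathlib import Path


def is_level_archive(name):
    """Return True for directory names that look like Level_N data archives.

    Assumption: Level archives contain "Level_" in the name, while the
    companion archives contain "mage-tab" or ".aux." instead.
    """
    lowered = name.lower()
    return (
        "level_" in lowered
        and "mage-tab" not in lowered
        and ".aux." not in lowered
    )


def find_level_txt_files(root):
    """Yield .txt data files sitting directly inside Level_N directories."""
    root = Path(root)
    directories = sorted(p for p in root.rglob("*") if p.is_dir())
    for directory in directories:
        if not is_level_archive(directory.name):
            continue
        for txt_file in sorted(directory.glob("*.txt")):
            # Assumption: MANIFEST.txt is a checksum listing, not data.
            if txt_file.name != "MANIFEST.txt":
                yield txt_file
```

Calling `list(find_level_txt_files("path/to/unzipped"))` would then give the clinical and RNA-seq text files to read, leaving the mage-tab and aux directories untouched.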


rna-seq R • 1.1k views
2.6 years ago by
vinvan wrote:

There are quite a few R packages that do exactly this. You can check TCGAbiolinks or TCGA2STAT.
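
If you would rather do the matching by hand, the TCGA barcode itself carries what is needed: the first three dash-separated fields identify the patient, and the first two digits of the fourth field are the sample type code (01 to 09 tumor, 10 to 19 normal). Below is a minimal sketch in Python under those conventions. The function names and the `clinical_by_patient` dictionary are hypothetical, and the case normalization assumes that clinical tables may store patient barcodes in lower case.

```python
def parse_barcode(barcode):
    """Split a TCGA sample barcode into (patient_id, sample_type_code)."""
    parts = barcode.upper().split("-")
    if len(parts) < 4 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA sample barcode: {barcode!r}")
    patient_id = "-".join(parts[:3])   # e.g. TCGA-A1-A0SB
    sample_type = int(parts[3][:2])    # e.g. "01A" -> 1
    return patient_id, sample_type


def is_normal(barcode):
    """True when the sample type code falls in the normal range (10-19)."""
    _, sample_type = parse_barcode(barcode)
    return 10 <= sample_type <= 19


def match_samples_to_clinical(sample_barcodes, clinical_by_patient):
    """Pair each sample with its patient's clinical record.

    Unmatched samples are returned separately rather than dropped,
    so nothing disappears silently.
    """
    # Normalize keys so lower-case clinical barcodes still match.
    clinical = {key.upper(): value for key, value in clinical_by_patient.items()}
    matched, unmatched = [], []
    for barcode in sample_barcodes:
        patient_id, sample_type = parse_barcode(barcode)
        record = clinical.get(patient_id)
        if record is None:
            unmatched.append(barcode)
        else:
            matched.append({
                "sample": barcode,
                "patient": patient_id,
                "is_normal": 10 <= sample_type <= 19,
                "clinical": record,
            })
    return matched, unmatched
```

Note that the tumor/normal toggle comes from the sample barcode rather than the clinical file, since clinical records are per patient and one patient can have both a tumor and a normal sample.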


Sure, I have seen these. My concern is what happens during processing. Is there normalization? Are samples dropped, and if so, why? I want the data with as little manipulation as possible.

Reply written 2.6 years ago by hood821