Tutorial: use firehose_get to download TCGA data
jmzeng1314 wrote (13 months ago):

It's extremely easy to use firehose_get to download TCGA data.

About firehose

The website is maintained by the Broad Institute at http://firebrowse.org/, and all of the TCGA data are stored under the URLs below:

https://gdac.broadinstitute.org/runs/stddata__latest/

https://gdac.broadinstitute.org/runs/analyses__latest/

Obviously, you can go to these two URLs directly to find what you want, but that is a little hard for most of us. That's why they created a tool to help us explore and download the TCGA data: firehose_get, https://confluence.broadinstitute.org/display/GDAC/Download

firehose_get

Firstly, download and install the tool, as below:

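## create the install folder if it does not exist yet
mkdir -p ~/biosoft/firehose && cd ~/biosoft/firehose
wget http://gdac.broadinstitute.org/runs/code/firehose_get_latest.zip
unzip firehose_get_latest.zip
## run it once without arguments, then try a real download
~/biosoft/firehose/firehose_get
~/biosoft/firehose/firehose_get -tasks clinical analyses latest brca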

As you can see, you run this script in a terminal. firehose_get takes 4 parameters:

  • -tasks: which kind of data to download, such as Clinical CN LowP Methylation mRNA mRNASeq miR miRSeq RPPA MAF rawMAF
  • analyses or data: you have to choose one of them, to tell the tool whether you need level 4 (analyses) or level 3 (data) data.
  • latest or another date: you should definitely choose latest, unless you want to follow an old paper.
  • Tumor type: ACC BLCA BRCA CESC COAD COADREAD DLBC ESCA GBM HNSC KICH KIRC KIRP LAML LGG LIHC LUAD LUSC OV PAAD PANCANCER PANCAN8 PANCAN12 PRAD READ SARC SKCM STAD THCA UCEC UCS
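
The four parts above can be put together mechanically. Below is a minimal sketch, assuming a hypothetical helper called fh_cmd (it is not part of firehose_get). It only prints the command line, so you can check it before running anything:

```shell
# fh_cmd is a hypothetical helper, not part of firehose_get.
# It prints the command for: <tasks> <run type> <date> <tumor type>
fh_cmd() {
    if [ "$#" -ne 4 ]; then
        echo "usage: fh_cmd TASKS RUNTYPE DATE TUMOR" >&2
        return 1
    fi
    echo "firehose_get -tasks $1 $2 $3 $4"
}

fh_cmd clinical analyses latest brca
```

Printing instead of executing keeps the sketch safe to try: nothing is downloaded until you copy the printed line into the terminal yourself.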

Examples:

  1. download all of the expression data (level 3) for BRCA: ~/biosoft/firehose/firehose_get -tasks rna data latest brca

  2. download all of the analysis results based on the RNA data for BRCA: ~/biosoft/firehose/firehose_get -tasks rna analyses latest brca
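
If you need the same data for several tumor types, the first example can be wrapped in a loop. This is only a sketch: the cohorts are an arbitrary pick from the tumor type list, fetch_rna is a name invented here, and DRY_RUN is a variable made up for this example so that the commands are printed instead of executed.

```shell
# Path used earlier in this tutorial; adjust it to your own install.
FH=~/biosoft/firehose/firehose_get
# DRY_RUN is made up for this sketch: 1 prints the commands, 0 runs them.
DRY_RUN=1

# fetch_rna is a hypothetical helper: one firehose_get call per cohort.
fetch_rna() {
    for cohort in "$@"; do
        if [ "$DRY_RUN" -eq 1 ]; then
            echo "$FH -tasks rna data latest $cohort"
        else
            "$FH" -tasks rna data latest "$cohort"
        fi
    done
}

fetch_rna brca luad coad
```

Set DRY_RUN=0 once the printed commands look right.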

Isn't it very easy?

However, understanding the data you download is not an easy thing.

Have fun with it ~~~

others

In fact, this tool is really simple: it just helps you download the data from their website, because there is really too much data there.

https://gdac.broadinstitute.org/runs/stddata__2016_07_15/

https://gdac.broadinstitute.org/runs/stddata__latest/

https://gdac.broadinstitute.org/runs/analyses__2016_01_28/

So you can skip firehose_get and just use wget, as below:

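## -c resume, -r recursive, -np never ascend to the parent directory, -nH no host-named directory,
## -k convert links, -L follow relative links only, -p get page requisites, -A accept only matching file names
wget -c -r -np -nH -k -L -p -A "*snp_6*hg19*" http://gdac.broadinstitute.org/runs/stddata__2016_01_28/data/BRCA/20160128/
## roughly the same as below, when latest points to the 2016_01_28 run:
./firehose_get -tasks snp_6 stddata latest brca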
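
The directory URL in the wget command follows a visible pattern: the run date appears once with underscores and once without, and the tumor type is in upper case. Here is a small sketch that rebuilds it, assuming other runs and cohorts use the same layout as this one example (gdac_url is a name invented here):

```shell
# gdac_url is a hypothetical helper; the layout is inferred from the single
# URL shown above and may not hold for every run.
gdac_url() {
    run_date=$1                                                # e.g. 2016_01_28
    cohort=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')   # brca -> BRCA
    compact=$(printf '%s' "$run_date" | tr -d '_')             # 2016_01_28 -> 20160128
    echo "http://gdac.broadinstitute.org/runs/stddata__${run_date}/data/${cohort}/${compact}/"
}

gdac_url 2016_01_28 brca
```

The printed URL can then be handed to wget in place of the hard-coded one, for example as "$(gdac_url 2016_01_28 brca)".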

tiny tip: One could use

echo 'export PATH=~/your/own/path/to/the/folder/contain/firehose_get:$PATH' >> ~/.bashrc
source ~/.bashrc

to add firehose_get to the PATH variable; after that, typing just firehose_get is enough to run it.

NB: the PATH entry must end with the folder that contains firehose_get, not with firehose_get itself.
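
To check that the PATH change took effect, you can ask the shell where it finds the tool. A minimal sketch using the standard command -v builtin; in_path is just a name chosen for this example:

```shell
# in_path is a hypothetical helper: it succeeds if the shell can find the tool.
in_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 is not in PATH" >&2
        return 1
    fi
}

in_path firehose_get || echo "check the export line in ~/.bashrc"
```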

(written 12 months ago by Wenhu_Cao, modified 4 weeks ago)

Thank you for this tip.

Actually, I don't like adding software to the environment, which always misleads newcomers to bioinformatics.

(written 12 months ago by jmzeng1314)

hi,

Is there a way I can get more details about -tasks? E.g. what is available, what the abbreviations stand for, etc.

thanks

(written 6 months ago by H.Hasani, modified 6 months ago)

That info is in the original post.

-tasks, to determine what kind of data to download, such as Clinical CN LowP Methylation mRNA mRNASeq miR miRSeq RPPA MAF rawMAF
(written 6 months ago by genomax)

I'm not sure if this is the full list; there is also gistic, which makes me wonder what else is missing. Moreover, it seems that some of the keywords have been deprecated.

(written 6 months ago by H.Hasani, modified 6 months ago)