extract TCGA data
4.5 years ago
Learner ▴ 250

I am looking for an alternative way to extract data from TCGA. I know this one, http://www.linkedomics.org/admin.php, but there was another website that was very easy to use and I can't remember its name. Any thoughts?

genomics • 2.3k views

LinkedOmics to extract TCGA data? LinkedOmics is an analysis tool, and is quite removed from TCGA's raw data. (I'm part of the lab that developed LinkedOmics)


Here you go: TCGA Assembler is another one, apart from TCGAbiolinks and recount. Just remember that all of these will only provide you with a matrix of count data or FPKM/TPM values, which means they are quantified with RSEM. You cannot obtain raw data unless you specifically apply for GDC and ICGC approval and are granted access. Hope this helps!


Dear vchris,

Just a small comment: actually, not all of these resources/tools/projects are quantified with RSEM. That is mostly true for the legacy TCGA data in the old repository (level 3 or 4). For example, the harmonized versions in GDC:

https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/#mrna-expression-workflow

which can be accessed through TCGAbiolinks, provide raw HTSeq gene counts, etc.
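
As a rough sketch of the point above, harmonized gene counts could be pulled with TCGAbiolinks along the following lines. The helper name `fetch_tcga_counts` is hypothetical, and the `workflow.type` label is an assumption tied to the GDC release: "HTSeq - Counts" matches the pipeline discussed here, while newer GDC releases use "STAR - Counts" instead.

```r
# Hypothetical helper: query, download and assemble harmonized gene counts.
# Requires the Bioconductor package TCGAbiolinks and network access to GDC.
fetch_tcga_counts <- function(project = "TCGA-BRCA",
                              workflow = "HTSeq - Counts") {
  # Assumption: the workflow label must match the GDC release in use.
  query <- TCGAbiolinks::GDCquery(
    project       = project,
    data.category = "Transcriptome Profiling",
    data.type     = "Gene Expression Quantification",
    workflow.type = workflow
  )
  TCGAbiolinks::GDCdownload(query)  # open-access files, no token needed
  TCGAbiolinks::GDCprepare(query)   # SummarizedExperiment by default
}

# Usage (downloads a large amount of data):
# se     <- fetch_tcga_counts("TCGA-BRCA")
# counts <- SummarizedExperiment::assay(se)
```

Nothing is downloaded until the function is called, so the definition can be sourced safely before deciding on a project.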

Also you can query raw sequencing data:

#### example
library(TCGAbiolinks)

# Query raw sequencing data for primary solid tumors in TCGA-BRCA
query <- GDCquery(project = "TCGA-BRCA",
                  data.category = "Raw Sequencing Data",
                  sample.type = "Primary solid Tumor")


Right, my apologies, I should have been clearer. The legacy data are mostly quantified with RSEM, but one can also access STAR-aligned data and retrieve HTSeq counts as well as normalized data. To access BAMs (which qualify as raw data, since you can regenerate the FASTQ files from them) or the raw FASTQ files, however, you need controlled access. Also, if I am not wrong, the raw sequencing query in GDCquery will not let you download the raw legacy data unless you have a token file and access to the controlled data. Check this link

P.S.: I spent days and nights trying to find a way to download raw data without controlled access. I later got access with tokens, and even then it is not very straightforward. ;)
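
For what it is worth, a minimal sketch of the token-based download described above might look like this. The helper name `fetch_tcga_bams` and its arguments are hypothetical; it assumes you already hold approval for the controlled data, have saved a token file from the GDC portal, and are querying the harmonized archive, where aligned reads are listed under the "Sequencing Reads" category.

```r
# Hypothetical helper: download controlled-access BAMs using a GDC token.
# Assumes approved access to controlled data and a saved token file.
fetch_tcga_bams <- function(token_file, barcodes, project = "TCGA-BRCA") {
  # Fail early if the token file is missing, before contacting GDC.
  stopifnot(file.exists(token_file))
  query <- TCGAbiolinks::GDCquery(
    project       = project,
    data.category = "Sequencing Reads",  # assumption: harmonized archive
    barcode       = barcodes             # restrict to a few samples; BAMs are large
  )
  TCGAbiolinks::GDCdownload(query, token.file = token_file)
  invisible(query)
}

# Usage (placeholder file name and barcode prefix):
# fetch_tcga_bams("gdc-user-token.txt", barcodes = c("TCGA-XX-XXXX"))
```

Restricting the query to a handful of barcodes is a deliberate choice here, since the BAMs for a whole project run to many terabytes.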

4.5 years ago
svlachavas ▴ 760

Hi,

There are various repositories and R packages for accessing and downloading TCGA data. For example, take a look at the TCGAbiolinks R package, which offers various options, including raw as well as processed and harmonized data: http://bioconductor.org/packages/release/bioc/vignettes/TCGAbiolinks/inst/doc/query.html

There are also other projects, such as recount2:

https://jhubiostatistics.shinyapps.io/recount/
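
As an illustration, the recount2 data can also be fetched from R with the recount Bioconductor package rather than through the Shiny app. This is only a sketch: the helper name `fetch_recount_tcga` is hypothetical, and it assumes that TCGA is exposed under the project id "TCGA" and that the gene-level file is saved as `rse_gene.Rdata`.

```r
# Hypothetical helper: fetch the recount2 gene-level TCGA object.
# Requires the Bioconductor package recount and network access.
fetch_recount_tcga <- function(outdir = "recount_TCGA") {
  # Assumption: TCGA is available in recount2 under the project id "TCGA".
  recount::download_study("TCGA", type = "rse-gene", outdir = outdir)
  env <- new.env()
  load(file.path(outdir, "rse_gene.Rdata"), envir = env)
  # recount2 stores base-pair coverage counts; scale them to read counts.
  recount::scale_counts(env$rse_gene)
}

# Usage (the TCGA file is several GB):
# rse <- fetch_recount_tcga()
```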

But the most important question is actually: what is your biological question of interest, and what would you like to look for?

Best,

Efstathios-Iason