Question: TCGA data on cloud
2.2 years ago
United States
Dear group, I apologize if this question has already been discussed here. I searched but could not find a related thread.

I am interested in running analyses on raw TCGA data on a cloud platform that already hosts the raw data. I have the necessary permissions to access raw TCGA data. What I am looking for is:

1. Access raw TCGA FASTQ files from multiple organ sites and align them to a genome of my choice.
2. Using the resulting BAM files, run analyses to obtain results required for further downstream analyses.
3. The cloud platform should provide the tools (STAR, BWA, TopHat, etc.), since I don't want to install and configure them myself. I am looking for an analysis-ready cloud space.

Are such services available? I have read about Seven Bridges, Broad, the NCI GDC cloud, Google ISB, etc. It is hard to tell which service would match my requirements.

I would appreciate your input and suggestions.



2.2 years ago
United States
The NCI GDC is the official repository of the TCGA data. It holds both raw and processed data: the GDC runs best-practice pipelines on the raw data and provides the resulting processed files. Check this site out first: If the pipeline you want to run is the usual processing pipeline for RNA-seq, whole-exome, or whole-genome data, the processed files (VCF or MAF) may already be available on the GDC platform.
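One way to check whether processed files already exist is to query the public GDC API before setting up any pipeline. The following is only a minimal sketch: the function names `build_gdc_file_query` and `search_gdc_files` are hypothetical helpers, and the field names (`cases.project.project_id`, `data_format`, `experimental_strategy`) and example values are my assumptions about the GDC `files` endpoint, so verify them against the current GDC API documentation.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint for file metadata searches.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_gdc_file_query(project_ids, data_formats, strategy, size=10):
    """Build query parameters asking for files that match all three criteria."""
    filters = {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "cases.project.project_id",
                                     "value": list(project_ids)}},
            {"op": "in", "content": {"field": "data_format",
                                     "value": list(data_formats)}},
            {"op": "in", "content": {"field": "experimental_strategy",
                                     "value": [strategy]}},
        ],
    }
    return {
        "filters": json.dumps(filters),  # the API expects the filter as a JSON string
        "fields": "file_id,file_name,data_format,access",
        "format": "JSON",
        "size": str(size),
    }


def search_gdc_files(params, timeout=30):
    """Send the query (needs network access) and return the matching file records."""
    url = GDC_FILES_ENDPOINT + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload["data"]["hits"]
```

For example, `search_gdc_files(build_gdc_file_query(["TCGA-BRCA"], ["MAF", "VCF"], "WXS"))` should return metadata records for matching variant files, each with an `access` value indicating whether the file is open or controlled. Metadata searches like this do not need a token; downloading controlled-access files does.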

If you don't find what you need at the link above and have to run your own custom pipeline on the raw data, you will need one of the following platforms: Seven Bridges CGC, Broad, or Google ISB. From what I know so far, each platform has its own strengths and capabilities, and which one you prefer will depend on your requirements and your expertise with command-line bioinformatics tools.

I hope this helps.

