Question: dbGaP / GTEx analysis on the cloud. Easiest way?
I would like to analyze dbGaP data (in particular GTEx data). Since I need to analyse WGS BAMs (600 samples), the total size would be overwhelming, so I want to do the analysis on the cloud.

Can anyone suggest a platform that lets me both use my own tools/pipeline and access the files without downloading them?

I'm basically looking for something like the Cancer Genomics Cloud (CGC, by Seven Bridges).

I've already looked at a few platforms, from Google Cloud and Amazon AWS to DNAnexus, but I don't have experience with them, and I cannot tell whether it is possible to analyse the files directly or whether you need to download them first.

Thank you very much in advance!

in particular GTEx data

Be ready to pay for access to that data if you are not going to use it on Google Cloud. Refer to this page for access instructions.

The NCBI would like you to use the AnVIL service, which is hosted on the Google Cloud (US region).

The data itself is stored in a US-region Google Cloud bucket. Using it from within a US-region Google Cloud project is free, but data egress outside Google or to another Google region incurs a charge. About 8 cents per GB, as far as I remember.
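To get a feel for what that egress charge means for 600 WGS BAMs, here is a rough back-of-the-envelope sketch. The price is the roughly 8 cents per GB recalled above, not a checked figure, and the per-sample BAM sizes are purely hypothetical placeholders, so plug in the real file sizes and current pricing before relying on it.

```python
# Rough egress-cost estimate. Every input here is an assumption to be replaced.
PRICE_PER_GB_USD = 0.08  # "about 8 cents per GB" as recalled above; verify current pricing
N_SAMPLES = 600          # number of WGS BAMs mentioned in the question


def egress_cost_usd(n_samples: int, gb_per_sample: float,
                    price_per_gb: float = PRICE_PER_GB_USD) -> float:
    """Estimate the charge for copying the files out of the bucket's region."""
    return n_samples * gb_per_sample * price_per_gb


if __name__ == "__main__":
    # Hypothetical per-BAM sizes in GB; the real sizes depend on coverage.
    for gb_per_sample in (50, 100, 200):
        cost = egress_cost_usd(N_SAMPLES, gb_per_sample)
        print(f"{gb_per_sample} GB per BAM: about ${cost:,.0f}")
```

If the compute runs in the same region as the bucket, this charge should not apply, which is the main argument for analysing the files in place.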

Another cool thing about working with Google Cloud and AnVIL/Terra is that there may be some existing workflows that conform to GATK best practices, alongside custom notebooks to facilitate your analysis!

I'm not so sure how "cool" it is. NCBI just added $15,000 to the cost of my research project.

Thanks! Yep, I knew about AnVIL, but (probably I'm doing it wrong) it seems to only let you use the "standard workflows" they provide, not a custom one. There is a link to Dockstore for "additional workflow", but I cannot find a way to import one into my workspace. Also, I'm interested in WGS BAMs rather than RNA-seq, and unfortunately there seems to be only WGS data for GRCh38, while I need the hg19 version (which is still available through dbGaP).

