Question: dbGaP / GTEx analysis on the cloud. Easiest way?
10 months ago by
filippo.martignano20 wrote:


I would like to analyze dbGaP data (in particular GTEx data). Since I need to analyse WGS BAMs (600 samples), the total size would be overwhelming, so I want to do the analysis on the cloud.

Can anyone suggest a platform that lets me both use my own tools/pipeline and access the files without downloading them?

I'm basically looking for something like the Cancer Genomics Cloud (CGC, by Seven Bridges).

I've already looked at a couple of platforms, from Google Cloud and Amazon AWS to DNAnexus, but I have no experience with them, and I can't tell whether it is possible to analyse the files directly or whether you need to download them first.

Thank you very much in advance!

google gtex amazon cloud db-gap • 479 views
modified 10 months ago by i.sudbery9.2k • written 10 months ago by filippo.martignano20

in particular GTEx data

Be ready to pay for access to that data if you are not going to use it on Google Cloud. Refer to this page for access instructions.

modified 10 months ago • written 10 months ago by genomax90k
10 months ago by
Sheffield, UK
i.sudbery9.2k wrote:

The NCBI would like you to use the AnVIL service, which is hosted on the Google Cloud (US region).

The data itself is stored in a US-region Google Cloud bucket. Using it within US-region Google Cloud is free, but data egress outside Google or to another Google region incurs a charge. About 8 cents a GB, I seem to remember.
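To get a feel for what that egress charge could mean for a project, here is a rough back-of-the-envelope sketch. The 600 samples come from the question and the $0.08/GB rate is the figure remembered above (check current pricing before relying on it). The per-BAM size of 100 GB is purely an assumption for illustration, and the function names are made up for this example.

```python
def total_size_gb(n_samples: int, gb_per_sample: float) -> float:
    """Total data volume in GB for n_samples files of roughly equal size."""
    return n_samples * gb_per_sample


def egress_cost_usd(total_gb: float, usd_per_gb: float = 0.08) -> float:
    """Estimated charge for moving total_gb out of the bucket's region.

    usd_per_gb defaults to the ~8 cents/GB quoted from memory in the thread;
    it is not an official price.
    """
    if total_gb < 0 or usd_per_gb < 0:
        raise ValueError("sizes and rates must be non-negative")
    return total_gb * usd_per_gb


N_SAMPLES = 600            # from the question
ASSUMED_GB_PER_BAM = 100   # ASSUMPTION: illustrative size only, not from the thread

size_gb = total_size_gb(N_SAMPLES, ASSUMED_GB_PER_BAM)
print(f"Total size: {size_gb:,.0f} GB")
print(f"Estimated egress cost: ${egress_cost_usd(size_gb):,.0f}")
```

Under those assumptions, 600 × 100 GB × $0.08/GB comes to about $4,800 just to move the data out, which is why running the analysis inside the same region is the cheaper route.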

written 10 months ago by i.sudbery9.2k

Another cool thing about working with Google Cloud and AnVIL/Terra is that there may be existing workflows that conform to GATK best practices, alongside custom notebooks to facilitate your analysis!

written 10 months ago by ekwame60

I'm not so sure how "cool" it is. NCBI just added $15,000 to the cost of my research project.

written 10 months ago by i.sudbery9.2k

Thanks! Yep, I knew about AnVIL, but (probably I'm doing it wrong) it seems to only let you use the "standard workflows" they provide, not a custom one. There is a link to Dockstore for "additional workflows", but I cannot find a way to import one into my workspace. Also, I'm interested in WGS BAMs rather than RNA-seq, and unfortunately there seems to be WGS data only for GRCh38, while I need the hg19 version (which is still available through dbGaP).

written 10 months ago by filippo.martignano20