Question: dbGaP / GTEx analysis on the cloud. Easiest way?
22 days ago by
filippo.martignano10 wrote:


I would like to analyze dbGaP data (in particular GTEx data). Since I need to analyse WGS BAMs (600 samples), the total size would be overwhelming, so I want to do the analysis on the cloud.

Can anyone suggest a platform that lets me both use my own tools/pipelines and access the files without downloading them?

I'm basically looking for something like the Cancer Genomics Cloud (CGC, by Seven Bridges).

I've already looked at a couple of platforms, from Google Cloud and Amazon AWS to DNAnexus, but I don't have experience with them, and I cannot work out whether it is possible to analyse the files directly or whether you need to download them first.

Thank you very much in advance!

google gtex amazon cloud db-gap • 104 views
modified 22 days ago by i.sudbery6.3k • written 22 days ago by filippo.martignano10

in particular GTEx data

Be ready to pay for access to that data if you are not going to use it on Google Cloud. Refer to this page for access instructions.

modified 22 days ago • written 22 days ago by genomax75k
22 days ago by
Sheffield, UK
i.sudbery6.3k wrote:

The NCBI would like you to use the AnVIL service, which is hosted on the Google Cloud (US region).

The data itself is stored in a US-region Google Cloud bucket. Using it within the US region of Google Cloud is free, but data egress outside Google or to another Google region incurs a charge: about 8 cents per GB, if I remember correctly.
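To get a feel for what that egress charge means in practice, here is a minimal sketch of the arithmetic. The 0.08 USD/GB default is just the figure recalled above, not a quoted price, and the 100 GB per BAM in the example is a purely hypothetical size chosen for illustration; check the current pricing and your actual file sizes before budgeting.

```python
def egress_cost_usd(total_gb: float, rate_usd_per_gb: float = 0.08) -> float:
    """Estimate the egress charge for moving `total_gb` out of the region.

    The default rate is the roughly-remembered figure from this post,
    not an official price.
    """
    if total_gb < 0 or rate_usd_per_gb < 0:
        raise ValueError("size and rate must be non-negative")
    return total_gb * rate_usd_per_gb


def total_size_gb(n_samples: int, gb_per_sample: float) -> float:
    """Total data volume for `n_samples` files of a given average size."""
    if n_samples < 0 or gb_per_sample < 0:
        raise ValueError("sample count and size must be non-negative")
    return n_samples * gb_per_sample


if __name__ == "__main__":
    # 600 samples comes from the question; 100 GB per BAM is a
    # hypothetical placeholder, not a measured GTEx file size.
    size = total_size_gb(600, 100.0)
    print(f"{size:,.0f} GB -> ${egress_cost_usd(size):,.2f} egress")
```

Running the analysis inside the same region avoids this charge entirely, which is the main argument for bringing the pipeline to the data rather than the other way round.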

written 22 days ago by i.sudbery6.3k

Another cool thing about working with Google Cloud and AnVIL/Terra is that there may be existing workflows that conform to GATK best practices, alongside custom notebooks, to facilitate your analysis!

written 21 days ago by ekwame00110

I'm not so sure how "cool" it is. NCBI just added $15,000 to the cost of my research project.

written 21 days ago by i.sudbery6.3k

Thanks! Yep, I knew about AnVIL, but (I'm probably doing it wrong) it seems to let you use only the "standard workflows" they provide, not a custom one. There is a link to Dockstore for "additional workflows", but I cannot find a way to import one into my workspace. Also, I'm interested in WGS BAMs rather than RNA-seq, and unfortunately there seems to be WGS data only for GRCh38, while I need the hg19 version (which is still available through dbGaP).

written 21 days ago by filippo.martignano10