dbGaP / GTEx analysis on the cloud. Easiest way?
4.4 years ago

Hi!

I would like to analyze dbGaP data (in particular GTEx data). Since I need to analyze WGS BAMs (600 samples), the total size would be overwhelming, so I want to do the analysis on the cloud.

Can anyone suggest a platform that lets me both use my own tools/pipeline and access the files without downloading them?

I'm basically looking for something like the Cancer Genomics Cloud (CGC, by Seven Bridges).

I've already looked at a couple of platforms, from Google Cloud and Amazon AWS to DNAnexus, but I don't have experience with them, and I cannot tell whether it is possible to analyze the files directly or whether you need to download them first.

Thank you very much in advance!

db-gap GTEx cloud amazon google • 2.0k views

"in particular GTEx data"

Be prepared to pay for access to that data if you are not going to use it on Google Cloud. Refer to this page for access instructions.

4.4 years ago

The NCBI would like you to use the AnVIL service, which is hosted on the Google Cloud (US region).

The data itself is stored in a US-region Google Cloud bucket. Using it from within US-region Google Cloud is free, but data egress out of Google Cloud or to another Google region incurs a charge: about 8 cents per GB, as far as I remember.
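To get a feel for what that egress charge could mean for a project, here is a rough back-of-the-envelope sketch in Python. It takes the figure of about 8 cents per GB recalled above as the default rate; the actual rate should be checked against current Google Cloud pricing. The per-BAM size in the usage section is purely a hypothetical placeholder, not a GTEx figure, and the function names are made up for illustration.

```python
def egress_cost_usd(total_gb: float, usd_per_gb: float = 0.08) -> float:
    """Estimate the charge for moving total_gb out of the bucket's region.

    usd_per_gb defaults to the roughly 8 cents/GB recalled in this thread;
    it is an assumption, not an official price.
    """
    if total_gb < 0 or usd_per_gb < 0:
        raise ValueError("total_gb and usd_per_gb must be non-negative")
    return total_gb * usd_per_gb


def gb_covered_by_budget(budget_usd: float, usd_per_gb: float = 0.08) -> float:
    """Inverse estimate: how many GB of egress a given budget would cover."""
    if budget_usd < 0 or usd_per_gb <= 0:
        raise ValueError("budget_usd must be >= 0 and usd_per_gb must be > 0")
    return budget_usd / usd_per_gb


if __name__ == "__main__":
    n_samples = 600              # number of WGS samples from the question
    assumed_gb_per_bam = 100.0   # HYPOTHETICAL size per BAM; replace with real sizes
    total_gb = n_samples * assumed_gb_per_bam
    print(f"Estimated egress for {total_gb:.0f} GB: ${egress_cost_usd(total_gb):,.2f}")
```

Running the analysis inside US-region Google Cloud avoids this charge altogether, which is the main argument for bringing the pipeline to the data rather than downloading the BAMs.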


Another cool thing about working with Google Cloud and AnVIL/Terra is that there may be existing workflows that conform to GATK best practices, alongside custom notebooks, to facilitate your analysis!

https://anvil.terra.bio/#library/showcase


I'm not so sure how "cool" it is. NCBI just added $15,000 to the cost of my research project.


Thanks! Yep, I knew about AnVIL, but (probably I'm doing it wrong) it seems to allow you to use only the "standard workflows" they provide, not a custom one. There is a link to Dockstore for "additional workflow", but I cannot find a way to import one into my workspace. Also, I'm interested in WGS BAMs rather than RNA-seq, and unfortunately there seem to be WGS data only for GRCh38, while I need the hg19 version (which is still available through dbGaP).
