Question: How to download GTEx V7 raw RNA-seq data that appears to be deposited in gs.US / s3.us-east-1 instead of SRA?
7 months ago by
haersliu10 wrote:

Hello! I have authorization (phs000424.v7.p2) to download the GTEx V7 raw data. However, I cannot download any of the data deposited in "gs s3", which were released in 2018. I have tried the SRA Toolkit download tools "prefetch" and sam-dump. "prefetch" have nothing to do, while sam-dump reports the error "error: item not found while reading file - input object(s) not found". I'm sure I have configured the dbGaP repository key, and I can download the GTEx data deposited in SRA. Thank you!

gtex rna-seq • 901 views
modified 5 months ago by Claire Malley40 • written 7 months ago by haersliu10
"prefetch" have nothing to do

What does that mean? What commands are you using? If I remember correctly, you have to be in the directory specified in vdb-config for the restricted data in order to download them.

written 7 months ago by ATpoint25k

I have successfully configured vdb-config and can download other GTEx raw data released before 2018 with prefetch. An example command:

.\prefetch --ascp-path "C:\Users\haers\AppData\Local\Programs\Aspera\Aspera Connect\bin\ascp.exe|C:\Users\haers\AppData\Local\Programs\Aspera\Aspera Connect\etc\asperaweb_id_dsa.openssh" SRR8220239

It just waits several seconds and then exits.
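For anyone who wants to rule out the working-directory issue raised earlier in the thread, here is a minimal Python sketch that builds the same prefetch command and runs it with the dbGaP workspace as the current directory. The function names and the workspace path are hypothetical; the only prefetch option used is the --ascp-path "exe|key" form shown in the command above.

```python
import subprocess
from pathlib import Path


def build_prefetch_cmd(accession, prefetch="prefetch", ascp_exe=None, ascp_key=None):
    """Return the prefetch argument list for one accession."""
    cmd = [prefetch]
    if ascp_exe and ascp_key:
        # prefetch takes the ascp binary and its key file as one "exe|key" value.
        cmd += ["--ascp-path", f"{ascp_exe}|{ascp_key}"]
    cmd.append(accession)
    return cmd


def run_in_workspace(cmd, workspace):
    """Run cmd with the dbGaP workspace as the working directory.

    `workspace` is whatever directory vdb-config lists for the project
    (an assumption here; check your own configuration).
    """
    workspace = Path(workspace)
    if not workspace.is_dir():
        raise FileNotFoundError(f"workspace directory not found: {workspace}")
    # check=False: a non-zero exit code is returned to the caller, not raised.
    return subprocess.run(cmd, cwd=workspace, capture_output=True, text=True, check=False)
```

Calling run_in_workspace(build_prefetch_cmd("SRR8220239"), "path/to/dbGaP-workspace") returns a CompletedProcess whose returncode, stdout and stderr show whether prefetch exited silently or reported an error.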

modified 7 months ago • written 7 months ago by haersliu10

I ran into the same error when downloading this data. Did you find out how to solve this, or how to download data from s3 or gs? Here are the details of my error; I hope this helps.

Downloading kart file 'cart_DAR78310_201904012229.krt'
Checking sizes of kart files...

2019-04-02T04:03:01 prefetch.2.9.0: KClientHttpOpen - connected from '192.168.3.3' to www.ncbi.nlm.nih.gov (130.14.29.110)
2019-04-02T04:03:02 prefetch.2.9.0: KClientHttpOpen - verifying CA cert
2019-04-02T04:03:02 prefetch.2.9.0: KClientHttpOpen - connected from '192.168.3.3' to www.ncbi.nlm.nih.gov (130.14.29.110)
2019-04-02T04:03:03 prefetch.2.9.0: KClientHttpOpen - verifying CA cert
2019-04-02T04:03:03 prefetch.2.9.0 err: path not found while resolving tree within virtual file system module - 'SRR8218455' cannot be found.
2019-04-02T04:03:03 prefetch.2.9.0: Resolve(SRR8218455) = RC(rcVFS,rcTree,rcResolving,rcPath,rcNotFound):
2019-04-02T04:03:03 prefetch.2.9.0: local(NULL)
2019-04-02T04:03:03 prefetch.2.9.0: cache(NULL)
2019-04-02T04:03:03 prefetch.2.9.0: remote(NULL:0)
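When a whole cart fails like this, it can help to list which accessions could not be resolved, so they can be separated from the ones that still download normally. The sketch below is only an illustration: the regular expression is written against the "err:" line in the log above, other prefetch versions may word the message differently, and the function name is made up.

```python
import re

# Pattern based on the "err:" line in the log above (prefetch 2.9.0).
# This wording is an assumption for any other toolkit version.
NOT_FOUND = re.compile(r"prefetch\.[\d.]+ err: .*? - '([A-Z]RR\d+)' cannot be found")


def unresolved_accessions(log_text):
    """Return accessions reported as not found, in first-seen order, without duplicates."""
    seen = []
    for accession in NOT_FOUND.findall(log_text):
        if accession not in seen:
            seen.append(accession)
    return seen
```

Feeding it the captured output of a prefetch run on a cart gives a list of run accessions to ask SRA staff about, or to look for in the cloud copy instead.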

written 7 months ago by walker20

Are you in the folder specified in vdb-config for restricted data when starting the command?

written 7 months ago by ATpoint25k

Hello! I'm having the same issue here and am thus losing more than half of the samples. Were you able to solve this issue?

Thanks!

written 5 months ago by jean.christophe.grenier0
5 months ago by
National Institutes of Health, Rockville, MD
Claire Malley40 wrote:

If you are looking for the GTEx raw sequencing data (CRAM files), the most recent GTEx submissions were recently moved to the cloud, on AWS US East (N. Virginia) (us-east-1), according to emails I've had with the SRA Toolkit devs at NCBI. You have to use the fusera and sracp tools to copy the data. Let me know if you need the documentation and instructions, since I just figured it out.

If it's the phenotype data or processed sequencing data (i.e. normalized RNASeq tables), that is still available through the sratoolkit outside of the cloud. SRA staff are pretty responsive for troubleshooting the toolkit: sra@ncbi.nlm.nih.gov.

EDIT 9/27/19: I have gotten many emails asking for detailed steps, so I wrote a post here explaining all that I know. https://cemalley.com/2019/09/27/accessing-gtex-v7-v8-raw-rna-seq-data-in-the-cloud/

modified 6 weeks ago • written 5 months ago by Claire Malley40

Does that mean one has to open a cloud account and pay for the service (downloading and/or processing using their CPUs and storage)?

written 4 months ago by jianxinwang240

Hello Claire,

Some instructions would be really appreciated! We are also having issues with almost half of the GTEx samples on the usual dbGaP system. Thanks a lot!

written 4 months ago by jean.christophe.grenier0

Hi jianxinwang24 and jean.christophe.grenier, apologies for the late reply. Yes, as far as I know, you need to have an AWS account to access the GTEx samples they moved to AWS. They provide a test EC2 instance to practice mounting. In my case I created a large EC2 instance to transfer the data I needed. I tried attaching a personal S3 bucket to the test instance, but the transfer did not work, and more importantly I am not sure whether it is private. I configured my EC2 instance with enough on-board storage (16 TB). Transfer out of us-east-1 will cost money, but I believe within-region data transfer is free. As I said, you must use fusera and sracp to mount and copy the data. I tried rsync and it failed.

A note about the CRAM format: The original hg19 FASTA file they used is on the GTEx data page. I am part-way through CRAM -> BAM -> FASTQ for realignment to hg38.
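For readers attempting the same conversion, here is a rough Python sketch of one way to do it with samtools: decode the CRAM to BAM against the original reference, then group reads by name and extract them for realignment. It assumes samtools is installed, that the reads are paired-end, and that FASTQ is the intended read format. All file names and function names are hypothetical placeholders, and this is not necessarily the exact procedure used above.

```python
import subprocess


def cram_to_bam_cmd(cram, reference_fasta, bam, threads=4):
    """samtools view: decode a CRAM to BAM using the reference it was compressed against."""
    return ["samtools", "view", "-b", "-@", str(threads),
            "-T", reference_fasta, "-o", bam, cram]


def bam_to_fastq_cmds(bam, fq1, fq2, threads=4):
    """Return (collate, fastq) commands meant to be piped together.

    collate groups mates by read name so `samtools fastq` can pair them;
    unpaired and singleton reads are discarded here (an assumption).
    """
    collate = ["samtools", "collate", "-u", "-O", "-@", str(threads), bam]
    fastq = ["samtools", "fastq", "-@", str(threads), "-1", fq1, "-2", fq2,
             "-0", "/dev/null", "-s", "/dev/null", "-n", "-"]
    return collate, fastq


def run_bam_to_fastq(bam, fq1, fq2, threads=4):
    """Run `samtools collate | samtools fastq` and fail loudly on errors."""
    collate, fastq = bam_to_fastq_cmds(bam, fq1, fq2, threads)
    p1 = subprocess.Popen(collate, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(fastq, stdin=p1.stdout)
    p1.stdout.close()  # let collate see a closed pipe if fastq exits early
    rc2 = p2.wait()
    rc1 = p1.wait()
    if rc1 or rc2:
        raise RuntimeError(f"samtools failed (collate={rc1}, fastq={rc2})")
```

A typical use would be subprocess.run(cram_to_bam_cmd("sample.cram", "hg19.fa", "sample.bam"), check=True) followed by run_bam_to_fastq("sample.bam", "sample_R1.fastq.gz", "sample_R2.fastq.gz"). The resulting files can then go to whatever aligner is used for hg38.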

Feel free to email me at malleyce at nih dot gov for more help. The whole situation is unfortunate with very poor communication from dbGaP. But SRA staff responded.

written 4 months ago by Claire Malley40