How to download GTEx V7 raw RNA-seq data that is possibly deposited in gs.US or s3.us-east-1 instead of SRA?
2.1 years ago
haersliu

RNA-Seq GTEx

"prefetch" does nothing: it exits without downloading the data.


What does that mean? What commands are you using? If I remember correctly, you have to be in the directory specified in vdb-config for restricted data in order to download it.

I have successfully configured vdb-config and can download other GTEx raw data released before 2018 with prefetch. Example command:

.\prefetch --ascp-path "C:\Users\haers\AppData\Local\Programs\Aspera\Aspera Connect\bin\ascp.exe|C:\Users\haers\AppData\Local\Programs\Aspera\Aspera Connect\etc\asperaweb_id_dsa.openssh" SRR8220239

For this accession it just waits several seconds and then exits.
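For scripting the same call, here is a minimal Python sketch that builds the prefetch argument list using only the --ascp-path option shown above (the Aspera binary and key joined with "|"). It assumes prefetch is on PATH or passed as a full path; the helper names build_prefetch_cmd and run_prefetch are hypothetical. Passing a list instead of a shell string avoids quoting problems with the spaces in "Aspera Connect", and printing stderr makes a silent early exit easier to diagnose.

```python
import subprocess
from typing import List


def build_prefetch_cmd(ascp_exe: str, ascp_key: str, accession: str,
                       prefetch: str = "prefetch") -> List[str]:
    """Return the argv for prefetch (hypothetical helper).

    --ascp-path takes the ascp binary and the private key joined by '|',
    exactly as in the command above.
    """
    return [prefetch, "--ascp-path", f"{ascp_exe}|{ascp_key}", accession]


def run_prefetch(cmd: List[str]) -> int:
    # A list argv needs no shell quoting, even with spaces in the paths.
    result = subprocess.run(cmd, capture_output=True, text=True)
    # Show prefetch's diagnostic output so an early exit is not silent.
    print(result.stderr)
    return result.returncode


if __name__ == "__main__":
    cmd = build_prefetch_cmd(
        r"C:\Users\haers\AppData\Local\Programs\Aspera\Aspera Connect\bin\ascp.exe",
        r"C:\Users\haers\AppData\Local\Programs\Aspera\Aspera Connect\etc\asperaweb_id_dsa.openssh",
        "SRR8220239",
    )
    print(cmd)
    # run_prefetch(cmd)  # uncomment on a machine with the SRA Toolkit installed
```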

I ran into the same error when downloading this data. Did you find out how to solve this, or how to download the data from s3 or gs? Here are the details of my error; I hope this helps:

2019-04-02T04:03:01 prefetch.2.9.0: KClientHttpOpen - connected from '192.168.3.3' to www.ncbi.nlm.nih.gov (130.14.29.110)
2019-04-02T04:03:02 prefetch.2.9.0: KClientHttpOpen - verifying CA cert
2019-04-02T04:03:02 prefetch.2.9.0: KClientHttpOpen - connected from '192.168.3.3' to www.ncbi.nlm.nih.gov (130.14.29.110)
2019-04-02T04:03:03 prefetch.2.9.0: KClientHttpOpen - verifying CA cert
2019-04-02T04:03:03 prefetch.2.9.0 err: path not found while resolving tree within virtual file system module - 'SRR8218455' cannot be found.
2019-04-02T04:03:03 prefetch.2.9.0: Resolve(SRR8218455) = RC(rcVFS,rcTree,rcResolving,rcPath,rcNotFound):
2019-04-02T04:03:03 prefetch.2.9.0: local(NULL)
2019-04-02T04:03:03 prefetch.2.9.0: cache(NULL)
2019-04-02T04:03:03 prefetch.2.9.0: remote(NULL:0)
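When many accessions fail like this, it can help to pull the key facts out of the log automatically. The following is a small Python sketch, assuming the log format shown above; the function name summarize_prefetch_log is hypothetical. It extracts the accessions reported as "cannot be found", the components of the RC(...) code (here ending in rcNotFound), and the local/cache/remote resolution results. In this log all three are NULL, which suggests that prefetch could not resolve any download location for the run, consistent with the data no longer being served through the usual SRA path.

```python
import re
from typing import Dict, List, Optional, Tuple

# "'SRR8218455' cannot be found"
NOT_FOUND = re.compile(r"'(\w+)' cannot be found")
# "RC(rcVFS,rcTree,rcResolving,rcPath,rcNotFound)"
RC_CODE = re.compile(r"RC\(([^)]*)\)")
# "local(NULL)", "cache(NULL)", "remote(NULL:0)"
LOCATION = re.compile(r"\b(local|cache|remote)\(([^)]*)\)")


def summarize_prefetch_log(
    log: str,
) -> Tuple[List[str], Optional[List[str]], Dict[str, str]]:
    """Return (missing accessions, first RC code components, resolved locations)."""
    missing = NOT_FOUND.findall(log)
    rc = RC_CODE.search(log)
    rc_parts = rc.group(1).split(",") if rc else None
    # findall yields (name, value) pairs, which dict() turns into a mapping.
    locations = dict(LOCATION.findall(log))
    return missing, rc_parts, locations


if __name__ == "__main__":
    sample = (
        "2019-04-02T04:03:03 prefetch.2.9.0 err: path not found while resolving "
        "tree within virtual file system module - 'SRR8218455' cannot be found.\n"
        "2019-04-02T04:03:03 prefetch.2.9.0: Resolve(SRR8218455) = "
        "RC(rcVFS,rcTree,rcResolving,rcPath,rcNotFound):\n"
        "2019-04-02T04:03:03 prefetch.2.9.0: local(NULL)\n"
        "2019-04-02T04:03:03 prefetch.2.9.0: cache(NULL)\n"
        "2019-04-02T04:03:03 prefetch.2.9.0: remote(NULL:0)\n"
    )
    print(summarize_prefetch_log(sample))
```

Because the patterns are not line-anchored, the same function also works on a log pasted as a single run-on line.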

Are you in the folder specified in vdb-config for restricted data when starting the command?
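A quick way to rule this out in a script is to check the working directory before calling prefetch, or to launch prefetch with the workspace as its working directory. The sketch below is a minimal Python version under the assumption (from the comment above, not verified here) that restricted downloads must start inside the vdb-config workspace. Accepting subdirectories of the workspace is an additional assumption, and the helper names and any workspace path you pass in are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def inside_workspace(cwd: str, workspace: str) -> bool:
    """True if cwd is the workspace directory or one of its subdirectories."""
    cwd_p = Path(cwd).resolve()
    ws_p = Path(workspace).resolve()
    try:
        # relative_to raises ValueError when cwd_p is not under ws_p.
        cwd_p.relative_to(ws_p)
    except ValueError:
        return False
    return True


def prefetch_in_workspace(accessions: List[str], workspace: str) -> int:
    # cwd= starts prefetch inside the workspace without changing our own cwd.
    return subprocess.run(["prefetch", *accessions], cwd=workspace).returncode
```

Usage: call inside_workspace(os.getcwd(), "/path/to/your/workspace") before a manual run, or use prefetch_in_workspace(["SRR8218455"], "/path/to/your/workspace") so the directory is always right.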

Hello! I'm having the same issue and am therefore losing more than half of the samples. Were you able to solve it?

Thanks!

23 months ago

If you are looking for the GTEx raw sequencing data (CRAM files), the most recent GTEx submissions were moved to the cloud, to AWS US East (N. Virginia), region us-east-1, according to emails I've exchanged with the SRA Toolkit developers at NCBI. They require you to use the fusera and sracp tools to copy the data. Let me know if you need the documentation and instructions, since I just figured it out.

If it's the phenotype data or processed sequencing data (e.g. normalized RNA-seq tables), that is still available through the SRA Toolkit outside of the cloud. SRA staff are pretty responsive when it comes to troubleshooting the toolkit: sra@ncbi.nlm.nih.gov.

EDIT 9/27/19: I have received many emails asking for detailed steps, so I wrote a post explaining everything I know: https://cemalley.com/2019/09/27/accessing-gtex-v7-v8-raw-rna-seq-data-in-the-cloud/

Does that mean one has to open a cloud account and pay for the service (downloading and/or processing using their CPUs and storage)?

Hello Claire,

Some instructions would be really appreciated! We are also having issues with almost half of the GTEx samples on the standard dbGaP system. Thanks a lot!

Hi jianxinwang24 and jean.christophe.grenier, apologies for the late reply. Yes, as far as I know, you need an AWS account to access the GTEx samples that were moved to AWS. They provide a test EC2 instance to practice mounting. In my case I launched a large EC2 instance to transfer the data I needed. I tried attaching personal S3 storage to the test instance, but the transfer did not work and, more importantly, I am not sure whether that storage would be private. I configured my EC2 instance with enough on-board storage (16 TB). Transfer out of us-east-1 will cost money, but I believe within-region data transfer is free. As I said, you must use fusera and sracp to mount and copy the data; I tried rsync and it failed.
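The two sizing questions above (do the files fit on the instance's volume, and what does moving them out of the region cost) are simple arithmetic, sketched here in Python. The per-GB rate is a parameter you must fill in from current AWS pricing, since no price is assumed; the decimal-terabyte convention is an assumption; and treating same-region transfer as free follows the belief stated above rather than a verified billing rule. The function names are hypothetical.

```python
from typing import Iterable

# Assumption: volume sizes and rates use decimal units (1 TB = 10**12 bytes).
GB = 1000 ** 3
TB = 1000 ** 4


def fits_on_volume(file_sizes_bytes: Iterable[int], volume_tb: float) -> bool:
    """True if the files together fit on a volume of volume_tb terabytes."""
    return sum(file_sizes_bytes) <= volume_tb * TB


def egress_cost(total_bytes: int, usd_per_gb: float, same_region: bool) -> float:
    """Estimated transfer cost; usd_per_gb must come from current AWS pricing."""
    if same_region:
        # Assumed free, per the comment above (staying inside us-east-1).
        return 0.0
    return total_bytes / GB * usd_per_gb
```

For example, fits_on_volume(sizes, 16) checks a list of CRAM sizes against the 16 TB mentioned above before starting a long copy.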

A note about the CRAM format: the original hg19 FASTA reference they used is on the GTEx data page, and you need it because CRAM stores reads relative to the reference sequence. I am part-way through converting CRAM -> BAM -> FASTA for realignment to hg38.
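The conversion described above could be scripted with samtools roughly as follows. This is a Python sketch that only assembles samtools command lines: samtools view -b -T writes BAM while supplying the reference needed to decode the CRAM, samtools collate groups mates together (writing <prefix>.bam), and samtools fastq -1/-2 writes paired reads. Emitting FASTQ rather than FASTA is my substitution, because most aligners use base qualities; samtools also has a fasta subcommand with the same -1/-2 options if FASTA is what you want. All file names, including the reference name, are hypothetical placeholders.

```python
import subprocess
from typing import List


def cram_to_bam_cmd(cram: str, reference_fa: str, bam: str) -> List[str]:
    # -b: BAM output; -T: reference FASTA used to decode the CRAM.
    return ["samtools", "view", "-b", "-T", reference_fa, "-o", bam, cram]


def collate_cmd(bam: str, out_prefix: str) -> List[str]:
    # Groups read pairs together; output is written to <out_prefix>.bam.
    return ["samtools", "collate", bam, out_prefix]


def bam_to_fastq_cmd(collated_bam: str, r1: str, r2: str) -> List[str]:
    # -1/-2: files for read 1 and read 2 of each pair.
    return ["samtools", "fastq", "-1", r1, "-2", r2, collated_bam]


def run_all(cmds: List[List[str]]) -> None:
    # check=True stops the pipeline at the first failing step.
    for cmd in cmds:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    steps = [
        cram_to_bam_cmd("sample.cram", "hg19_reference.fa", "sample.bam"),
        collate_cmd("sample.bam", "sample.collated"),
        bam_to_fastq_cmd("sample.collated.bam", "sample_R1.fastq", "sample_R2.fastq"),
    ]
    for step in steps:
        print(" ".join(step))
    # run_all(steps)  # uncomment on a machine with samtools and the real files
```

The collate step is there because samtools fastq pairs mates by adjacency, and GTEx CRAMs are presumably coordinate-sorted.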

Feel free to email me at malleyce at nih dot gov for more help. The whole situation is unfortunate, with very poor communication from dbGaP, but SRA staff did respond.