Question

How to download GTEx V7 raw RNA-seq data that is possibly deposited in gs.US / s3.us-east-1 instead of SRA?

6.3 years ago

haersliu

Hello! I have authorization (phs000424.v7.p2) to download GTEx V7 raw data. However, I can't download any of the data deposited in "gs" or "s3", all of which was released in 2018. I have tried the download tools prefetch and sam-dump. prefetch does nothing, while sam-dump reports "error: item not found while reading file - input object(s) not found". I'm sure I have configured the dbGaP repository key, because I can download GTEx data deposited in SRA. Thank you!

Tags: RNA-Seq, GTEx

Written 6.3 years ago by haersliu; last updated 3.1 years ago by Claire Malley.

"prefetch" have nothing to do

What does that mean? What commands are you using? If I remember correctly, you have to be in the directory specified in vdb-config for restricted data in order to download it.

Reply by ATpoint, 6.3 years ago
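
A minimal sketch of the check ATpoint describes, assuming the SRA Toolkit 2.9-era default in which importing a dbGaP repository key creates a project workspace under `~/ncbi/`. The directory name `dbGaP-12345` is a made-up placeholder; read the real path from vdb-config.

```python
from pathlib import Path

# Hypothetical placeholder: replace with the dbGaP project directory that
# vdb-config reports for your repository key (this project number is made up).
WORKSPACE = Path.home() / "ncbi" / "dbGaP-12345"


def is_inside_workspace(cwd: Path, workspace: Path) -> bool:
    """Return True if cwd is the workspace itself or any directory below it."""
    # Resolve both paths so symlinks and '..' components compare consistently.
    cwd = cwd.resolve()
    workspace = workspace.resolve()
    return cwd == workspace or workspace in cwd.parents
```

For example, call `is_inside_workspace(Path.cwd(), WORKSPACE)` before launching prefetch, and `cd` into the workspace first if it returns False.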

I have successfully configured vdb-config, and I can use prefetch to download other GTEx raw data released before 2018. Example command:

.\prefetch --ascp-path "C:\Users\haers\AppData\Local\Programs\Aspera\Aspera Connect\bin\ascp.exe|C:\Users\haers\AppData\Local\Programs\Aspera\Aspera Connect\etc\asperaweb_id_dsa.openssh" SRR8220239

It just waits several seconds and then exits without downloading anything.

Reply by haersliu, 6.3 years ago
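
A sketch of the same prefetch call driven from Python, assuming prefetch is on the PATH and that `workspace` is the restricted-data directory from vdb-config. The function names are illustrative. Passing the arguments as a list hands the `ascp|key` string to prefetch unchanged, since no shell is involved (unquoted in a shell, the `|` would be read as a pipe). Capturing stdout and stderr makes a run that "just ends" after a few seconds inspectable.

```python
import subprocess
from pathlib import Path
from typing import List


def build_prefetch_cmd(accession: str, ascp_binary: str, ascp_key: str,
                       prefetch: str = "prefetch") -> List[str]:
    """Argument list for a prefetch call that uses Aspera.

    --ascp-path takes the ascp binary and its private key joined by '|',
    as in the Windows command above.
    """
    return [prefetch, "--ascp-path", f"{ascp_binary}|{ascp_key}", accession]


def run_prefetch(cmd: List[str], workspace: Path) -> subprocess.CompletedProcess:
    """Run prefetch from the dbGaP workspace and capture its output."""
    # cwd=workspace mirrors "start the command from the vdb-config directory".
    return subprocess.run(cmd, cwd=workspace, capture_output=True, text=True)
```

After `result = run_prefetch(cmd, workspace)`, checking `result.returncode` and `result.stderr` shows whether prefetch failed to resolve the run or simply had nothing to fetch.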

I ran into the same error when downloading this data. Did you find out how to solve this, or how to download data from s3 or gs? Here are the details of my error; I hope they help.

Downloading kart file 'cart_DAR78310_201904012229.krt'
Checking sizes of kart files...
2019-04-02T04:03:01 prefetch.2.9.0: KClientHttpOpen - connected from '192.168.3.3' to www.ncbi.nlm.nih.gov (130.14.29.110)
2019-04-02T04:03:02 prefetch.2.9.0: KClientHttpOpen - verifying CA cert
2019-04-02T04:03:02 prefetch.2.9.0: KClientHttpOpen - connected from '192.168.3.3' to www.ncbi.nlm.nih.gov (130.14.29.110)
2019-04-02T04:03:03 prefetch.2.9.0: KClientHttpOpen - verifying CA cert
2019-04-02T04:03:03 prefetch.2.9.0 err: path not found while resolving tree within virtual file system module - 'SRR8218455' cannot be found.
2019-04-02T04:03:03 prefetch.2.9.0: Resolve(SRR8218455) = RC(rcVFS,rcTree,rcResolving,rcPath,rcNotFound):
2019-04-02T04:03:03 prefetch.2.9.0: local(NULL)
2019-04-02T04:03:03 prefetch.2.9.0: cache(NULL)
2019-04-02T04:03:03 prefetch.2.9.0: remote(NULL:0)

Reply by walker, 6.3 years ago
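
The telling lines in walker's log are the `Resolve(...) = RC(..., rcNotFound)` entries. A small sketch for pulling those accessions out of a saved prefetch log, assuming the message format printed by prefetch 2.9.0 above (other toolkit versions may word it differently), so the runs that need another download route can be listed.

```python
import re
from typing import List

# Matches lines like "Resolve(SRR8218455) = RC(rcVFS,...,rcNotFound)"
# as printed by prefetch 2.9.0; the format is assumed from the log above.
_NOT_FOUND = re.compile(r"Resolve\((?P<acc>[A-Z]RR\d+)\) = RC\([^)]*rcNotFound\)")


def unresolved_accessions(log_text: str) -> List[str]:
    """Accessions prefetch could not resolve, in first-seen order, no duplicates."""
    found: List[str] = []
    for match in _NOT_FOUND.finditer(log_text):
        acc = match.group("acc")
        if acc not in found:
            found.append(acc)
    return found
```

Running it over the log for a whole cart file gives the subset of runs that failed to resolve, rather than checking them one by one.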

Are you in the folder specified in vdb-config for restricted data when you start the command?

Reply by ATpoint, 6.3 years ago

Hello! I'm having the same issue here, and as a result I'm losing more than half of the samples. Were you able to solve this?

Thanks!

Reply by jean.christophe.grenier, 6.1 years ago

Answer · 2019-05-28

6.1 years ago

Claire Malley

If you are looking for the GTEx raw sequencing data (CRAM files), the most recent GTEx submissions were recently moved to the cloud, on AWS US East (N. Virginia), i.e. us-east-1, according to emails I've exchanged with the SRA Toolkit developers at NCBI. You have to use the fusera and sracp tools to copy the data. Let me know if you need the documentation and instructions, since I just figured it out.

If it's the phenotype data or processed sequencing data (e.g. normalized RNA-seq tables), that is still available through the SRA Toolkit outside the cloud. SRA staff are quite responsive when it comes to troubleshooting the toolkit: sra@ncbi.nlm.nih.gov.

Answer by Claire Malley, updated 3.1 years ago

Does that mean one has to open a cloud account and pay for the service (downloading and/or processing using their CPUs and storage)?

Reply by jianxinwang24, 6.1 years ago

Hello Claire,

Some instructions would be really appreciated! We are also having issues with almost half of the GTEx samples on the usual dbGaP system. Thanks a lot!

Reply by jean.christophe.grenier, 6.0 years ago

Hi jianxinwang24 and jean.christophe.grenier, apologies for the late reply. Yes, as far as I know, you need an AWS account to access the GTEx samples that were moved to AWS. They provide a test EC2 instance for practicing mounting. In my case, I created a large EC2 instance to transfer the data I needed. I tried attaching my own S3 storage to the test instance, but the transfer did not work, and, more importantly, I am not sure whether that storage is private. I configured my EC2 instance with enough on-board storage (16 TB). Transferring data out of us-east-1 costs money, but I believe data transfer within the region is free. As I said, you must use fusera and sracp to mount and copy the data; I tried rsync and it failed.
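
To size an instance like the one described above, a back-of-the-envelope sketch. The file sizes, the 10% headroom, and the per-GB transfer rate are hypothetical inputs, not AWS figures; check current AWS data-transfer pricing before relying on any estimate.

```python
from typing import Iterable

TB = 10**12  # decimal terabyte, in bytes


def fits_on_volume(file_sizes_bytes: Iterable[int], volume_bytes: int,
                   headroom: float = 0.1) -> bool:
    """True if the files fit while leaving a `headroom` fraction of the volume free."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    return sum(file_sizes_bytes) <= volume_bytes * (1 - headroom)


def egress_cost_usd(total_bytes: int, usd_per_gb: float) -> float:
    """Cost of moving data out of the region at a flat, hypothetical per-GB rate."""
    return total_bytes / 10**9 * usd_per_gb
```

For example, `fits_on_volume([5 * TB, 6 * TB], 16 * TB)` is True, since 11 TB is below the 14.4 TB left usable after 10% headroom, while moving 2 TB out of the region at a hypothetical 0.09 USD/GB would come to about 180 USD.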

A note about the CRAM format: the original hg19 FASTA reference they used is on the GTEx data page. I am partway through converting CRAM -> BAM -> FASTA for realignment to hg38.

Reply by Claire Malley, updated 3.1 years ago
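
A sketch of the CRAM -> BAM -> reads route with samtools, assuming samtools is installed and that `hg19.fa` stands in for the hg19 FASTA from the GTEx data page (all file names here are hypothetical). The post says FASTA; this sketch writes FASTQ with `samtools fastq` so base qualities survive for realignment (`samtools fasta` is the FASTA equivalent). samtools can also read CRAM directly, so the intermediate BAM is optional.

```python
import subprocess
from typing import List


def cram_to_fastq_cmds(cram: str, reference: str, prefix: str) -> List[List[str]]:
    """samtools commands for CRAM -> BAM -> paired FASTQ (file names are illustrative)."""
    bam = f"{prefix}.bam"
    namesorted = f"{prefix}.namesorted.bam"
    return [
        # Decode the CRAM against the reference it was compressed with (hg19).
        ["samtools", "view", "-b", "-T", reference, "-o", bam, cram],
        # Sort by read name so both mates of each pair are adjacent.
        ["samtools", "sort", "-n", "-o", namesorted, bam],
        # Split mates into two FASTQ files for realignment to hg38.
        ["samtools", "fastq", "-1", f"{prefix}_R1.fastq",
         "-2", f"{prefix}_R2.fastq", namesorted],
    ]


def run_all(cmds: List[List[str]]) -> None:
    """Run each step in order, stopping at the first non-zero exit status."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Name-sorting first matters because `samtools fastq` writes `-1`/`-2` in input order, so unsorted input can leave the two files out of step. Reads flagged as both or neither mate go to stdout unless `-0` is given.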