How do I use md5sum to check whether my .fastq files from SRA Explorer downloaded correctly?
10 weeks ago
biotrekker ▴ 100

I am having issues downloading single-cell RNA-seq FASTQ files from SRA Explorer. Some files are corrupted, and some are only half the expected size. I have 200+ files, so it is hard to tell which ones need to be re-downloaded. How can I use md5sum to check that the files I downloaded are correct? I am using a virtual machine to download this large amount of data.

Thank you

scRNA-seq md5 • 517 views

You may need to use sratoolkit's prefetch and then check the result with vdb-validate before dumping the data out. Ideally you would get the original BAM files (if the submitters provided them) and then run the 10x utility (bamtofastq) locally.
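
A minimal sketch of that prefetch-then-validate workflow, assuming sra-tools (prefetch, vdb-validate, fasterq-dump) is installed and on PATH. The function name fetch_and_validate and the file accessions.txt are made up for illustration, and the fasterq-dump options will probably need adjusting for 10x data:

```shell
#!/usr/bin/env bash
# Sketch: download an SRA run, validate it, and only then convert to FASTQ.
# Assumption: prefetch, vdb-validate and fasterq-dump are on PATH and
# vdb-validate exits non-zero when it finds a problem.

fetch_and_validate() {
    local acc="$1"
    # Download the run into the current directory / local SRA cache.
    prefetch "$acc" || { echo "prefetch failed: $acc" >&2; return 1; }
    # Check the downloaded run for corruption before dumping it.
    vdb-validate "$acc" || { echo "validation failed: $acc" >&2; return 1; }
    # Convert to FASTQ only once the run has passed validation.
    fasterq-dump --split-files "$acc"
}

# Usage (accessions.txt is a hypothetical file, one run accession per line):
# while read -r acc; do
#     fetch_and_validate "$acc" || echo "$acc" >> failed.txt
# done < accessions.txt
```

With 200+ runs, anything that ends up in failed.txt is the list to retry, so there is no need to eyeball file sizes.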

Single cell data is all over the place in SRA and unfortunately sra-explorer does not help with that.

10 weeks ago
ATpoint 81k

As far as the download goes, I don't think that sra-explorer has any option for md5sums. Here are two ad-hoc options:

1) Use touch: wget (options...) <file> && touch file.done

That will only create an empty file called file.done (or any name you give it in your script/loop) if the download finished and wget did not throw an error. I think that if the connection breaks, the file should not be created. So the presence of the file indicates a proper download.
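
For many files, the same idea can be wrapped in a loop so the marker is written per file. This is only a sketch: the function names and urls.txt (one download URL per line) are hypothetical, and the marker only reflects wget's exit status, not the file content:

```shell
#!/usr/bin/env bash
# Sketch: download each URL and create "<file>.done" only if wget exits 0.

download_with_marker() {
    local url="$1"
    local file
    file=$(basename "$url")
    # Skip files that already have a marker from an earlier run.
    [ -e "$file.done" ] && return 0
    # -c resumes a partial download; the marker is written only on success.
    wget -c -q "$url" && touch "$file.done"
}

# Print the URLs from a list whose files have no ".done" marker yet.
list_unfinished() {
    local url
    while read -r url; do
        [ -e "$(basename "$url").done" ] || echo "$url"
    done < "$1"
}

# Usage:
# while read -r url; do download_with_marker "$url"; done < urls.txt
# list_unfinished urls.txt   # URLs that still need (re-)downloading
```

Re-running the loop retries only the files without a marker, which helps when a connection drops partway through a large batch.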

2) Use Aspera, which sra-explorer provides download links for.

The tool is clever enough to download data to a tmp/hidden file and only create the final visible file once the download has completed successfully, so the presence of the final file means the download was OK. See "Setting up Aspera Connect (ascp) on Linux and macOS".
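
If reference checksums are available from somewhere else, md5sum itself can do the check the question asks about. The sketch below assumes you have a tab-separated report (called report.tsv here, a hypothetical name) with a header line and the columns run_accession, fastq_ftp and fastq_md5, with multiple files per run separated by ";", similar to what an ENA file report can provide. It also assumes the local file names match the names at the end of each URL, which may not hold if the download commands renamed the files:

```shell
#!/usr/bin/env bash
# Sketch: turn a report with URLs and md5 sums into an md5sum checklist.
# Assumed columns (tab-separated, with header): run_accession, fastq_ftp,
# fastq_md5; several files per run are separated by ";".

make_checklist() {
    awk -F'\t' 'NR > 1 {
        n = split($2, urls, ";")
        split($3, sums, ";")
        for (i = 1; i <= n; i++) {
            m = split(urls[i], parts, "/")
            # md5sum -c expects: <md5><two spaces><file name>
            printf "%s  %s\n", sums[i], parts[m]
        }
    }' "$1"
}

# Usage, run in the directory that holds the downloaded files:
# make_checklist report.tsv > checklist.md5
# md5sum -c checklist.md5 2>/dev/null | grep -v ': OK$'
```

The last command prints only the files that are missing or whose checksum does not match, which are the ones to download again.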


Say I have 300 samples that I am downloading from sra-explorer. Do I add "&& touch file.done" at the end of each of the 300 curl commands?


Additionally, I am having trouble setting up ascp on the virtual machine. Is there a different way to set up Aspera Connect there?

Thanks


No, it would be created per file, but I really encourage you to use Aspera. It is much easier and safer.
