Question: Submit High-Throughput data to GEO. Problems with FTP connection
8 months ago
mgdrnl10 wrote:

I am opening this question because I have not found much information on this topic so far. I am trying to upload data from an RNA-seq project to GEO (387.5 GB) from the UNIX command line, and I am getting this error:

 Lost data connection to remote host after 1xxxxxxxx bytes had been sent: Broken pipe.

The number of bytes varies each time. When I asked the IT service at my institution, they told me that the FTP protocol is very slow and that a broken connection is to be expected for such big files.

I solved the issue using the script in this post
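For anyone landing here with the same error, the retry-and-resume idea can be sketched roughly as below. This is not the script from the linked post, just a minimal Python illustration using the standard ftplib module. The function names (with_retries, upload_resumable) and all connection details are made up for the example, and it assumes the FTP server supports SIZE and REST so that an interrupted upload can continue from where it stopped. I have not verified that against the GEO server.

```python
import ftplib
import os
import time


def with_retries(action, attempts=5, delay=30.0, sleep=time.sleep):
    """Call action() until it succeeds, pausing `delay` seconds between tries."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except ftplib.all_errors:  # includes OSError, so "Broken pipe" too
            if attempt == attempts:
                raise  # give up after the last allowed attempt
            sleep(delay)


def upload_resumable(host, user, password, local_path, remote_name):
    """Upload one file, continuing from whatever the server already holds."""
    with ftplib.FTP(host, user, password, timeout=60) as ftp:
        ftp.voidcmd("TYPE I")  # binary mode; many servers need it for SIZE
        try:
            offset = ftp.size(remote_name) or 0
        except ftplib.error_perm:
            offset = 0  # file is not on the server yet
        if offset >= os.path.getsize(local_path):
            return  # nothing left to send
        with open(local_path, "rb") as fh:
            fh.seek(offset)
            # rest=offset sends REST before STOR (assumes server support)
            ftp.storbinary(f"STOR {remote_name}", fh, blocksize=1 << 20,
                           rest=offset or None)


# Hypothetical usage (host, credentials and paths are placeholders):
# with_retries(lambda: upload_resumable(
#     "ftp.example.org", "user", "secret",
#     "sample1.fastq.gz", "uploads/sample1.fastq.gz"))
```

Each retry opens a fresh connection and asks the server how many bytes it already has, so a dropped connection costs only the time to reconnect rather than a full restart of the file.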

However, it would be helpful if anybody could share other solutions to this problem, ideally ones that also improve the speed, as it is taking a long time to submit all the files.

Thanks a lot!

That does not sound like a bioinformatics question to me!

Submit your data to ArrayExpress; it has a better interface for metadata management and file uploading. The direct FTP connection is also fast, accession IDs are provided within a couple of hours, and within a week they will provide the reviewer account details.

Thanks! I will definitely try ArrayExpress next time.

8 months ago
Bergen, Norway
Michael Dondrup wrote:

Are you sure you want to submit RNA-seq data (raw data?) to GEO? You should submit to SRA instead. NCBI supports upload via Aspera Connect (ascp, with a command-line interface very similar to scp), which is faster and more robust against interrupted network connections. See : and
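To give an idea of what an ascp upload looks like, here is a small Python sketch that only assembles the command line. The helper name build_ascp_command is made up, and the key file, source directory and destination are placeholders; the real values come with your submission instructions. The flags are the ones commonly used with ascp (-i, -Q, -T, -l, -k, -d), but check them against the documentation for your installed version.

```python
import shlex


def build_ascp_command(key_file, source_dir, destination, rate_limit="100m"):
    """Assemble an ascp command line as a list of arguments."""
    return [
        "ascp",
        "-i", key_file,    # private key used for authentication
        "-Q",              # fair (adaptive) transfer policy
        "-T",              # disable encryption for maximum throughput
        "-l", rate_limit,  # upper limit on the transfer rate
        "-k", "1",         # resume partially transferred files
        "-d",              # create the destination directory if missing
        source_dir,
        destination,
    ]


# All three values below are placeholders, not real submission details.
cmd = build_ascp_command(
    "/path/to/key_file",
    "/path/to/fastq_dir",
    "user@upload.example.org:uploads/my_folder",
)
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

The -k option is the part that matters for the broken-connection problem: rerunning the same command after an interruption continues the transfer instead of starting over.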

Thank you very much for the recommendation! The truth is I didn't know about SRA. Unfortunately, the journal I am submitting the paper to asks me to deposit the data (raw data, fastq files) in GEO or ArrayExpress. I will try ArrayExpress next time, as someone above said it is better, unless the journal changes its policy.

