Why does submitting high-throughput sequence data to GEO from an Amazon EC2 instance produce the error "Could not read reply from control connection -- timed out"?
3.9 years ago

I am following the instructions on the Submit to GEO page to upload about 83 GB of gzipped RNA-seq data to the GEO FTP server. I first used the following command, but the connection timed out after every file:

ncftpput -B 33554432 -z -u 'username' -p 'password' -v -R \


I then extended this to the following script, such that it retries until all files are uploaded:

#!/bin/bash
cd /home/ec2-user

try=0
COMPLETE_CONDITION=0

echo "START"

until [ "$lastresult" = "$COMPLETE_CONDITION" ]; do
    let "try+=1"
    echo "Try $try ..."
    ncftpput -B 33554432 -z -u 'username' -p 'password' -v -R \
        ftp-private.ncbi.nlm.nih.gov /fasp/ local_folder_to_upload
    let "lastresult=$?"
    echo "Last Resultcode: $lastresult"
done

echo "UPLOAD COMPLETED AFTER $try TRY(S)"

exit 0
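
One possible variation on the retry loop above, in case it is useful: the sketch below caps the number of attempts and pauses between them, so a permanently failing upload does not loop forever. The function name `retry_until_success` and its arguments are my own invention for illustration, not part of ncftpput or of the GEO instructions.

```shell
#!/bin/bash
# Hypothetical helper (not part of ncftpput or GEO tooling):
#   retry_until_success MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits with status 0 or MAX_TRIES is reached.
retry_until_success() {
    local max_tries="$1"
    local delay="$2"
    shift 2

    local try=0
    local result=1

    while [ "$try" -lt "$max_tries" ]; do
        try=$((try + 1))
        echo "Try $try of $max_tries ..."

        # Capture the exit status without tripping `set -e` in a caller.
        if "$@"; then
            result=0
        else
            result=$?
        fi
        echo "Last result code: $result"

        if [ "$result" -eq 0 ]; then
            echo "COMPLETED AFTER $try TRY(S)"
            return 0
        fi

        # Pause before the next attempt, but not after the final one.
        if [ "$try" -lt "$max_tries" ]; then
            sleep "$delay"
        fi
    done

    echo "GIVING UP AFTER $try TRY(S)" >&2
    return "$result"
}
```

It could be called with the same command as in the script, for example `retry_until_success 20 30 ncftpput -B 33554432 -z -u 'username' -p 'password' -v -R ftp-private.ncbi.nlm.nih.gov /fasp/ local_folder_to_upload`, where the limit of 20 tries and the 30-second pause are arbitrary values chosen for the example.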


This worked in principle, and after several tries all samples were uploaded correctly to GEO. However, the error message "Could not read reply from control connection -- timed out" persisted.

Any thoughts on why this happens and how to resolve it? It does not seem to be critical, as all files appear to have been uploaded correctly.

next-gen geo ftp • 2.4k views

This might not really solve the problem, but does GEO offer a way to get a hash such as SHA or MD5 for the uploaded files? If the checksums match, I would not worry too much. I would just want to make sure the files are not truncated.
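
For the local side of that check, here is a minimal sketch, assuming GNU coreutils and findutils (`md5sum`, `sort -z`, `xargs -r`) as found on a typical Linux EC2 instance. The function names are hypothetical, and how to obtain a checksum from GEO for comparison is not covered here. It writes an MD5 manifest for the gzipped files and tests that each archive is intact.

```shell
#!/bin/bash
# Hypothetical helpers; names and layout are assumptions for illustration.

# make_md5_manifest FOLDER MANIFEST
# Writes one "md5  path" line per *.gz file under FOLDER into MANIFEST.
make_md5_manifest() {
    local folder="$1"
    local manifest="$2"
    # NUL-separated names keep paths with spaces intact; sort for stable order.
    find "$folder" -type f -name '*.gz' -print0 \
        | sort -z \
        | xargs -0 -r md5sum > "$manifest"
}

# check_gzip_integrity FOLDER
# Returns 0 if every *.gz file under FOLDER passes `gzip -t`, 1 otherwise.
check_gzip_integrity() {
    local folder="$1"
    local bad=0
    local f
    while IFS= read -r -d '' f; do
        if ! gzip -t "$f" 2>/dev/null; then
            echo "CORRUPT OR TRUNCATED: $f"
            bad=1
        fi
    done < <(find "$folder" -type f -name '*.gz' -print0)
    return "$bad"
}
```

Running `make_md5_manifest local_folder_to_upload local_md5.txt` before the upload gives a list to compare against whatever checksums become available afterwards, and `md5sum -c local_md5.txt` re-verifies the local copies at any later point.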


Thank you very much for opening the question and for the bash script!

I had the same problem when uploading RNA-seq data to GEO. The connection broke several times and it was hard to submit all the FASTQ files. With your script, however, the upload is working fine so far, and I am not getting the error you mention.