4.9 years ago
Farah
Hello,
I need to download some RNA-seq fastq.gz files from both the GEO and SRA databases. How can I download these datasets?
Thank you very much.
Best wishes
Thank you very much for the useful tutorial link. I followed the tutorial steps to download the GSE111653 dataset (BioProject accession PRJNA437670). First, I downloaded the IBM Aspera Connect client tarball and extracted it on Linux:
$ tar zxvf /scratch/user/ye/ibm-aspera-connect-3.9.5.172984-linux-g2.12-64.tar.gz
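A minimal sketch of the install step, assuming a non-root account: extracting the tarball only produces the installer script (the .sh file in the directory listing below), and it is running that script that deploys ascp. For a single user it is typically installed under $HOME/.aspera/connect, the same prefix the -i key path in the next command uses. The helper name ensure_ascp_on_path is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: after running the extracted installer, make sure ascp is findable.
# Assumption: a per-user install under $HOME/.aspera/connect (override with $1).

# Hypothetical helper: prepend <prefix>/bin to PATH (once) and report
# whether the shell can now find an ascp executable.
ensure_ascp_on_path() {
  local prefix="${1:-$HOME/.aspera/connect}"
  case ":$PATH:" in
    *":$prefix/bin:"*) ;;                    # already on PATH
    *) export PATH="$prefix/bin:$PATH" ;;
  esac
  command -v ascp >/dev/null 2>&1            # exit status 0 only if found
}

# Usage (from the directory holding the extracted installer):
#   bash ibm-aspera-connect-3.9.5.172984-linux-g2.12-64.sh
#   ensure_ascp_on_path || echo "ascp still not found" >&2
```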
Then, after downloading the PRJNA437670.txt file report from ENA, I ran the following command to build a list of ascp download commands:
$ awk 'FS="\t", OFS="\t" { gsub("ftp.sra.ebi.ac.uk", "era-fasp@fasp.sra.ebi.ac.uk:"); print }' /scratch/user/ye/PRJNA437670.txt | cut -f3 | awk -F ";" 'OFS="\n" {print $1, $2}' | awk NF | awk 'NR > 1, OFS="\n" {print "ascp -QT -l 300m -P33001 -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh" " " $1 " ."}' > download.txt
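The same transformation can be sketched as a small shell function, under the assumptions the one-liner already makes: the ENA report is tab-separated, its first row is a header, and column 3 holds the FASTQ FTP paths separated by ";". The name make_ascp_commands is hypothetical. Unlike the one-liner, it emits one command per path however many files a run has, and it writes the expanded key path instead of the literal text $HOME.

```shell
#!/usr/bin/env bash
# Sketch: turn an ENA file report into one ascp command per FASTQ file.
# Assumptions (same as the one-liner): tab-separated, header in row 1,
# column 3 = FTP paths joined by ";". Function name is hypothetical.
make_ascp_commands() {
  local report="$1"
  local key="$HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh"
  awk -F '\t' -v key="$key" '
    NR > 1 {
      n = split($3, paths, ";")
      for (i = 1; i <= n; i++) {
        if (paths[i] == "") continue          # skip empty fields
        # Swap the FTP host for the Aspera host, as the original gsub does.
        sub(/ftp\.sra\.ebi\.ac\.uk/, "era-fasp@fasp.sra.ebi.ac.uk:", paths[i])
        printf "ascp -QT -l 300m -P33001 -i %s %s .\n", key, paths[i]
      }
    }' "$report"
}

# Usage:
#   make_ascp_commands /scratch/user/ye/PRJNA437670.txt > download.txt
```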
At this point, my /scratch/user/ye/ directory contains only these four files:
download.txt
ibm-aspera-connect-3.9.5.172984-linux-g2.12-64.sh
ibm-aspera-connect-3.9.5.172984-linux-g2.12-64.tar.gz
PRJNA437670.txt
I then ran the following command to download the data:
$ cat /scratch/user/ye/download.txt | parallel "{}"
However, I got the following errors:
Academic tradition requires you to cite works you base your article on. When using programs that use GNU Parallel to process data for publication please cite: O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login: The USENIX Magazine, February 2011:42-47.
This helps funding further development; AND IT WON'T COST YOU A CENT. If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
To silence the citation notice: run 'parallel --citation'.
Can't exec "/bin/sh": Argument list too long at /local/software/biobuilds/2017.11/bin/parallel line 3981.
...
Can't exec "/bin/sh": Argument list too long at /local/software/biobuilds/2017.11/bin/parallel line 3981.
/bin/bash: ascp: command not found
/bin/bash: ascp: command not found
...
/bin/bash: ascp: command not found
Use of uninitialized value $opt::termseq in split at /local/software/biobuilds/2017.11/bin/parallel line 3608, <stdin> line 128.
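For comparison with the parallel command above, a minimal sketch of running a command list with GNU Parallel, assuming ascp is already installed and on PATH. When no command is given, GNU Parallel executes each input line as a command, so neither "{}" nor cat is required. The function name run_download_list and the default of 4 simultaneous jobs are illustrative choices, not part of the tutorial.

```shell
#!/usr/bin/env bash
# Sketch: run every line of a command file with GNU Parallel.
# Hypothetical function; jobs defaults to 4 (arbitrary example value).
run_download_list() {
  local list="$1" jobs="${2:-4}"
  # Fail early with a clear message instead of one error per line.
  if ! command -v ascp >/dev/null 2>&1; then
    echo "ascp is not on PATH; install Aspera Connect first" >&2
    return 127
  fi
  # No command argument: each input line is executed as a command.
  parallel -j "$jobs" < "$list"
}

# Usage:
#   run_download_list /scratch/user/ye/download.txt 4
```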
Also, I tried:
$ while read LIST; do $LIST; done < /scratch/user/ye/download.txt
This produced many "-bash: ascp: command not found" messages.
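A sketch of a more robust sequential loop (function name hypothetical). Expanding an unquoted $LIST only word-splits the line, so the literal text $HOME that the awk step wrote into download.txt would reach ascp unexpanded. Passing each line to bash -c makes the shell parse it in full, and redirecting stdin from /dev/null keeps a command from consuming the rest of the list. This alone does not cure "ascp: command not found", which still requires ascp to be installed and on PATH.

```shell
#!/usr/bin/env bash
# Sketch: run the commands in a list file one at a time.
# Each line goes through "bash -c", so quotes and $HOME are handled
# as in a script, unlike an unquoted $LIST. Hypothetical function name.
run_sequentially() {
  local list="$1" cmd failed=0
  # "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
  while IFS= read -r cmd || [ -n "$cmd" ]; do
    [ -z "$cmd" ] && continue                 # skip blank lines
    # </dev/null stops a command from reading the rest of the list.
    bash -c "$cmd" </dev/null || failed=$((failed + 1))
  done < "$list"
  if [ "$failed" -gt 0 ]; then
    echo "$failed command(s) failed" >&2
    return 1
  fi
}

# Usage:
#   run_sequentially /scratch/user/ye/download.txt
```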
Could you please help me figure out what I did wrong and how to fix it? Thank you very much.