3 months ago
bioinformatics ▴ 10
I'm trying to download raw FASTQ files generated by scRNA-seq from the SRA at NCBI.
I'm using Terminal on a Mac.
I have run the following commands:
admins-MacBook-Air:~ mesalie$ cd desktop
admins-MacBook-Air:desktop mesalie$ mkdir tmp
mkdir: tmp: File exists
admins-MacBook-Air:desktop mesalie$ tar -xvzf sratoolkit.3.0.2-mac64.tar
admins-MacBook-Air:desktop mesalie$ ls
admins-MacBook-Air:desktop mesalie$ cd sratoolkit.3.0.2-mac64
admins-MacBook-Air:sratoolkit.3.0.2-mac64 mesalie$ ls
admins-MacBook-Air:sratoolkit.3.0.2-mac64 mesalie$ cd bin/
admins-MacBook-Air:bin mesalie$ ./vdb-config - I
However, for the last command I get the following error message:
dyld: lazy symbol binding failed: Symbol not found: ____chkstk_darwin
  Referenced from: /Users/mesalie/Desktop/sratoolkit.3.0.2-mac64/bin/./vdb-config (which was built for Mac OS X 10.15)
  Expected in: /usr/lib/libSystem.B.dylib
dyld: Symbol not found: ____chkstk_darwin
  Referenced from: /Users/mesalie/Desktop/sratoolkit.3.0.2-mac64/bin/./vdb-config (which was built for Mac OS X 10.15)
  Expected in: /usr/lib/libSystem.B.dylib
Does anyone know how I might correct this?
It's probably an Intel binary and your MacBook is M1 (ARM) based? Try installing it via conda (https://anaconda.org/bioconda/sra-tools), or use sra-explorer.info to get direct download links for your datasets of interest.
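To check that guess, the two commands below print the CPU architecture and the macOS version. The error message says the binary was built for Mac OS X 10.15, so it may also be worth ruling out a macOS version older than that; that reading of the error is an assumption on my part, not something confirmed in this thread. The variable name arch_name is only for illustration.

```shell
# CPU architecture: arm64 means Apple silicon (M1 etc.), x86_64 means Intel.
arch_name=$(uname -m)
echo "CPU architecture: $arch_name"

# macOS version. sw_vers exists only on macOS, so guard the call.
if command -v sw_vers >/dev/null 2>&1; then
    echo "macOS version: $(sw_vers -productVersion)"
fi

# If conda is already set up, the bioconda package mentioned above
# can be installed with (left commented out because it changes your environment):
# conda install -c conda-forge -c bioconda sra-tools
```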
Ok I will check. The datasets I’m trying to analyse are not listed on sra-explorer.info
sra-explorer queries NCBI, so if it is not there, it is likely not an NCBI dataset. Which one is it?
From that link, the SRA identifier is SRP295404. Type that into sra-explorer. Add everything to the cart and you'll see all the FASTQ FTP download URLs.
Alternatively, run the following on the command line to get the FTP download URLs:
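The command from this reply is not preserved in the thread. As one possible approach, and not necessarily the one that was meant, the ENA Portal API has a filereport endpoint that lists FASTQ FTP paths for an accession. This is a sketch that assumes the study is mirrored at ENA with FASTQ files available; the function names ena_filereport_url and fastq_urls_from_report are made up for illustration.

```shell
# Build the ENA filereport URL for an accession (a study such as SRP..., or a single run).
ena_filereport_url() {
    printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=run_accession,fastq_ftp&format=tsv\n' "$1"
}

# Read the TSV report on stdin and print one FTP URL per line.
# The fastq_ftp column holds semicolon-separated paths without a scheme,
# so the column is located by its header name and "ftp://" is prepended.
fastq_urls_from_report() {
    awk -F '\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "fastq_ftp") col = i; next }
        col && $col != "" {
            n = split($col, f, ";")
            for (j = 1; j <= n; j++) print "ftp://" f[j]
        }'
}

# Usage (needs network access):
# curl -s "$(ena_filereport_url SRP295404)" | fastq_urls_from_report
```

Each printed URL can then be fetched with curl -O or any download manager.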
Thanks. How might I find the SRA identifier from the link?
You mean from the GEO page you linked? It's below the Sample information section. You can also search sra-explorer for that BioProject (PRJNA...) number.
That said, unless you really want to process FASTQ files from scratch, you can just take the processed data under "Supplementary file" and start from there. That is most likely the CellRanger output containing raw counts per cell for every sample, and it saves you a great deal of work.
If you do download the FASTQ files: the authors uploaded three files per run (example: https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR13177101&display=metadata). The 8bp one is the index read (not needed, can be discarded), the second, 28bp one is UMI+barcodes, and the third one is the gene expression read. You need the second and third (corresponding to R1 and R2 from the sequencer) for preprocessing, be it CellRanger, STARsolo, or other approaches such as salmon-alevin or kallisto-bustools.
As said, use the preprocessed data they provide unless there is a good reason not to. On a MacBook Air you do not want to do any preprocessing anyway; even with the most lightweight tools that is going to be painful.
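If the downloaded files are not clearly labelled, the read lengths described above can be used to tell them apart. Here is a rough sketch that looks only at the first read of a gzipped FASTQ file. The 8bp and 28bp lengths come from the description above; treating any other length as the gene expression read is an assumption, and the function names are hypothetical.

```shell
# Print the length of the first read in a gzipped FASTQ file.
# Line 2 of a FASTQ record is the sequence.
first_read_length() {
    gzip -dc "$1" | awk 'NR == 2 { print length($0); exit }'
}

# Guess what a file contains from the length of its first read.
classify_read_file() {
    len=$(first_read_length "$1")
    case "$len" in
        8)  echo "index read (can be discarded)" ;;
        28) echo "cell barcode + UMI (R1)" ;;
        *)  echo "gene expression read (R2), first read length: $len" ;;
    esac
}

# Usage, with a placeholder file name:
# classify_read_file some_run_2.fastq.gz
```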
If possible, find a download URL for the FASTQ files (e.g. with sra-explorer or ffq). The SRA Toolkit is not really well designed, and I personally avoid using it.