Question: Downloading paired end fastq from SRA
12 weeks ago by
t.t10 wrote:

Hi everyone,

I would like to download the raw data of a specific public single-cell RNA-Seq experiment (ENA, GEO). As the BCL files do not seem to be available, the most "raw" format would probably be paired-end fastq files. So far I have been unable to download the reads split into separate files, and I would really appreciate your help.

For simplicity just focus on one sample: Donor1_scRNA-seq_rep1 (GSM3052917, Experiment: SRX3815586, Run: SRR6860519)

I already tried fastq-dump and fasterq-dump with all possible split parameters (--split-files etc.), but regardless of the parameter I only get one fastq file.

fastq-dump --split-files SRR6860519
fasterq-dump -S SRR6860519

The library type is definitely paired and at ENA one can see two submitted MD5-sums per sample.

Does anyone know how to split these samples correctly? And does it make a difference if I provide the experiment accession or the run accession to fastq-dump/fasterq-dump?

Thanks in advance!

rna-seq sra singlecell

Although the sample is described as paired-end, I am sure it contains only one read: the run carries the note "This run has 1 read per spot".

written 12 weeks ago by zhangdengwei30

Yes it does. Not the first time there is something missing on NCBI. Contacting the authors is probably your best choice.

written 12 weeks ago by ATpoint19k

I think the authors only uploaded the R2 fastq files, and not the R1 files containing the UMI sequence. The Extraction protocol and Data processing sections of the sample record state that R1 is 26 nt and R2 is 100 nt long. If you look in the fastq file, you only see reads that are 100 (101) nt long. If you want the UMI as well, I am afraid you'll have to ask the authors (as ATpoint suggests).
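The read-length check described above can be done without opening the file by hand. Below is a minimal sketch in Python, assuming standard 4-line FASTQ records; the function name `read_length_counts` and the `max_reads` parameter are made up for illustration and are not part of sra-tools.

```python
import gzip
from collections import Counter
from itertools import islice


def read_length_counts(fastq_path, max_reads=100_000):
    """Tally sequence lengths over the first `max_reads` records of a FASTQ file.

    Assumes plain 4-line records (header, sequence, '+', qualities).
    Pass max_reads=None to scan the whole file.
    """
    # Transparently handle gzip-compressed files based on the extension.
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        # The sequence is the second line of every 4-line record.
        sequence_lines = islice(handle, 1, None, 4)
        for sequence in islice(sequence_lines, max_reads):
            counts[len(sequence.rstrip("\r\n"))] += 1
    return counts
```

Call it with the path of whichever fastq file the dump produced, for example `sorted(read_length_counts(path).items())`. If only R2 was uploaded, as suspected here, the tally should show lengths around 100 nt and no 26 nt reads.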

written 12 weeks ago by Benn7.1k

Thanks for pointing that out.

What I am still curious about are the two MD5 checksums that are available per sample (at ENA). Wouldn't that mean that the authors indeed uploaded two files per sample?

Edit: I found the answer for the two checksums myself: at ENA, two files were submitted per sample, a BAM file and a related index (.BAI).

modified 12 weeks ago • written 12 weeks ago by t.t10