Question

SRA not splitting when trying to download fastq

1

2.3 years ago

Emily ▴ 70

I was trying to download a FASTQ file as paired-end reads by running with --split-files, but it comes out as a single read file. The original data is paired-end, but I'm not getting the two files I'm supposed to. https://trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&page_size=10&acc=SRR19687957&display=data-access

How can I resolve this problem and get the paired-end files?

SRA FASTQ Linux scRNA • 2.9k views

ADD COMMENT • link updated 2.3 years ago by tomas4482 ▴ 430 • written 2.3 years ago by Emily ▴ 70

0

2.3 years ago

tomas4482 ▴ 430

This SRA run contains three FASTQ files (I1, R1 and R2), as mentioned in the metadata. fastq-dump with --split-files should work. Can you paste your command here?

ADD COMMENT • link 2.3 years ago by tomas4482 ▴ 430

1

I just tested one of my own samples. In my test, fasterq-dump did not split I1, R1 and R2, but fastq-dump did.

Full command: fastq-dump ./SRR12273028.sra -O ./data/ --split-files --gzip

The output should be something like _1.fq.gz, _2.fq.gz, _3.fq.gz.

For _1.fq:

@SRR12273028.1 SN7001050R:515:HT77VBCXY:1:1104:3416:2168 length=8
AACCGTAA

_2.fq:

@SRR12273028.1 SN7001050R:515:HT77VBCXY:1:1104:3416:2168 length=26
CAGCGACATAATGTGNTATTCTACTG

_3.fq:

@SRR12273028.1 SN7001050R:515:HT77VBCXY:1:1104:3416:2168 length=113
CTAGTAACCACGTTCTCCTGATCAAATATCAGTCTACTACTTACACGAGTGAAGATAGTATTCAGACCCCTATACTGGCTCTACATATTTAGGACAACAGAATGGTGCTAACT

Therefore, _1.fq is I1, _2.fq is R1 and _3.fq is R2.
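
The length check above can be repeated on any set of split files. The following is a minimal sketch, not part of the original test: the function name `first_read_length` is made up for illustration, and it assumes standard four-line FASTQ records, so that line 2 of each file is the first sequence.

```shell
# Print the length of the first read in each FASTQ file given as an argument.
# Hypothetical helper; assumes four-line FASTQ records.
# gzip -cdf decompresses .gz files and passes plain files through unchanged.
first_read_length() {
    for f in "$@"; do
        # Line 2 of a FASTQ file is the sequence of the first record.
        len=$(gzip -cdf "$f" | awk 'NR == 2 { print length($0); exit }')
        printf '%s\t%s\n' "$f" "$len"
    done
}
```

Calling it as `first_read_length ./data/SRR12273028_*` (the path is an assumption based on the command above) prints one file name and one length per line, which is usually enough to tell a short index read from the barcode read and the longer cDNA read.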

ADD REPLY • link 2.3 years ago by tomas4482 ▴ 430

0

Yeah, another brick in the wall: fasterq-dumb (b is not a typo) is even worse than the original version, unable to perform basic operations and not providing gzip compression options. Absolutely terrible, like the entire SRA framework. This entire sra2fastq conversion thing is one of the top unnecessary wastes of computational resources.

ADD REPLY • link 2.3 years ago by ATpoint 85k

0

Couldn't agree more.

ADD REPLY • link 2.3 years ago by tomas4482 ▴ 430

0

The command I ran was ~/sra_data/SRR19687957$ fasterq-dump SRR19687957 --split-files, but it still comes out as a single-end read file.

Could you explain what the -O ./data/ part is? I'm not quite sure what that particular part does...

ADD REPLY • link 2.3 years ago by Emily ▴ 70

1

I've demonstrated that fastq-dump is the only option. Could you try the suggested command first? I don't know what command your coworker used, so I'm not going to comment on that.

-O refers to the output directory. You can check all arguments with --help.

You should include the .sra suffix when passing the downloaded file. Otherwise, the tool will download the data from SRA again, regardless of whether you have already downloaded it. The output directory needs to be a different path from the directory that contains your downloaded .sra files. I tested fastq-dump/fasterq-dump many times, and it always reported an error when I stored the .sra files together with the split FASTQ files.

ADD REPLY • link 2.3 years ago by tomas4482 ▴ 430

0

I tried both ways, with .sra and without .sra, and both correctly produced three files: _1/2/3.fastq.gz. The numbers of read and written spots match up as well for the one without .sra.
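
A similar sanity check can be run on the output files themselves. This is a rough sketch under the assumption of four-line FASTQ records. The helper names `count_reads` and `same_read_count` are invented for illustration and are not part of sra-tools.

```shell
# Count the records in one FASTQ file (plain or gzip-compressed).
# Assumes the standard four-lines-per-record layout.
count_reads() {
    lines=$(gzip -cdf "$1" | wc -l)
    echo $((lines / 4))
}

# Print the read count of every file given as an argument.
# Returns 0 if all counts are equal, 1 otherwise.
same_read_count() {
    first=""
    for f in "$@"; do
        n=$(count_reads "$f")
        printf '%s\t%s\n' "$f" "$n"
        if [ -z "$first" ]; then
            first=$n
        elif [ "$n" != "$first" ]; then
            return 1
        fi
    done
    return 0
}
```

For example, `same_read_count SRR19687957_1.fastq.gz SRR19687957_2.fastq.gz SRR19687957_3.fastq.gz` should succeed if the index, barcode and cDNA files were all written completely.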

Side note: my coworker's command was fasterq-dump --split-files SRR19687957.sra -- gzip, which he said still gave one read file as output. He ran the same command on a completely different accession number and got two files, but for some reason the one I'm working on only gave one.

Thank you and ATpoint for all the help.

ADD REPLY • link 2.3 years ago by Emily ▴ 70

0

Glad to hear.

ADD REPLY • link 2.3 years ago by tomas4482 ▴ 430

score 3 · Accepted Answer · 2022-08-05

3

2.3 years ago

ATpoint 85k

fastq-dump --split-spot --split-files SRR19687957

will produce three files: the first is the index file (you don't need that), the second is the UMI+CB read and the third is the cDNA read. I would include --gzip to compress the files right away. Usually I would also use prefetch to download the .sra file first and then run fastq-dump on that file for the conversion. fastq-dump is notoriously unstable and unreliable, so running it on the downloaded file is usually a bit more robust. Typically I recommend visiting sra-explorer.info to get FASTQ download links directly, but recently it seems to be non-functional, maybe due to changes in the ENA API that it queries for download links. At least it does not return anything in my hands, so prefetch+fastq-dump is the choice, I guess.
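
The prefetch-then-convert workflow described here could be wrapped roughly as follows. This is only a sketch: the function name `sra_to_fastq`, the `DRY_RUN` switch and the default directory names are made up for illustration, and the path `<sra_dir>/<acc>/<acc>.sra` assumes the layout that recent sra-tools releases of prefetch produce, which may differ on your installation.

```shell
# Download an SRA run with prefetch, then convert the local .sra file.
# Hypothetical wrapper; set DRY_RUN=1 to print the commands instead of running them.
sra_to_fastq() {
    acc=$1                 # run accession, e.g. SRR19687957
    sra_dir=${2:-sra}      # where prefetch stores the download (assumed default)
    fq_dir=${3:-fastq}     # separate directory for the FASTQ output (assumed default)

    run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi

    # Assumption: prefetch writes <sra_dir>/<acc>/<acc>.sra.
    $run prefetch --output-directory "$sra_dir" "$acc" || return 1

    # Convert the downloaded file, writing gzip-compressed split files
    # to a directory other than the one holding the .sra file.
    $run fastq-dump --split-spot --split-files --gzip --outdir "$fq_dir" \
        "$sra_dir/$acc/$acc.sra" || return 1
}
```

Running `DRY_RUN=1 sra_to_fastq SRR19687957` first shows the two commands without touching the network. Dropping `DRY_RUN` runs them for real, provided prefetch and fastq-dump are on the PATH.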

ADD COMMENT • link 2.3 years ago by ATpoint 85k

0

I ran the prefetch command to download the .sra file and then ran ~/sra_data/SRR19687957$ fasterq-dump SRR19687957 --split-files. I also had my coworker try the way he normally does, and he says he is also only getting one file instead of two; he used the fastq-dump command.

ADD REPLY • link 2.3 years ago by Emily ▴ 70

0

If you prefetch first, then run fastq-dump (...) SRR19687957.sra on the downloaded file; otherwise it makes no sense. Why fasterq-dump again? I think it was demonstrated here compellingly that it is not an option.

ADD REPLY • link 2.3 years ago by ATpoint 85k

0

Got it, I'll just stick with the fastq-dump command when splitting files. Thanks for the explanation and help!

ADD REPLY • link 2.3 years ago by Emily ▴ 70