6 days ago
Emily ▴ 20

I was trying to download a FASTQ file as paired-end reads by using the --split-files option, but the output is a single read file. The original data is paired-end, but I am not getting the two files I expect. https://trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&page_size=10&acc=SRR19687957&display=data-access

How can I resolve this problem and get the paired-end files?

SRA FASTQ Linux scRNA • 391 views
6 days ago
ATpoint 64k
fastq-dump --split-spot --split-files SRR19687957


This will produce three files: the first is the index file (you don't need that), the second is the UMI+CB read, and the third is the cDNA read. I would include --gzip to compress the files right away.

Usually I would also use prefetch to download the .sra file first and then run fastq-dump on that file for the conversion. fastq-dump is notoriously unstable and unreliable, so running it on the downloaded file is usually a bit more robust.

Typically I recommend visiting sra-explorer.info to get FASTQ download links directly, but recently it seems to be non-functional, maybe due to changes in the ENA API that it queries for download links. At least it does not return anything in my hands, so prefetch + fastq-dump is the way to go, I guess.
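The prefetch-then-convert workflow described above could be sketched roughly as follows. This is only an illustration under some assumptions: the function name dump_accession, the default directories ./sra and ./fastq, and the PREFETCH / FASTQ_DUMP override variables are all made up for this example, and the `<dir>/<accession>/<accession>.sra` layout written by prefetch may differ between sra-tools versions.

```shell
#!/usr/bin/env bash
# Sketch: download the .sra archive first, then convert the local file.
# PREFETCH and FASTQ_DUMP are hypothetical override variables (not part of
# sra-tools); they default to the tool names found on PATH.

dump_accession() {
    local acc="$1"                 # run accession, e.g. SRR19687957
    local sra_dir="${2:-./sra}"    # assumed location for the downloaded archive
    local out_dir="${3:-./fastq}"  # assumed location for the FASTQ output

    mkdir -p "$sra_dir" "$out_dir" || return 1

    # Assumption: prefetch writes <sra_dir>/<acc>/<acc>.sra
    "${PREFETCH:-prefetch}" "$acc" -O "$sra_dir" || return 1

    # Convert the downloaded file; one output file per read in each spot.
    "${FASTQ_DUMP:-fastq-dump}" --split-spot --split-files --gzip \
        -O "$out_dir" "$sra_dir/$acc/$acc.sra"
}

# Usage (requires sra-tools on PATH):
#   dump_accession SRR19687957
```

Keeping the archive and the FASTQ output in separate directories is a choice made for this sketch, not something the tools enforce.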


I ran the prefetch command to download the .sra file and then ran

~/sra_data/SRR19687957$ fasterq-dump SRR19687957 --split-files

I also had my coworker try the way he normally does, and he says he is also getting only one file instead of two; he used the fastq-dump command.

If you prefetch first, then it is fastq-dump (...) SRR19687957.sra on the downloaded file. Otherwise it makes no sense. Why fasterq-dump again? I think it was demonstrated here compellingly that it is not an option.

Got it, I'll just stick with the fastq-dump command when splitting files. Thanks for the explanation and help!

6 days ago
tomas4482 ▴ 280

This SRA run contains three FASTQ files: I1, R1 and R2, as mentioned in the metadata. fastq-dump with --split-files should work. Can you paste your command here?

I just tested one of my own samples. fasterq-dump cannot split I1, R1 and R2, but fastq-dump works. Full command:

fastq-dump ./SRR12273028.sra -O ./data/ --split-files --gzip

The output should be something like _1.fq.gz, _2.fq.gz, _3.fq.gz.

_1.fq:
@SRR12273028.1 SN7001050R:515:HT77VBCXY:1:1104:3416:2168 length=8
AACCGTAA

_2.fq:
@SRR12273028.1 SN7001050R:515:HT77VBCXY:1:1104:3416:2168 length=26
CAGCGACATAATGTGNTATTCTACTG

_3.fq:
@SRR12273028.1 SN7001050R:515:HT77VBCXY:1:1104:3416:2168 length=113
CTAGTAACCACGTTCTCCTGATCAAATATCAGTCTACTACTTACACGAGTGAAGATAGTATTCAGACCCCTATACTGGCTCTACATATTTAGGACAACAGAATGGTGCTAACT

Therefore, _1.fq is I1, _2.fq is R1 and _3.fq is R2.

Yeah, another brick in the wall of reasons why fasterq-dumb (b is not a typo) is even worse than the original version: it is unable to perform basic operations and does not provide gzip compression options. Absolutely terrible, like the entire SRA framework. This entire sra2fastq conversion thing is one of the top unnecessary wastes of computation resources.

Couldn't agree more.
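For anyone wanting to repeat the read-length comparison from the answer above (8 for I1, 26 for R1, and a longer cDNA read for R2), here is a minimal sketch that prints the length of the first read in a FASTQ file. The function name first_read_length is made up for this example, and it assumes standard four-line FASTQ records, plain or gzip-compressed.

```shell
#!/usr/bin/env bash
# Sketch: report the length of the first read in a FASTQ file.
# Assumes four-line records, so the sequence is on line 2.

first_read_length() {
    local file="$1"
    # Decompress only when the name ends in .gz
    case "$file" in
        *.gz) gzip -cd "$file" ;;
        *)    cat "$file" ;;
    esac | awk 'NR == 2 { print length($0); exit }'
}

# Usage: one line per split file written by fastq-dump --split-files
#   for f in ./data/SRR12273028_*.gz; do
#       printf '%s\t%s\n' "$f" "$(first_read_length "$f")"
#   done
```

A single read per file is only a quick hint about which file is which; it does not check that every read has the same length.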
~/sra_data/SRR19687957$ fasterq-dump SRR19687957 --split-files

is the command that I ran, but it still produces a single-end read file.

Could you explain what the -O ./data/ part is? I'm not quite sure what that particular part does.


I've demonstrated that fastq-dump is the only option. Could you try the suggested command first? I don't know what command your coworker used, so I'm not going to comment on that.

-O refers to the output directory. You can check all arguments with --help.

You should add the suffix .sra to the name of the downloaded file. Otherwise, the tool will download the data from SRA again, whether or not you have already downloaded it. The output directory needs to be a different path from the directory that contains your downloaded .sra files. I tested fastq-dump/fasterq-dump many times, and it always reported an error when I stored the .sra files together with the split FASTQ files.
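The two precautions above (pass the path with its .sra suffix, and write the FASTQ files somewhere other than the .sra directory) could be wrapped in a small guard like the one below. It is a sketch only: convert_local_sra and the FASTQ_DUMP override variable are invented for this example, and the checks encode the experience reported in this thread rather than documented tool behavior.

```shell
#!/usr/bin/env bash
# Sketch: convert a local .sra file, writing FASTQ to a separate directory.
# FASTQ_DUMP is a hypothetical override variable, not an sra-tools setting.

convert_local_sra() {
    local sra_file="$1" out_dir="$2"

    # Insist on an existing path that ends in .sra
    case "$sra_file" in
        *.sra) ;;
        *) echo "expected a path ending in .sra: $sra_file" >&2; return 2 ;;
    esac
    [ -f "$sra_file" ] || { echo "no such file: $sra_file" >&2; return 2; }

    mkdir -p "$out_dir" || return 1

    # Refuse to write the FASTQ files next to the .sra file
    if [ "$(cd "$(dirname "$sra_file")" && pwd)" = "$(cd "$out_dir" && pwd)" ]; then
        echo "output directory must differ from the .sra directory" >&2
        return 2
    fi

    "${FASTQ_DUMP:-fastq-dump}" "$sra_file" -O "$out_dir" --split-files --gzip
}

# Usage (requires sra-tools on PATH):
#   convert_local_sra ./SRR12273028.sra ./data/
```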


I tried both ways, with .sra and without .sra, and both correctly produced three files: _1/2/3.fastq.gz. The numbers of read and written spots match as well for the run without .sra.

Side note: my coworker's command was fasterq-dump --split-files SRR19687957.sra --gzip, which he said still gave a single read file as output. He tried the same command with a completely different accession number and got two files, but for some reason the accession I'm working on gave only one.

Thank you and ATpoint for all the help.
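A quick sanity check on the split output, in the spirit of comparing read and written spots above, is to count the records in each file and confirm that the counts agree. The sketch below assumes four-line FASTQ records; count_reads and check_split_files are names invented for this example.

```shell
#!/usr/bin/env bash
# Sketch: count FASTQ records per file; split files from one run should agree.
# Assumes four-line records, plain or gzip-compressed.

count_reads() {
    local file="$1"
    case "$file" in
        *.gz) gzip -cd "$file" ;;
        *)    cat "$file" ;;
    esac | awk 'END { print int(NR / 4) }'
}

check_split_files() {
    # Print "<file><TAB><reads>" for each argument; return 1 if counts differ.
    local first="" n status=0 f
    for f in "$@"; do
        n="$(count_reads "$f")"
        printf '%s\t%s\n' "$f" "$n"
        if [ -z "$first" ]; then first="$n"; fi
        [ "$n" = "$first" ] || status=1
    done
    return "$status"
}

# Usage:
#   check_split_files SRR19687957_1.fastq.gz SRR19687957_2.fastq.gz SRR19687957_3.fastq.gz
```

Counting lines decompresses every file in full, so on large runs this takes a while.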
