fastq-dump: failure to get two fastq files
2.1 years ago
msrk04011 • 0

Hello,

I have been trying to get fastq files from the SRA data for SRR1030614. The run is registered as paired-end, so I tried the following:

$ fastq-dump --split-files SRR1030614


As a result, I got SRR1030614_2.fastq but not SRR1030614_1.fastq. In addition, fastq-dump printed the following message:

Rejected 27168787 READS because READLEN < 1
Read 27168787 spots for SRR1030614.sra
Written 27168787 spots for SRR1030614.sra


When I checked the entry for SRR1030614 on NCBI SRA, the "Reads" tab showed read data such as:

Reads (separated)
>gnl|SRA|SRR1030614.1.1 1 (Technical)
>gnl|SRA|SRR1030614.1.2 1 (Biological)
CTGATCCGAACATTGTGTACATGACCATTTCGATGATGTACAGTACAATCGTCACATAGA
AGATAACCCGCCACGCGCTAATTGTTTGGTTGCCGTGTGTG


So perhaps I cannot get SRR1030614_1.fastq because it would be empty? Also, if I cannot get two separate fastq files, is it OK to run the downstream analysis (e.g. Trinity) with the reads specified as single-end? Any comments would be much appreciated. Thank you.

RNA-Seq • 2.1k views

Thank you so much for your suggestion. As you said, ENA provides only one file for this accession, and it gives the same result when run through fastq-dump. I'll check with the authors, but in the meantime I will try running Trinity with the reads specified as single-end.

2.1 years ago
ATpoint 48k

On SRA, the first read is indeed listed with a read length of 0. My guess (and it is only a guess) is that this was a single-end run that was incorrectly labelled as paired when the data were uploaded. You will probably have to live with this. If you double-check at the ENA, they provide only one file. You can still contact the authors and ask for clarification.
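
If you want to sanity-check the one file you did get before treating it as single-end, a rough sketch like the one below may help. It counts the reads and reports the shortest and longest read length. The function name `fastq_summary` is just something made up for this post, and it assumes the plain 4-lines-per-record FASTQ layout that fastq-dump normally writes (no wrapped sequence lines).

```shell
# Hypothetical helper (not part of the SRA Toolkit): summarize a FASTQ file.
# Assumes exactly 4 lines per record, so the sequence is line 2 of each record.
fastq_summary() {
    awk 'NR % 4 == 2 {
             n++                                   # one sequence line per read
             len = length($0)
             if (n == 1 || len < min) min = len    # track shortest read
             if (len > max) max = len              # track longest read
         }
         END { printf "reads=%d min_len=%d max_len=%d\n", n, min, max }' "$1"
}

# Example call (file name taken from the question above):
#   fastq_summary SRR1030614_2.fastq
```

If the run really is single-end, I would expect the read count to match the number of spots fastq-dump reported as written, and min_len to be above 0. That is only a plausibility check, though; it cannot tell you what the authors actually sequenced.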