How to deal with SRA files that generate three fastq files?
2.7 years ago
fanglujing ▴ 60

Hi, I downloaded an sra file from NCBI, SRR4242282.sra, and got three fastq files after using fastq-dump to extract the reads from it. Command:

  fastq-dump --split-3 --gzip SRR4242282.sra

I don't understand this result, and I haven't run into it before. Any suggestions would be appreciated.

It does look like the submitters may have submitted index sequences in a separate file, since the corresponding ENA entry also shows three fastq files. Examine the files to see which one contains the index sequences. It should be readily apparent because index reads are short.
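One way to examine the files is to tabulate read lengths in each of them. Below is a minimal sketch, assuming the files are ordinary 4-line-per-record fastq (optionally gzipped); the function name `read_length_summary` and the `max_reads` limit are made up for illustration and are not part of the SRA toolkit.

```python
import gzip
from collections import Counter


def read_length_summary(path, max_reads=10000):
    """Count read lengths among the first `max_reads` records of a fastq file.

    Assumes plain 4-line fastq records; files ending in .gz are read as gzip.
    """
    opener = gzip.open if path.endswith(".gz") else open
    lengths = Counter()
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i // 4 >= max_reads:
                break  # only sample the start of the file
            if i % 4 == 1:  # second line of each record is the sequence
                lengths[len(line.strip())] += 1
    return lengths
```

Calling it on each of the three files (for example `read_length_summary("SRR4242282_1.fastq.gz")`, where the file name is an assumption about what fastq-dump wrote) and comparing the results should show whether one file holds only very short reads, which would point to index sequences.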

Edit: I will leave this here in case other submitters have done this.

I have checked the fastq content, and I think t.kuilman's suggestion works in this situation. Thanks for the reply.

2.7 years ago

Please see my previous post: this happens because BOTH paired and unpaired reads are included in these sra files. Using the --split-files option does not work, since this would lead to fastq files that are incomplete. What you did is correct; simply use the files ending in _1 and _2. The remaining file contains the unpaired reads and can be discarded.

Thanks for your reply; it helps me a lot.

2.7 years ago

Either use only the _1 and _2 files or use option --split-files instead of --split-3

See the manual/help page:

  --split-files                    Dump each read into separate file.Files
                                   will receive suffix corresponding to read
                                   number
  --split-3                        Legacy 3-file splitting for mate-pairs:
                                   First biological reads satisfying dumping
                                   conditions are placed in files *_1.fastq and
                                   *_2.fastq If only one biological read is
                                   present it is placed in *.fastq Biological