How to deal with SRA files that generate three FASTQ files?
3.2 years ago
fanglujing ▴ 60

Hi, I downloaded an SRA file from NCBI, SRR4242282.sra, and got three FASTQ files after using fastq-dump to extract the reads. The command was: fastq-dump --split-3 --gzip SRR4242282.sra. I don't understand this result and haven't encountered it before. Any suggestions would be appreciated.

sra fastq-dump fastq • 1.7k views

It does look like the submitters may have submitted index sequences in a separate file, since the corresponding ENA entry also shows three FASTQ files. Examine the files to see which one contains the index sequences. It should be readily apparent because those reads will be short.
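For anyone wanting to check this quickly, here is a minimal sketch that reports the read count and the shortest and longest read in a gzipped FASTQ file. The function name fastq_length_summary is made up for illustration, and it assumes plain 4-line FASTQ records (no wrapped sequence lines).

```shell
# Hypothetical helper (not part of the SRA Toolkit): summarise read lengths
# in a gzipped FASTQ file. Assumes standard 4-line FASTQ records.
fastq_length_summary() {
    gzip -cd "$1" | awk -v name="$1" '
        NR % 4 == 2 {                      # sequence line of each record
            len = length($0)
            if (n == 0 || len < min) min = len
            if (len > max) max = len
            n++
        }
        END {
            printf "%s\treads=%d\tmin_len=%d\tmax_len=%d\n", name, n, min, max
        }'
}
```

Running it over each output file, for example `for f in SRR4242282*.fastq.gz; do fastq_length_summary "$f"; done`, should make a file of short index reads stand out, if there is one.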

Edit: I will leave this here in case other submitters have done this.

OP: Please confirm if t.kuilman's explanation is applicable in your case.


I have checked the FASTQ content, and I think t.kuilman's explanation applies in this situation. Thanks for the reply.

3.2 years ago
thomaskuilman ▴ 820

Please see my previous post: this happens because BOTH paired and unpaired reads are included in these SRA files. The --split-files option does not work here, since it would produce FASTQ files that are incomplete. What you did is correct; simply use the files ending in _1 and _2. The remaining file contains the unpaired reads and can be discarded.
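As a sanity check before discarding anything, one can confirm that the _1 and _2 files hold the same number of reads. This is only a sketch: count_reads and check_pairs are hypothetical helper names, and it assumes gzipped FASTQ with 4 lines per record.

```shell
# Hypothetical helpers (not part of the SRA Toolkit).
# Count reads in a gzipped FASTQ file, assuming 4 lines per record.
count_reads() {
    gzip -cd "$1" | awk 'END { print NR / 4 }'
}

# Print both counts; return success only if the two mate files match.
check_pairs() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "$1: $n1 reads; $2: $n2 reads"
    [ "$n1" = "$n2" ]
}
```

For the files in this thread that would be something like `check_pairs SRR4242282_1.fastq.gz SRR4242282_2.fastq.gz`. It only compares counts and does not verify that read names match between the two files.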


Thanks for your reply; it helps me a lot.

3.2 years ago

Either use only the _1 and _2 files, or use the --split-files option instead of --split-3.

See the manual/help page:

  --split-files                    Dump each read into separate file.Files 
                                   will receive suffix corresponding to read 
                                   number 
  --split-3                        Legacy 3-file splitting for mate-pairs: 
                                   First biological reads satisfying dumping 
                                   conditions are placed in files *_1.fastq and 
                                   *_2.fastq If only one biological read is 
                                   present it is placed in *.fastq Biological 
                                   reads 3 and above are ignored.