Question: How to deal with SRA files that generate three fastq files?
1
2.1 years ago by
fanglujing40
China/xiamen
fanglujing40 wrote:

Hi, I downloaded an sra file from NCBI, SRR4242282.sra, and got three fastq files after using fastq-dump to extract the reads. Command: fastq-dump --split-3 --gzip SRR4242282.sra. I don't know what to make of this result; I haven't come across it before. Any suggestions would be appreciated.

fastq-dump fastq sra • 1.1k views
ADD COMMENTlink modified 2.1 years ago by thomaskuilman800 • written 2.1 years ago by fanglujing40
1

It does look like the submitters may have submitted index sequences in a separate file, since the corresponding ENA entry also shows three fastq files. Examine the files to see which one contains the index sequences. It should be readily apparent because those reads are short.
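One way to examine the files is to tabulate read lengths in each of them. The following is a minimal sketch, not part of sra-tools: the helper names `open_fastq` and `read_length_counts` are made up for illustration, and it assumes the usual 4-line FASTQ records (no wrapped sequence lines).

```python
import gzip
from collections import Counter


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_length_counts(path, max_reads=100_000):
    """Count read lengths over the first max_reads records of a FASTQ file.

    Assumes the standard 4-line FASTQ layout (header, sequence, '+', quality).
    """
    counts = Counter()
    with open_fastq(path) as handle:
        for i, line in enumerate(handle):
            if i // 4 >= max_reads:
                break
            if i % 4 == 1:  # second line of each record is the sequence
                counts[len(line.rstrip("\n"))] += 1
    return counts
```

Calling `read_length_counts(name).most_common(3)` on each output file (presumably SRR4242282_1.fastq.gz, SRR4242282_2.fastq.gz and SRR4242282.fastq.gz for the command above) should make a file of short index reads stand out from the others.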

Edit: I will leave this here in case other submitters have done this.

OP: Please confirm if t.kuilman's explanation is applicable in your case.

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by genomax92k

I have checked the fastq content, and I think t.kuilman's suggestion works in this situation. Thanks for the reply.

ADD REPLYlink written 2.1 years ago by fanglujing40
3
2.1 years ago by
thomaskuilman800
thomaskuilman800 wrote:

Please see my previous post: this happens because BOTH paired and unpaired reads are included in these sra files. Using the --split-files option does not work, since it would produce incomplete fastq files. What you did is correct; simply use the files ending in _1 and _2. The remaining file contains the unpaired reads and can be discarded.
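Before discarding the third file, it may be worth confirming that the _1 and _2 files really are mate-for-mate. A rough sketch of such a check follows; `read_ids` and `mates_in_sync` are hypothetical helpers, and it assumes that the first whitespace-separated token of each header (minus an optional /1 or /2 suffix) identifies the spot, which matches the default fastq-dump header style as far as I know.

```python
import gzip
from itertools import zip_longest


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_ids(path):
    """Yield the read identifier (first header token) of each 4-line record."""
    with open_fastq(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 != 0:
                continue
            fields = line[1:].split()
            if not fields:  # skip a stray blank line at end of file
                continue
            token = fields[0]
            # Assumption: mates may carry a trailing /1 or /2; drop it.
            if token.endswith(("/1", "/2")):
                token = token[:-2]
            yield token


def mates_in_sync(path1, path2):
    """Return (ok, n): ok is True if both files list the same IDs in the same
    order; n is the number of matching pairs seen before any mismatch."""
    n = 0
    for id1, id2 in zip_longest(read_ids(path1), read_ids(path2)):
        if id1 != id2:  # also triggers when one file runs out first
            return False, n
        n += 1
    return True, n
```

If `mates_in_sync` returns `True` for the _1 and _2 files, they can go straight into a paired-end aligner; a `False` result would suggest the pairing was disturbed and the files need another look.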

ADD COMMENTlink modified 2.1 years ago • written 2.1 years ago by thomaskuilman800

Thanks for your reply; it helps me a lot.

ADD REPLYlink written 2.1 years ago by fanglujing40
0
2.1 years ago by
Santosh Anand5.2k
Santosh Anand5.2k wrote:

Either use only the _1 and _2 files or use option --split-files instead of --split-3

See the manual/help page:

  --split-files                    Dump each read into separate file.Files 
                                   will receive suffix corresponding to read 
                                   number 
  --split-3                        Legacy 3-file splitting for mate-pairs: 
                                   First biological reads satisfying dumping 
                                   conditions are placed in files *_1.fastq and 
                                   *_2.fastq If only one biological read is 
                                   present it is placed in *.fastq Biological 
                                   reads and above are ignored.
ADD COMMENTlink written 2.1 years ago by Santosh Anand5.2k