Question: Problem in processing SRA file from NCBI
15 months ago by
United States
majeedaasim30 wrote:

I downloaded an SRA file from NCBI for my organism of interest. It is Illumina paired-end RNA-seq data. Trinity normally requires separate forward and reverse read files to build an assembly. However, the downloaded file has no separate forward and reverse reads. It appears to be a merged file; the _1 and _2 suffixes suggest that.

How can I split the forward and reverse reads if the file is merged? Is there any other way to get such data?


sra ncbi • 641 views

How did you process the SRA file? Did you use the --split-files and -F options with fastq-dump to split the two read files and recover original Illumina fastq headers? Post the SRA # if you want someone to check on it.


I have not processed it yet. I just downloaded the SRA file through Galaxy. On viewing the file it looks like this


If you are limited to working in Galaxy, I don't know off the top of my head which option you should use, but make sure to choose split-files if it is available. Otherwise this is simple to take care of using reformat.sh from the BBMap suite, but that will have to be done on the command line: reformat.sh in=SRA.fq out1=R1.fq out2=R2.fq
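For anyone who wants to see what that de-interleaving step amounts to, here is a minimal Python sketch. It assumes the merged file really is interleaved (each forward record immediately followed by its mate) and that every record is exactly four lines with no blank lines. The function names are hypothetical, and this is not how the BBMap suite itself is implemented.

```python
from itertools import islice


def read_fastq_records(handle):
    """Yield FASTQ records as lists of four lines: header, sequence, '+', qualities."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def deinterleave(in_path, out1_path, out2_path):
    """Write alternating records to out1 (forward) and out2 (reverse).

    Returns the number of read pairs written.
    """
    pairs = 0
    with open(in_path) as src, open(out1_path, "w") as r1, open(out2_path, "w") as r2:
        records = read_fastq_records(src)
        for forward in records:
            # The mate of each forward read is assumed to be the very next record.
            reverse = next(records, None)
            if reverse is None:
                raise ValueError("odd number of records; file is not interleaved")
            r1.writelines(forward)
            r2.writelines(reverse)
            pairs += 1
    return pairs
```

Calling `deinterleave("SRA.fq", "R1.fq", "R2.fq")` would mirror the file names in the command above. If the two mates were instead concatenated into a single read per spot, this approach does not apply and the data should be re-dumped with split-files.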

15 months ago by
Sean Davis25k
National Institutes of Health, Bethesda, MD
Sean Davis25k wrote:

The --split-files argument to fastq-dump is needed. That should produce two separate files if, indeed, the original sequencing was paired-end.
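As a rough illustration, the call could be scripted like this. The sketch only uses the two options mentioned in this thread (`--split-files`, and `-F` to keep the original Illumina headers); the helper names are hypothetical, and it assumes `fastq-dump` from the SRA Toolkit is installed and on the PATH.

```python
import subprocess


def fastq_dump_command(accession, split_files=True, original_headers=True):
    """Build the fastq-dump argument list for one run accession."""
    cmd = ["fastq-dump"]
    if split_files:
        # Write forward and reverse reads to separate _1 and _2 files.
        cmd.append("--split-files")
    if original_headers:
        # -F keeps the original read names in the FASTQ headers.
        cmd.append("-F")
    cmd.append(accession)
    return cmd


def run_fastq_dump(accession):
    """Run fastq-dump and raise CalledProcessError if it exits with an error."""
    subprocess.run(fastq_dump_command(accession), check=True)
```

Pass your own run accession (or the path to the downloaded .sra file) to `run_fastq_dump`; for a paired-end run the expected result is two FASTQ files ending in _1 and _2.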

5 months ago by
gtrwst90 wrote:

fastq-dump --split-3 SRR...

This does NOT redownload everything.

