Question: Problem in processing SRA file from NCBI
15 months ago by
majeedaasim30
United States
majeedaasim30 wrote:

I downloaded the SRA file from NCBI for my organism of interest. It is Illumina paired-end RNA sequencing data. Normally, Trinity requires separate forward and reverse read files to create an assembly. However, the downloaded file has no separate forward and reverse reads. It appears to be a merged file; the _1 and _2 suffixes suggest that.

How can I split the forward and reverse reads if the file is merged? Is there any other way to get the data as separate files?

Thanks

sra ncbi

How did you process the SRA file? Did you use the --split-files and -F options with fastq-dump to split the reads into two files and recover the original Illumina fastq headers? Post the SRA accession number if you want someone to check on it.

modified 15 months ago • written 15 months ago by genomax62k

I have not processed it yet. I just downloaded the SRA file through Galaxy. When I view the file, it looks like this:

@FCC0MWCACXX:3:1101:1249:2088/1
NATCCGCCTAAGGAGGGGCTCACGTCTGATTAGCTAGTTGGTGAGGCAATGGCTTACCAAGGCTCCGATCAGTAGTTGGTCTGAGAGGAT
+
#4=DFFFFHHHHHIJJJJJJJJJJGHIJJJJIJJJIJGIIIFGIIDHIJIJJJIIGHHHHDFFFCDDDBDDDDDDFDDCBCDCCDDDDBD
@FCC0MWCACXX:3:1101:1249:2088/2
CACGCGGCATTGCTCCGTCAGGCTTTCGCCCATTGCGGAAAATTCCTCACTGCTGCCTCCCGTAGGAGTCTGGGCCGTGTCTCAGTCCCA
+
CCCFFFFFHGHHHJJJIHIJJJJIJJJJJJJJIJJJJJGJHHEHFFFFFFEEEEEEDCDDDDBDDDDD<CDDDDDDDBBBDDDEDDDDDD
@FCC0MWCACXX:3:1101:1175:2193/1
CAAATTCAAACCGCGCAGGAAGTCCCTCTTCCAACAATGCTGGACTCGCCCCTAATGCCGCTACTCCTCAGCCAGAGTTAAATCACAATA
+
CCCFFFFFHGHHHIIIJJIIJEDHIJFJIGIIJJJIJIGIJJJJJIGGIIGHFFFFFFFDDDDDCDDDDCACDDDDD>ACDDEDDDDDDD
@FCC0MWCACXX:3:1101:1175:2193/2
CTGATTTTTTGTCAAAAAACTCCGGCAGAAATCTCTTCTCTGTGCTATGAATTTTGTCCCAAGAAAACCACTTCGAATAGCTGGGGATTT
+
C@CFFFFFHHHHHGIIIJIJGIIGGIIHGGGDHGIJJIIJIIFHHIIJIGGGIIJJHEHHGHEFFFEEDDDDDDB=<>?CDDCDDDDDDD
@FCC0MWCACXX:3:1101:1344:2132/1
CTTTTGGCATTTAATTTATGTATGGTTATTTAATTTTTTTTGTAGCTGACTTGTGGCTCAACATATATTTATTGTAAAGGTTTTAATTTA
+
@CCFDF@DBBFHDGHGIGI>AIGGIFHGIIIGIIIGIIIIIIEFD@GCGHFHICGGEHIIIHHHFGDCDEFFFDE6;;AC@CCBCCCEEC
@FCC0MWCACXX:3:1101:1344:2132/2
modified 15 months ago by genomax62k • written 15 months ago by majeedaasim30

If you are limited to working in Galaxy, I don't know off the top of my head which option you should use, but make sure to choose split-files if it is available. Otherwise, this is simple to take care of with reformat.sh from the BBMap suite, though that has to be done on the command line:

reformat.sh in=SRA.fq out1=R1.fq out2=R2.fq

modified 15 months ago • written 15 months ago by genomax62k
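
For readers who cannot install BBMap, the same de-interleaving idea can be sketched in a few lines of Python. This is only an illustration, not what reformat.sh does internally. It assumes the layout shown in the excerpt above: every record is exactly four lines, each /1 read is immediately followed by its /2 mate, and the mates share a name. The function name deinterleave and the demo read are made up for this example.

```python
import io
from itertools import islice


def deinterleave(lines, r1, r2):
    """Write alternating FASTQ records from `lines` to handles `r1` and `r2`.

    Returns the number of read pairs written.
    """
    it = iter(lines)
    pairs = 0
    while True:
        # One pair is 8 lines: 4 for the /1 read, then 4 for the /2 read.
        block = list(islice(it, 8))
        if not block:
            break
        if len(block) != 8:
            raise ValueError("input ends in the middle of a read pair")
        name1 = block[0].strip()
        name2 = block[4].strip()
        # Mates must share a name and differ only in the /1 and /2 suffix.
        if not (name1.endswith("/1") and name2.endswith("/2")
                and name1[:-2] == name2[:-2]):
            raise ValueError(f"unexpected mate order: {name1!r}, {name2!r}")
        r1.writelines(block[:4])
        r2.writelines(block[4:])
        pairs += 1
    return pairs


# Tiny in-memory demo with one made-up read pair.
demo = io.StringIO("@read1/1\nACGT\n+\nIIII\n@read1/2\nTTGA\n+\nHHHH\n")
fwd, rev = io.StringIO(), io.StringIO()
n_pairs = deinterleave(demo, fwd, rev)
print(n_pairs, "pair(s) written")
```

With real files, the three arguments would be open file handles, for example the SRA.fq, R1.fq and R2.fq names used in the reformat.sh command. The function raises an error instead of guessing if the file ends mid-pair or the mates are out of order, so a partially copied file is rejected rather than split incorrectly.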
15 months ago by
Sean Davis25k
National Institutes of Health, Bethesda, MD
Sean Davis25k wrote:

The --split-files argument to fastq-dump is needed. That should produce two separate files if, indeed, the original sequencing was paired-end.
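
If the conversion is scripted rather than typed by hand, the call might be wrapped roughly as below. This is a sketch under assumptions: it presumes the SRA Toolkit's fastq-dump is installed and on the PATH, the helper names fastq_dump_command and run_fastq_dump are invented for this example, and SRRXXXXXXX is a placeholder, not a real accession. The only fastq-dump options used are --split-files, --split-3, --outdir and -F.

```python
import shutil
import subprocess


def fastq_dump_command(accession, split="--split-files",
                       original_headers=True, outdir="."):
    """Build the fastq-dump argument list for a paired-end run."""
    if split not in ("--split-files", "--split-3"):
        raise ValueError("split must be --split-files or --split-3")
    cmd = ["fastq-dump", split, "--outdir", outdir]
    if original_headers:
        # -F keeps the original read names in the FASTQ headers.
        cmd.append("-F")
    cmd.append(accession)
    return cmd


def run_fastq_dump(accession, **kwargs):
    """Run fastq-dump, failing early if the SRA Toolkit is not installed."""
    if shutil.which("fastq-dump") is None:
        raise RuntimeError("fastq-dump (SRA Toolkit) not found on PATH")
    subprocess.run(fastq_dump_command(accession, **kwargs), check=True)


# Placeholder accession; substitute the real run accession.
print(" ".join(fastq_dump_command("SRRXXXXXXX")))
```

Calling run_fastq_dump with a real accession (or a path to a downloaded .sra file) should leave one FASTQ file per read of the pair in the output directory when the run is genuinely paired-end.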

5 months ago by
gtrwst90
gtrwst90 wrote:

fastq-dump --split-3 SRR...

This does NOT redownload everything.
