Question: How To Fix Fastq Files From SRA When Fastq-Dump Splits At The Wrong Position
7.5 years ago by
Aaron H170
United States/San Francisco/UCSF
Aaron H170 wrote:

Several .lite.sra files I downloaded from the SRA are giving me problems when I try to extract the paired-end fastq files. If I use any of the splitting functions of fastq-dump (--split-files or --split-spot), I get a forward read with length 76 and a reverse read with length 216 when I should be getting 146 for each. Does anyone know how to force fastq-dump to split the spots at a given length? Or is there another tool that could be used to fix the malformed reads?

For example: SRR306633.lite.sra

fastq-dump --split-files SRR306633.lite.sra
head *.fastq

==> SRR306633_1.fastq <==
@SRR306633.1 HWI-EAS66_0013_FC6270K:8:1:1351:1043 length=76
GAGTTGGTCCACGCGAATACTTGACCGTATAAACTTGGTCTGCCCACATTTATTTGCTGCCATTTGTTACGTTTGT
+SRR306633.1 HWI-EAS66_0013_FC6270K:8:1:1351:1043 length=76
B,DBDDDDB:BBB@DDDD8:DD4D;DDDDD@DDDB9DD;;BB1>*<44?2DDD@DDBDDD0<3BDBB;>)?>>;>D
@SRR306633.2 HWI-EAS66_0013_FC6270K:8:1:1660:1041 length=76
CTTGATGCAAAATCCTTTTTTGATTTACCTACAATTACTAAGTATTTCTCTCAGTGTAGCCATAAACAGCACGAAA
+SRR306633.2 HWI-EAS66_0013_FC6270K:8:1:1660:1041 length=76
GBD8DBDGGBGFG@G>GCGBH>H=F3BEDEE?:GDGDGGGDGDGEEEGGGGG:>GEBDBE84FEFGG@G-B3DBD4
@SRR306633.3 HWI-EAS66_0013_FC6270K:8:1:1929:1038 length=76
TTGTATTTTTGGTTCTACACTGNACTTTTAATTTGCGCAAATAATTGATTATTCAGCAATTTTCTTACCAATTTGA

==> SRR306633_2.fastq <==
@SRR306633.1 HWI-EAS66_0013_FC6270K:8:1:1351:1043 length=216
TTATCCTGGCTGGGGAATCCTTGATGTCAAACTACTANNCTTGTTGACTTTTTTTATGGGTNTTATTCAGCATTCAAAATTAAANNNNNNNNTCGTATTCCTGATTNGAATGACCAGTCAAGTCATTNTGATAGTATTTTTTTTTCGCAGTGCCTTGCAGTACTTGTGCCCAGNNNNNNNNNGCTGTACTGACATATNNNNNNTGGCTTTTACTTG
+SRR306633.1 HWI-EAS66_0013_FC6270K:8:1:1351:1043 length=216
D:)DB#################################################################F:DG@GGG?G98,,########6=65==9>E4CFEB#BB=:=567,@@@=<GE?G,8#A?8B?B=F3EC@AAAAA>4><EEDE8F<FGDGG:DE-:DD2D5;@#########8@>?:4,AE@-8?#####################
@SRR306633.2 HWI-EAS66_0013_FC6270K:8:1:1660:1041 length=216
AGCACTTGACGGCAATGAATTCTGACACACGTGTGCCNNCCACCCAAACTTCCCGACCGCNNCCCTCGCCAAAAAGTATAAGGNNNNNNNNNNCGGAATACAGTCCNTANNATGCGNNTTAATCTCCNACTTTTATGTTCATCCAAAGGTGGTGCACACGAACCACTGGCGCCNNNNNNNNNGGCTGCTGTAGCCCTNNNNNNAGAATCGGTCACA
+SRR306633.2 HWI-EAS66_0013_FC6270K:8:1:1660:1041 length=216
=B3=?C6=?)?CA=BDB3,DBDBD3?+<BA::A#####################################G?GEGDC3FG=:=##########:;8:1?:EEGG<@#AA##=@@,>##8==;B8@?D#DBFFEDFDA2GB@=5==BAFE-E4EB2AA>+G??@D####################################################
@SRR306633.3 HWI-EAS66_0013_FC6270K:8:1:1929:1038 length=216
ATGTTTTGAATGTTCTTTGAATGTTTTACGTAGGCCTNNATGTAAACTGCCTGCTTATTNNNTGTCATTTTTCTACTCCAACANNNNNNNNNNGNAGCTCGACTTANTTNNAAACANNTGTTGTTANNNAGCTGATCTCGATATCTATTTTGATTATCCCCCACCCAGAAGTANNNNNNNNNCTTTAAAAAATATGNNNNNNNTTTAAAAAAACAT

Thank you!

sra fastq paired conversion • 9.9k views
ADD COMMENTlink modified 2.3 years ago by A.Machado0 • written 7.5 years ago by Aaron H170

As a workaround, have you tried downloading the files from DDBJ? http://trace.ddbj.nig.ac.jp/

ADD REPLYlink written 7.5 years ago by Ying W3.9k
7.5 years ago by
Aaron H170
United States/San Francisco/UCSF
Aaron H170 wrote:

The problem seems to be with the .sra file itself. After I contacted the NCBI SRA help desk, they reloaded the Run with a different split position (145:147, so still wrong but closer).
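For anyone stuck with a Run that has not been reloaded, the spots can in principle be re-split by hand. The following is only a minimal sketch, assuming that fastq-dump wrote every base of each spot in order (so that the _1 read followed by the _2 read reconstitutes the spot, with no technical bases dropped) and that both files list spots in the same order. The function names and the output file names are made up for illustration.

```python
def resplit_record(seq1, qual1, seq2, qual2, split_at):
    """Rejoin one mis-split spot and cut it again after split_at bases."""
    seq = seq1 + seq2
    qual = qual1 + qual2
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    if not 0 < split_at < len(seq):
        raise ValueError("split_at must fall inside the spot")
    return (seq[:split_at], qual[:split_at]), (seq[split_at:], qual[split_at:])


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ file."""
    while True:
        header = handle.readline().strip()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().strip()
        # Drop the now-stale " length=NN" suffix that fastq-dump appends.
        yield header[1:].split(" length=")[0], seq, qual


def resplit_files(in1, in2, out1, out2, split_at):
    """Re-split every record pair from in1/in2 and write them to out1/out2."""
    with open(in1) as f1, open(in2) as f2, \
            open(out1, "w") as o1, open(out2, "w") as o2:
        for (name1, s1, q1), (name2, s2, q2) in zip(read_fastq(f1), read_fastq(f2)):
            (fs, fq), (rs, rq) = resplit_record(s1, q1, s2, q2, split_at)
            o1.write(f"@{name1}\n{fs}\n+\n{fq}\n")
            o2.write(f"@{name2}\n{rs}\n+\n{rq}\n")
```

For the Run in the question this would be called as resplit_files("SRR306633_1.fastq", "SRR306633_2.fastq", "fixed_1.fastq", "fixed_2.fastq", 146), since 76 + 216 = 292 = 2 x 146. Whether the resulting mates are biologically correct still depends on the assumptions above.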

ADD COMMENTlink written 7.5 years ago by Aaron H170

And why not 146:146?

ADD REPLYlink written 4.3 years ago by julien.roux90
7.5 years ago by
Casey Bergman18k
Athens, GA, USA
Casey Bergman18k wrote:

This may be a problem with the version of SRAtools you are using. I recently had problems with an old version of fastq-dump being out of sync with the latest sra-lite file format, which was solved by upgrading SRAtools to version 2.1.2. See: http://biostar.stackexchange.com/questions/11134/how-to-convert-sra-lite-paired-end-submission-to-fastq

ADD COMMENTlink written 7.5 years ago by Casey Bergman18k

Unfortunately, I downloaded the sra files on the same day as the sra toolkit, so they should be in sync. I'm not sure how to check the format version of the sra files, but I'm using fastq-dump 2.1.6.

ADD REPLYlink written 7.5 years ago by Aaron H170

FYI, the run I used as an example is from the same experiment as the one in the question you linked.

ADD REPLYlink written 7.5 years ago by Aaron H170

Aaron, I've looked into this and replicated this behavior on our system as well. It is very strange indeed. I trust you are aware that some of the accessions in this project have different read lengths for each end (http://169.237.66.249/dpgp2/DPGP2Indrelease2.xls), but in this case (SRX058159=>SRR306633) either the meta-data is incorrect (unlikely) or the submission is corrupted. Thanks for bringing this issue to the foreground.

ADD REPLYlink written 7.5 years ago by Casey Bergman18k
7.5 years ago by
Daniel Standage3.8k
Davis, California, USA
Daniel Standage3.8k wrote:

As I understand it, a spot contains biological information (your paired-end reads) and technical information (adapters, barcodes for multiplexing, etc.). I think the option you're looking for is not --split-files but --split-3, which gives you a pair of Fastq files in which corresponding records represent a pair of reads.

Try fastq-dump --split-3 SRR306633.lite.sra and see if it gives you what you want.
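One quick way to see whether a given option produced sensible mates is to tabulate the read lengths in each output file. This is just a rough sketch, assuming plain 4-line-per-record FASTQ as in the output shown in the question; the function name is made up for illustration.

```python
from collections import Counter


def read_length_counts(path):
    """Count how many reads of each length a FASTQ file contains."""
    counts = Counter()
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # sequence line of each 4-line record
                counts[len(line.strip())] += 1
    return counts
```

Running read_length_counts("SRR306633_1.fastq") and read_length_counts("SRR306633_2.fastq") and comparing the two results shows at a glance whether the mates have the lengths you expect.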

ADD COMMENTlink written 7.5 years ago by Daniel Standage3.8k
7.3 years ago by
Mariana0
Mariana0 wrote:

I'm having the same problem, even using version 2.1.7. When I use the options --split-files or --split-3, I get a forward read with length 36 and a reverse read with length 116.

ADD COMMENTlink written 7.3 years ago by Mariana0

For SRR306633? It should have been fixed.

ADD REPLYlink written 7.3 years ago by Aaron H170
6.6 years ago by
FGV100
FGV100 wrote:

I'm having exactly the same problem using the latest fastq-dump. I've written to the NCBI help desk but, so far, have had no reply.

Has anyone managed to fix it?

ADD COMMENTlink written 6.6 years ago by FGV100

For this accession or for some other read file?

ADD REPLYlink written 6.6 years ago by Aaron H170
2.3 years ago by
A.Machado0
A.Machado0 wrote:

I'm having the same problem, even using the latest version of the SRA toolkit. Can someone help me with the SRR653419 read file?

I've written to the NCBI help desk, but got no reply.

Thanks in advance.

ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by A.Machado0