How To Fix Fastq File From Sra When Fastq-Dump Split At Wrong Position
10.2 years ago
Aaron H ▴ 170

Several .lite.sra files I downloaded from the SRA are giving me problems when I try to extract the paired-end fastq files. If I use any of the splitting functions of fastq-dump (--split-files or --split-spot), I get a forward read with length 76 and a reverse read with length 216 when I should be getting 146 for each. Does anyone know how to force fastq-dump to split the spots at a given length? Or is there another tool that could be used to fix the malformed reads?

For example: SRR306633.lite.sra

fastq-dump --split-files SRR306633.lite.sra
head *.fastq

==> SRR306633_1.fastq <==
@SRR306633.1 HWI-EAS66_0013_FC6270K:8:1:1351:1043 length=76
GAGTTGGTCCACGCGAATACTTGACCGTATAAACTTGGTCTGCCCACATTTATTTGCTGCCATTTGTTACGTTTGT
+SRR306633.1 HWI-EAS66_0013_FC6270K:8:1:1351:1043 length=76
B,DBDDDDB:BBB@DDDD8:DD4D;DDDDD@DDDB9DD;;BB1>*<44?2DDD@DDBDDD0<3BDBB;>)?>>;>D
@SRR306633.2 HWI-EAS66_0013_FC6270K:8:1:1660:1041 length=76
CTTGATGCAAAATCCTTTTTTGATTTACCTACAATTACTAAGTATTTCTCTCAGTGTAGCCATAAACAGCACGAAA
+SRR306633.2 HWI-EAS66_0013_FC6270K:8:1:1660:1041 length=76
GBD8DBDGGBGFG@G>GCGBH>H=F3BEDEE?:GDGDGGGDGDGEEEGGGGG:>GEBDBE84FEFGG@G-B3DBD4
@SRR306633.3 HWI-EAS66_0013_FC6270K:8:1:1929:1038 length=76
TTGTATTTTTGGTTCTACACTGNACTTTTAATTTGCGCAAATAATTGATTATTCAGCAATTTTCTTACCAATTTGA

==> SRR306633_2.fastq <==
@SRR306633.1 HWI-EAS66_0013_FC6270K:8:1:1351:1043 length=216
TTATCCTGGCTGGGGAATCCTTGATGTCAAACTACTANNCTTGTTGACTTTTTTTATGGGTNTTATTCAGCATTCAAAATTAAANNNNNNNNTCGTATTCCTGATTNGAATGACCAGTCAAGTCATTNTGATAGTATTTTTTTTTCGCAGTGCCTTGCAGTACTTGTGCCCAGNNNNNNNNNGCTGTACTGACATATNNNNNNTGGCTTTTACTTG
+SRR306633.1 HWI-EAS66_0013_FC6270K:8:1:1351:1043 length=216
D:)DB#################################################################F:DG@GGG?G98,,########6=65==9>E4CFEB#BB=:=567,@@@=<GE?G,8#A?8B?B=F3EC@AAAAA>4><EEDE8F<FGDGG:DE-:DD2D5;@#########8@>?:4,AE@-8?#####################
@SRR306633.2 HWI-EAS66_0013_FC6270K:8:1:1660:1041 length=216
AGCACTTGACGGCAATGAATTCTGACACACGTGTGCCNNCCACCCAAACTTCCCGACCGCNNCCCTCGCCAAAAAGTATAAGGNNNNNNNNNNCGGAATACAGTCCNTANNATGCGNNTTAATCTCCNACTTTTATGTTCATCCAAAGGTGGTGCACACGAACCACTGGCGCCNNNNNNNNNGGCTGCTGTAGCCCTNNNNNNAGAATCGGTCACA
+SRR306633.2 HWI-EAS66_0013_FC6270K:8:1:1660:1041 length=216
=B3=?C6=?)?CA=BDB3,DBDBD3?+<BA::A#####################################G?GEGDC3FG=:=##########:;8:1?:EEGG<@#AA##=@@,>##8==;B8@?D#DBFFEDFDA2GB@=5==BAFE-E4EB2AA>+G??@D####################################################
@SRR306633.3 HWI-EAS66_0013_FC6270K:8:1:1929:1038 length=216
ATGTTTTGAATGTTCTTTGAATGTTTTACGTAGGCCTNNATGTAAACTGCCTGCTTATTNNNTGTCATTTTTCTACTCCAACANNNNNNNNNNGNAGCTCGACTTANTTNNAAACANNTGTTGTTANNNAGCTGATCTCGATATCTATTTTGATTATCCCCCACCCAGAAGTANNNNNNNNNCTTTAAAAAATATGNNNNNNNTTTAAAAAAACAT

Thank you!

sra fastq paired conversion • 12k views

As a workaround, have you tried downloading the files from DDBJ (http://trace.ddbj.nig.ac.jp/)?

10.2 years ago
Aaron H ▴ 170

The problem seems to be with the .sra file itself. After I contacted the NCBI SRA help desk, they reloaded the Run with a different split position (145:147, so still wrong, but closer).
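Until a Run is reloaded with the correct split position, one possible workaround is to re-split the spots yourself. The sketch below is not an SRA Toolkit feature. It assumes the Run was dumped unsplit (plain fastq-dump with no --split-* option, which writes each spot as one concatenated read), that each spot is the forward read followed by the reverse read, and that the true forward read length (146 here) is known. The function names `resplit_record` and `resplit_fastq` are hypothetical.

```python
import re


def resplit_record(header, seq, qual, split_at):
    """Cut one unsplit spot into forward and reverse FASTQ records.

    `header` is the '@' line of the spot; `split_at` is the forward read length.
    Returns two 4-line FASTQ records as strings.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ: " + header)
    if not 0 < split_at < len(seq):
        raise ValueError("split position outside the spot: " + header)
    # Drop the '@' and any trailing 'length=N' tag that fastq-dump appends.
    name = re.sub(r"\s*length=\d+$", "", header[1:].strip())
    records = []
    for s, q in ((seq[:split_at], qual[:split_at]),
                 (seq[split_at:], qual[split_at:])):
        tag = f"{name} length={len(s)}"
        records.append(f"@{tag}\n{s}\n+{tag}\n{q}\n")
    return records[0], records[1]


def resplit_fastq(in_path, out1_path, out2_path, split_at):
    """Re-split every spot of an unsplit 4-line FASTQ file at a fixed position."""
    with open(in_path) as src, open(out1_path, "w") as out1, \
            open(out2_path, "w") as out2:
        while True:
            header = src.readline().rstrip("\n")
            if not header:
                break  # end of file
            seq = src.readline().rstrip("\n")
            src.readline()  # '+' separator line, rebuilt on output
            qual = src.readline().rstrip("\n")
            fwd, rev = resplit_record(header, seq, qual, split_at)
            out1.write(fwd)
            out2.write(rev)
```

For the Run in the question, the call would look like `resplit_fastq("SRR306633.fastq", "SRR306633_1.fastq", "SRR306633_2.fastq", 146)`. The input file name is an assumption about what the unsplit dump is called.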


And why not 146:146?

10.2 years ago

This may be a problem with the version of SRAtools you are using. I recently had problems with an old version of fastq-dump being out of sync with the latest sra-lite file format, which I solved by upgrading SRAtools to version 2.1.2. See: How To Convert Sra-Lite Paired-End Submission To Fastq?



Unfortunately, I downloaded the SRA files on the same day as the SRA toolkit, so they should be in sync. I'm not sure how to check the format version of the SRA files, but I'm using fastq-dump 2.1.6.


FYI, the run I used as an example is from the same experiment as the one in the question you referenced.


Aaron, I've looked into this and replicated this behavior on our system as well. It is very strange indeed. I trust you are aware that some of the accessions in this project have different read lengths for each end (http://169.237.66.249/dpgp2/DPGP2Indrelease2.xls), but in this case (SRX058159=>SRR306633) either the meta-data is incorrect (unlikely) or the submission is corrupted. Thanks for bringing this issue to the foreground.

10.2 years ago

As I understand it, a spot contains biological information (your paired-end reads) and technical information (adapters, barcodes for multiplexing, etc.). I think the option you're looking for is not --split-files but --split-3, which gives you a pair of FASTQ files in which corresponding records form a read pair.

Try fastq-dump --split-3 SRR306633.lite.sra and see if it gives you what you want.
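To check whether a different split option changed anything, you could tally the read lengths in each output file instead of inspecting the first few records with head. This is a minimal sketch, assuming 4-line FASTQ records as fastq-dump writes them. The helper name `read_length_counts` is hypothetical.

```python
from collections import Counter


def read_length_counts(path):
    """Count how many reads of each length a 4-line FASTQ file contains."""
    counts = Counter()
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            # In 4-line FASTQ, the sequence is the second line of each record.
            if line_number % 4 == 1:
                counts[len(line.rstrip("\n"))] += 1
    return counts
```

If SRR306633 were split correctly, both SRR306633_1.fastq and SRR306633_2.fastq would report a single length of 146 rather than 76 and 216.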

10.0 years ago
Mariana • 0

I'm having the same problem, even with version 2.1.7. When I use the options --split-files or --split-3, I get a forward read with length 36 and a reverse read with length 116.


For SRR306633? That should have been fixed.

9.3 years ago
FGV ▴ 130

I'm having exactly the same problem using the latest fastq-dump. I've written to the NCBI help desk but, so far, have had no reply.

Has anyone managed to fix it?


For this accession or for some other read file?

5.0 years ago
A.Machado • 0

I'm having the same problem, even using the latest version of the SRA toolkit. Can someone help me with the SRR653419 read file?

I've written to the NCBI help desk, but got no reply.

Thanks in advance.
