Help needed with fastq-dump (SRA toolkit)
4.6 years ago
RNAeye ▴ 70

Hi, I posted this question on SEQanswers but have not received a response yet, so I am trying here.

I am trying to split an .sra file into R1.fastq and R2.fastq. However, I am getting a single file, and I think the forward and reverse reads are joined. The file is SRR5439504.sra (accession SRR5439504).

The command I ran is:

 fastq-dump -I --split-files SRR5439504.sra


I got the following output:

@SRR5439504.1.1 1 length=302
CCATAACCCTAACCCTAACCCTAACCCTAACTCTATCCATAACCCTAACCCTTACCCTATCCCTAACCCTAACCCTAACCCTAACCCTAGCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAGCCTAAGCGTAGCCCTAAGCCTAAGCCTAAGCCAAAGCGTAAGCCTAAGCCTAAGCCACAGCATAAAAAAAAGCAAAAACATAAACCCAAGAAAAAG
+SRR5439504.1.1 1 length=302
F22F<2@2C?02GCFHF?FB0?0?02BB44B334?3B33/0B?20/0003@33BB33223B21E1G?2FG1BF2BB1BB2FA1BF1A112B2FAA3CBFE1FHFHGFAHGHHHHGHHGFBHHGFBAFFFGGGGFGEFEGFFBFFFFCCBBBCBCBCFFFCFFFGGGGGGGGGCFGHHHHGHHFFGHCFCGHCHFHHGHFCB1AA233333B3B0BA0133222333333333B3@3F322B321>>11@3BF@3333333B322BB/2333433/<</02<2@///2<<110////00000.
@SRR5439504.2.1 2 length=302
CTCTAACCCTAACTCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACTCTAACCCTAACCCTAACCCTAACCCTATCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCGTAACCCTAAGCCTAACCCTAACCCTAACCCAAAACATAAGCCAAAGCCTAACCCTAACCCCAAGCATAATCCTAAACATAATCACACA
+SRR5439504.2.1 2 length=302
?1A?0GF0>2@@@10HGFFEG?00AF0/FBFB0HFB<00>0BF0B/0BF0HGBBFBFG@BB1CFBBB00>0GF>0>B0B0BA0BB01AB0F00/0B00FA0GF00F00B0FA0FF00A00G0G0A0G00AB1GGGGGGGFF>CFFFAAAA@BABBBFFFBFFFGGGGGGGE44AEAAFFEH2F2GF222A22222BB2A2B1FFC2BF1ABE10ABA131B2?3333B32??12F2B1B2F2111??1B133333300B3B0BFC00?B?F0B///C//01BB22?12@1111@@2>1111/


I expected two files, R1.fastq and R2.fastq. Am I doing something wrong? I used:

fastq-dump : 2.8.2


Thank you for the help in advance.

Looking at the SRA record, the data seem to have been submitted as single 302 bp reads (even though the layout is described as PAIRED) from a CIRCLE-seq experiment. With only one read defined per spot, --split-files has nothing to split, which is why you get a single file. So you are unlikely to get paired-end files from SRA. I don't know what CIRCLE-seq is, but you can take a look at the Nature protocol paper mentioned and process the data accordingly. Perhaps every read represents a circular sequence of some sort?
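One quick way to check that every record really is a single 302 nt read is to tabulate read lengths in the dumped file. This is a minimal sketch that assumes a plain four-line FASTQ; the function name read_length_counts and the file name SRR5439504_1.fastq are illustrative assumptions, not something taken from the SRA record.

```shell
# Hypothetical helper: print "length count" for every read length in a FASTQ file.
# Assumes strict four-line records (header, sequence, "+", quality).
read_length_counts() {
    awk 'NR % 4 == 2 { n[length($0)]++ }              # sequence lines only
         END { for (len in n) print len, n[len] }' "$1" | sort -n
}

# Example call (file name is an assumption):
# read_length_counts SRR5439504_1.fastq
```

If the only length reported is 302, the records were stored as one long read per spot rather than as two mates.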

Hi GenoMax, thank you for the answer at both sites. I checked the paper again and finally found the description of the reads: they did 150 bp paired-end sequencing, so the files must have been prepared incorrectly for submission. I emailed the author; let's see if they fix it.
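Until the submission is fixed, a possible workaround is to cut each record at its midpoint. This is only a sketch, and it rests on an unconfirmed assumption: that each 302 nt record is mate 1 followed directly by mate 2, both the same length (151 nt). The function name split_halves and the file names are hypothetical. Whether the second half needs reverse-complementing is not known from the record, so the sketch leaves it as stored.

```shell
# Hypothetical helper: split every record of a four-line FASTQ at its midpoint.
# ASSUMPTION: each read is mate 1 followed by mate 2, both of equal length.
split_halves() {
    in_fastq=$1    # input FASTQ, e.g. the single file written by fastq-dump
    prefix=$2      # output prefix; writes ${prefix}_R1.fastq and ${prefix}_R2.fastq
    awk -v r1="${prefix}_R1.fastq" -v r2="${prefix}_R2.fastq" '
        NR % 4 == 1 { h = $0; next }    # header line, copied unchanged to both files
        NR % 4 == 2 { s = $0; next }    # sequence line
        NR % 4 == 3 { next }            # "+" separator line, rewritten as a bare "+"
        {                               # quality line: emit both halves
            half = int(length(s) / 2)
            printf "%s\n%s\n+\n%s\n", h, substr(s, 1, half), substr($0, 1, half) > r1
            printf "%s\n%s\n+\n%s\n", h, substr(s, half + 1), substr($0, half + 1) > r2
        }
    ' "$in_fastq"
}

# Example call (file names are assumptions):
# split_halves SRR5439504_1.fastq SRR5439504
```

The headers are copied as they are, so any length=302 field in them will no longer match the halved reads. It would be worth checking a few split pairs against the reference before trusting the result.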