Help needed with fastq-dump (SRA toolkit)
4.6 years ago
RNAeye ▴ 70

Hi, I posted this question at SEQanswers but have not received a response yet, so I am trying here.

I am trying to split an .sra file into R1.fastq and R2.fastq. However, I am getting a single file, and I think the forward and reverse reads are joined. The file is SRR5439504.sra (accession SRR5439504).

The command I ran is:

 fastq-dump -I --split-files SRR5439504.sra


I got the following output:

@SRR5439504.1.1 1 length=302
CCATAACCCTAACCCTAACCCTAACCCTAACTCTATCCATAACCCTAACCCTTACCCTATCCCTAACCCTAACCCTAACCCTAACCCTAGCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAGCCTAAGCGTAGCCCTAAGCCTAAGCCTAAGCCAAAGCGTAAGCCTAAGCCTAAGCCACAGCATAAAAAAAAGCAAAAACATAAACCCAAGAAAAAG
+SRR5439504.1.1 1 length=302
F22F<2@2C?02GCFHF?FB0?0?02BB44B334?3B33/0B?20/0003@33BB33223B21E1G?2FG1BF2BB1BB2FA1BF1A112B2FAA3CBFE1FHFHGFAHGHHHHGHHGFBHHGFBAFFFGGGGFGEFEGFFBFFFFCCBBBCBCBCFFFCFFFGGGGGGGGGCFGHHHHGHHFFGHCFCGHCHFHHGHFCB1AA233333B3B0BA0133222333333333B3@3F322B321>>11@3BF@3333333B322BB/2333433/<</02<2@///2<<110////00000.
@SRR5439504.2.1 2 length=302
CTCTAACCCTAACTCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACTCTAACCCTAACCCTAACCCTAACCCTATCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCGTAACCCTAAGCCTAACCCTAACCCTAACCCAAAACATAAGCCAAAGCCTAACCCTAACCCCAAGCATAATCCTAAACATAATCACACA
+SRR5439504.2.1 2 length=302
?1A?0GF0>2@@@10HGFFEG?00AF0/FBFB0HFB<00>0BF0B/0BF0HGBBFBFG@BB1CFBBB00>0GF>0>B0B0BA0BB01AB0F00/0B00FA0GF00F00B0FA0FF00A00G0G0A0G00AB1GGGGGGGFF>CFFFAAAA@BABBBFFFBFFFGGGGGGGE44AEAAFFEH2F2GF222A22222BB2A2B1FFC2BF1ABE10ABA131B2?3333B32??12F2B1B2F2111??1B133333300B3B0BFC00?B?F0B///C//01BB22?12@1111@@2>1111/
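
One quick check on output like this is to tally the read lengths: if every record has the same length and it is roughly double the expected read length, the two mates were probably stored as one read. A minimal sketch, assuming standard 4-line FASTQ records; the function name `fastq_length_tally` is made up for illustration, and the file name in the usage comment assumes the usual `<accession>_1.fastq` naming from `--split-files`:

```shell
# Tally sequence lengths in a FASTQ file (4-line records assumed).
# Prints "<length><TAB><count>" for each distinct length, sorted by length.
fastq_length_tally() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) printf "%d\t%d\n", len, n[len] }' "$1" | sort -n
}

# Usage (file name is an assumption):
#   fastq_length_tally SRR5439504_1.fastq
```

For the two records shown above, this would report a single length of 302.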


I expected two files, R1.fastq and R2.fastq, and I am wondering whether I am doing something wrong. I used:

fastq-dump : 2.8.2


Thank you for the help in advance.

fastq-dump sra-toolkit • 2.8k views

Looking at the SRA record, the sequences seem to have been submitted as single 302 bp reads (even though the layout is described as PAIRED) from a CIRCLE-seq experiment, so you are unlikely to get separate paired-end files from SRA. I don't know what CIRCLE-seq is, but you can take a look at the Nature protocol paper mentioned and process the data accordingly. Perhaps every read represents a circular sequence of some sort?
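
If the 302 bp reads turn out to be two mates joined end to end, one possible workaround is to cut each record in half. This is only a sketch under assumptions that the SRA record does not confirm: that each read is mate 1 followed directly by mate 2, that both mates have equal length (302 = 2 × 151), and that the file uses 4-line FASTQ records. The function name `split_fastq_halves` and the output file names are hypothetical.

```shell
# Split every record of a FASTQ file in half.
#   $1 = input FASTQ, $2 = output for first halves, $3 = output for second halves
# Headers are cut down to the read ID, because the original "length=..." field
# would no longer be correct after splitting.
split_fastq_halves() {
    awk -v o1="$2" -v o2="$3" '
        NR % 4 == 1 { print $1 > o1; print $1 > o2; next }      # header line
        NR % 4 == 3 { print "+" > o1; print "+" > o2; next }    # separator line
        {                                                       # sequence or quality
            h = int(length($0) / 2)       # odd lengths: extra base goes to half 2
            print substr($0, 1, h)  > o1
            print substr($0, h + 1) > o2
        }
    ' "$1"
}

# Usage (file names are assumptions):
#   split_fastq_halves SRR5439504_1.fastq R1.fastq R2.fastq
```

Whether the second half is already in the orientation a paired-end aligner expects depends on how the submitter joined the reads, so it would be worth checking a few pairs by hand or confirming with the authors before using the result.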


Hi GenoMax, thank you for answering at both sites. I checked the paper again and finally found a description of the reads: they did 150 bp paired-end sequencing. They must have prepared the files incorrectly. I emailed the author, so let's see if they fix it.