Question: How to process paired end data (using fastq-dump) to get Fw and Rv files
11 days ago by
2405592M0 wrote:

Hi guys. I downloaded a fastq file using the fastq-dump command (SRA Toolkit) to get paired-end data that I want to analyse. However, the output is a single fastq file (I was expecting two: Fw and Rv). I want to use Trimmomatic, which needs two input files for paired-end data. How do I get around this? New to the scene, as you can tell! Thanks in advance!

modified 10 days ago by Devon Ryan81k • written 11 days ago by 2405592M0

Check this thread for related answers: "How to split paired end SRA file into 2 correct fastq files".

written 11 days ago by arup340
10 days ago by
Devon Ryan81k
Freiburg, Germany
Devon Ryan81k wrote:

You need --split-3. Also, use ENA rather than SRA if you can; it's much faster.
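
For reference, here is a minimal sketch of how that could look in practice. The helpers report_layout and dump_and_report are hypothetical names, not part of the SRA Toolkit. The sketch assumes that --split-3 writes <accession>_1.fastq and <accession>_2.fastq for spots with two reads and <accession>.fastq for reads without a mate, which is my understanding of its behaviour; check it against your toolkit version.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not SRA Toolkit commands.

# Report which FASTQ files exist for an accession in a directory.
# Assumes --split-3 naming: <acc>_1.fastq / <acc>_2.fastq for mates,
# <acc>.fastq for reads that have no mate.
report_layout() {
    rl_acc="$1"
    rl_dir="${2:-.}"
    if [ -e "$rl_dir/${rl_acc}_1.fastq" ] && [ -e "$rl_dir/${rl_acc}_2.fastq" ]; then
        echo "paired"
    elif [ -e "$rl_dir/${rl_acc}.fastq" ] || [ -e "$rl_dir/${rl_acc}_1.fastq" ]; then
        echo "single"
    else
        echo "missing"
    fi
}

# Dump a run into the current directory, then report what came out.
# Needs fastq-dump on the PATH and network access to the SRA.
dump_and_report() {
    fastq-dump --split-3 "$1" && report_layout "$1"
}
```

Calling dump_and_report SRR1909107 would run the download and then print paired, single, or missing. If it prints paired, the _1 and _2 files are the forward and reverse inputs for Trimmomatic; if it prints single, the run as deposited contains only one read per spot.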

written 10 days ago by Devon Ryan81k

Thanks Devon! Two questions. 1) What would the command be? I've tried fastq-dump --split-3 SRR1909107 but I'm still getting one fastq file. 2) As for the ENA, can I download directly from the command line, or would I have to download the files manually from the ENA website? Appreciate the help!

written 10 days ago by 2405592M0

SRR1909107 is indeed single-end. It's not uncommon for people to mislabel files that are uploaded to the NCBI. It's also not uncommon for some lane replicates to be paired and others single, because who cares about confounding effects and things, right :-D Anyway, for the ENA, there is good documentation for downloads here.

modified 10 days ago • written 10 days ago by ATpoint4.4k

The person who uploaded those samples either mislabeled them or only uploaded one of the two reads; it's unclear which. I suggest you contact whoever uploaded them and ask.

You can use wget or curl or ascp with ENA too, just like SRA. The main difference is that you will directly get fastq files and not the silly SRA files.
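
As a rough sketch of the wget route: the helpers ena_fastq_dir and ena_fetch_single below are hypothetical names, and the directory layout they build (first six characters of the accession, then a zero-padded subdirectory taken from the digits past the sixth) is an assumption based on how ENA's FTP fastq area is organised. Confirm the actual path on the run's ENA page before relying on it.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and the FTP layout is an
# assumption about ENA's directory structure, not something from this thread.

# Print the assumed ENA FTP directory for a run accession.
ena_fastq_dir() {
    ena_acc="$1"
    ena_prefix=$(printf '%s' "$ena_acc" | cut -c1-6)
    case "${#ena_acc}" in
        9)  ena_sub="" ;;                                              # e.g. SRR123456
        10) ena_sub="00$(printf '%s' "$ena_acc" | cut -c10)/" ;;       # e.g. SRR1909107 -> 007/
        11) ena_sub="0$(printf '%s' "$ena_acc" | cut -c10-11)/" ;;     # two extra digits -> 0NN/
        *)  echo "unexpected accession: $ena_acc" >&2; return 1 ;;
    esac
    printf 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/%s/%s%s/\n' "$ena_prefix" "$ena_sub" "$ena_acc"
}

# Fetch the gzipped FASTQ of a single-end run (needs wget and network access).
# For a paired-end run the files would be <acc>_1.fastq.gz and <acc>_2.fastq.gz.
ena_fetch_single() {
    ena_url="$(ena_fastq_dir "$1")" || return 1
    wget "${ena_url}${1}.fastq.gz"
}
```

With this, ena_fastq_dir SRR1909107 prints the directory to browse, and ena_fetch_single SRR1909107 would attempt the download of the single fastq.gz file for that run.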

written 10 days ago by Devon Ryan81k
