Question: How to process paired end data (using fastq-dump) to get Fw and Rv files
0
6 months ago by
2405592M10
2405592M10 wrote:

Hi guys. I downloaded a fastq file using the fastq-dump command (SRA Toolkit) to get paired-end data that I want to analyse. However, the fastq comes out as a single file (I was expecting two: Fw and Rv). I want to use Trimmomatic, which needs two input files for paired-end data. How do I get around this? New to the scene, as you can tell! Thanks in advance!

modified 4 months ago • written 6 months ago by 2405592M10
2

Check this thread for related answers: "How to split paired end SRA file into 2 correct fastq files".

written 6 months ago by arup720

Hi guys! Sorry this is a few months later, but my follow-up question is related to this thread. Is it at all possible that the above SRR fastq file is in fact an interleaved fastq file? When I ran more on the SRR file, these were the first few lines:

@SRR1909108.1 1 length=151
TGCTCTGATGAAATCACTAATAGGAAGTGCCGTCAGAAGCGATAACTGACGAGGACTACTCCTGTCTGATTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACTTGAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAATTTTTT
+SRR1909108.1 1 length=151
A1AAAFFFFFFFGGGGFG31FDB1110B11A0EA00111A00B//BF11A/////AF0BGEAG1GAG11DG211DB/////00>1F0B/B?G/@DFGHGE1F@1@@1BF111B1FD0?/F0BB/EGHGDG1DGG1BBDGFCC?########
@SRR1909108.2 2 length=151
TTCGTGATCGATGTGGTTACGTCTTTCTATTTCTTATTTTCACTCTTCTTTACTCCATTCTCTCTTTTTTCTCTTTTTCCTTCTTCTTCTTTTTTAAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCTTCTCTTTTTT
+SRR1909108.2 2 length=151
AA@AAA1FFA?ADF11F1111B0003AD333BA3122222222211A1B112AB1A11222221BF111/0A22D1D1011D1FG2B12D@############################################################

To my follow-up point: if this fastq file is indeed interleaved, what would be the best way to trim off the 5' and 3' adapter sequences? Would you have to treat the fastq file as single-end reads, or is there another procedure?

written 4 months ago by 2405592M10
1

SRR1909108 is a single-end sample (here is the ENA entry for it), so there is no interleaving here. Those are just reads in sequence (1, 2, 3, 4, etc.).
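
If you want to check this from the file itself rather than from the ENA record, here is a minimal sketch. It assumes plain 4-line FASTQ records and that mates in an interleaved file appear back to back with the same read name (optionally ending in /1 and /2). The function names are made up for illustration and it is only a heuristic, not a definitive test.

```python
import io
from itertools import islice


def read_names(handle, limit=1000):
    """Yield the read name (first token of the header, without '@')
    for up to `limit` records, assuming 4 lines per record."""
    for i, line in enumerate(islice(handle, 4 * limit)):
        if i % 4 == 0:
            yield line[1:].split()[0]


def base_name(name):
    """Drop a trailing /1 or /2 mate suffix, if present."""
    if name.endswith("/1") or name.endswith("/2"):
        return name[:-2]
    return name


def looks_interleaved(handle, limit=1000):
    """Heuristic: True if records come in consecutive same-named pairs."""
    names = [base_name(n) for n in read_names(handle, limit)]
    if len(names) < 2 or len(names) % 2:
        return False
    return all(a == b for a, b in zip(names[0::2], names[1::2]))


# Headers shaped like the ones posted above (sequences shortened here).
example = (
    "@SRR1909108.1 1 length=4\nACGT\n+\nFFFF\n"
    "@SRR1909108.2 2 length=4\nACGT\n+\nFFFF\n"
)
print(looks_interleaved(io.StringIO(example)))
```

With headers like the ones in the question, consecutive records carry different names (SRR1909108.1, SRR1909108.2), so the check reports no interleaving, which matches the single-end annotation.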

modified 4 months ago • written 4 months ago by genomax59k
3
6 months ago by
Devon Ryan86k
Freiburg, Germany
Devon Ryan86k wrote:

You need --split-3. Also, use ENA rather than SRA if you can; it's much faster.
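
A rough sketch of how that could be scripted, assuming fastq-dump from the SRA Toolkit is on your PATH. The helper names are hypothetical. The file-name check relies on the usual --split-3 behaviour: paired runs produce ACCESSION_1.fastq and ACCESSION_2.fastq, while a single-end run produces only ACCESSION.fastq.

```python
import subprocess
from pathlib import Path


def split3_command(accession, outdir="."):
    """Build the fastq-dump command line (--split-3 and --outdir are real flags)."""
    return ["fastq-dump", "--split-3", "--outdir", str(outdir), accession]


def classify_outputs(accession, outdir="."):
    """Report what fastq-dump --split-3 left behind in `outdir`."""
    outdir = Path(outdir)
    r1 = outdir / f"{accession}_1.fastq"
    r2 = outdir / f"{accession}_2.fastq"
    single = outdir / f"{accession}.fastq"
    if r1.exists() and r2.exists():
        return "paired"
    if single.exists():
        return "single"
    return "missing"


def dump(accession, outdir="."):
    """Run fastq-dump (downloads data, needs network) and classify the result."""
    subprocess.run(split3_command(accession, outdir), check=True)
    return classify_outputs(accession, outdir)


# Print the command only; call dump("SRR1909107") to actually run it.
print(" ".join(split3_command("SRR1909107")))
```

If the result is "single" even with --split-3, the run was deposited as single-end and there is no second file to recover.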

written 6 months ago by Devon Ryan86k

Thanks Devon! Two questions. 1) What would the command be? I've tried fastq-dump --split-3 SRR1909107 but I'm still getting one fastq file. 2) With regard to the ENA, can I download directly from the command line, or would I have to manually download these files from the ENA website? Appreciate the help!

written 6 months ago by 2405592M10
2

SRR1909107 is indeed single-end. It is not uncommon for people to mislabel files that are uploaded to the NCBI. It is also not uncommon for some lane replicates to be paired and others single, because who cares about confounding effects and things, right :-D Anyway, for the ENA, there is good documentation for downloads here.

modified 6 months ago • written 6 months ago by ATpoint11k
1

The person who uploaded those samples either mislabeled them or only uploaded one of the two reads; it's unclear which. I suggest you contact whoever uploaded them and ask.

You can use wget or curl or ascp with ENA too, just like SRA. The main difference is that you will directly get fastq files and not the silly SRA files.
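
To find what to fetch, one option is the ENA Portal API file report. This is a sketch, assuming the filereport endpoint with the read_run result and the library_layout and fastq_ftp fields; the function names are made up for illustration. As far as I know, fastq_ftp lists files separated by semicolons and without a scheme, so you would prepend one before handing the path to wget or curl.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession):
    """Build a file-report query for one run accession."""
    query = urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": "library_layout,fastq_ftp",
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_filereport(tsv_text):
    """Turn the TSV report into a list of {'layout', 'files'} dicts."""
    lines = tsv_text.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split("\t")
    records = []
    for line in lines[1:]:
        row = dict(zip(header, line.split("\t")))
        # Assumption: multiple files are separated by ';' in fastq_ftp.
        files = [f for f in row.get("fastq_ftp", "").split(";") if f]
        records.append({"layout": row.get("library_layout", ""), "files": files})
    return records


def fetch_report(accession):
    """Query ENA (needs network) and return the parsed records."""
    with urlopen(filereport_url(accession)) as response:
        return parse_filereport(response.read().decode())


# Print the query URL only; call fetch_report("SRR1909108") to hit the API.
print(filereport_url("SRR1909108"))
```

One file in the report for a run means single-end data; two files (typically ending in _1 and _2) mean the mates are already split, so there is nothing to post-process.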

written 6 months ago by Devon Ryan86k