Question

Batch concatenate fastq files in series?

0

21 months ago

daniel.nebauer • 0

I have a list of 80 fastq files. These are 40 samples with 2 technical replicates each that I'd like to concatenate.

Rather than repeating this by hand for every sample, e.g.:

cat sample1n1.fastq sample1n2.fastq > sample1_cat.fastq

cat sample2n1.fastq sample2n2.fastq > sample2_cat.fastq

etc...

Is there a command that automates this?

Thanks

concatenate fastq batch • 1.3k views

ADD COMMENT • link updated 21 months ago by rpolicastro 13k • written 21 months ago by daniel.nebauer • 0

0

 cat ample1n*.fastq > sample1_cat.fastq

For gzip files:

 cat ample1n*.fastq.gz > sample1_cat.fastq.gz

ADD REPLY • link 21 months ago by shenwei356 8.5k

0

@shenwei356 this is not right. You appear to be missing an s at the beginning of the filename pattern. This may also cause a problem, since the _cat output file may also be matched by the wildcard.

ADD REPLY • link 21 months ago by GenoMax 142k

1

You're right. To avoid re-reading the existing output file, one can write the output to a different directory (not the current path), or use a different file extension such as .fq.

Or filter the output file out of the list (though this seems too verbose).

 echo -n >  sample1_cat.fastq
 ls sample1n*.fastq | grep -v sample1_cat.fastq | while read -r f; do cat "$f" >> sample1_cat.fastq; done

ADD REPLY • link 21 months ago by shenwei356 8.5k
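The suggestions above handle one sample at a time. A minimal sketch of a loop over all pairs, assuming the `<sample>n1.fastq` / `<sample>n2.fastq` naming from the question; the function name `concat_replicates` and the separate output directory are illustrative choices, not something from the thread. Writing to a different directory also sidesteps the wildcard problem discussed above.

```shell
# Concatenate <sample>n1.fastq and <sample>n2.fastq from a directory
# into <outdir>/<sample>_cat.fastq.
# Assumes the naming scheme from the question; names here are hypothetical.
concat_replicates() {
    local indir outdir r1 r2 stem
    indir=$1
    outdir=$2
    mkdir -p "$outdir"
    for r1 in "$indir"/*n1.fastq; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$r1" ] || continue
        # Partner replicate: swap the n1.fastq suffix for n2.fastq.
        r2=${r1%n1.fastq}n2.fastq
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: no matching $r2" >&2
            continue
        fi
        # Sample name without directory or replicate suffix, e.g. sample1.
        stem=$(basename "${r1%n1.fastq}")
        cat "$r1" "$r2" > "$outdir/${stem}_cat.fastq"
    done
}
```

It could be called as `concat_replicates . merged` from the directory holding the 80 files. Samples without an n2 partner are reported on stderr and skipped rather than copied alone.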

score 3 · Answer 1 · 2022-07-22

3

21 months ago

rpolicastro 13k

GNU parallel solution since it's convenient.

parallel -kj 1 --link --dry-run cat {1} {2} '>' {=1 s/n[12]\.fastq$// =}_cat.fastq ::: *n1.fastq ::: *n2.fastq

Remove --dry-run if the commands look good.

ADD COMMENT • link 21 months ago by rpolicastro 13k

1

Would you mind expounding a bit on what this part of the code is doing?

{=1 s/n[12]\.fastq$// =}

Seems like this could be very useful if I understood it a bit better.

ADD REPLY • link 21 months ago by Dave Carlson ★ 1.7k

2

GNU parallel has a few ways to replace or remove parts of strings. For example, {.} removes the extension, {/} removes the path, and {/.} removes both the path and the extension. If you want more control over the replacement, you can pass a Perl expression (the s/// substitution is similar to sed) via {= s/regex/replacement/ =}. In this case the regex n[12]\.fastq$ matches (for example) n1.fastq in sample1n1.fastq and replaces it with nothing. Note that the replacement starts with {=1 in the actual code because it is applied to the first input source, the n1 file in each pair, to derive the final name.

See the documentation for more information.

ADD REPLY • link 21 months ago by rpolicastro 13k
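To see what that substitution does without running parallel, the same regex can be tried with sed, which is used here only as a stand-in for the Perl expression. The helper name `strip_replicate_suffix` is made up for this sketch.

```shell
# Strip a trailing n1.fastq or n2.fastq from a filename, mirroring
# the s/n[12]\.fastq$// substitution in the parallel command.
# sed stands in for Perl here; the function name is hypothetical.
strip_replicate_suffix() {
    printf '%s\n' "$1" | sed 's/n[12]\.fastq$//'
}

stem=$(strip_replicate_suffix sample1n1.fastq)
echo "${stem}_cat.fastq"   # prints sample1_cat.fastq
```

The `$` anchor limits the match to the end of the name, so an "n1" elsewhere in a sample name is left alone.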

1

Thanks, I appreciate it!

ADD REPLY • link 21 months ago by Dave Carlson ★ 1.7k