Question

How to merge multiple FASTQ files using a table?

0

5.3 years ago

rimgubaev ▴ 330

I have the following table listing sample names and their corresponding replicates:

Sample Replicate
S1     r12
S1     r25
S1     r68
S2     r58
S2     r34
S4     r13
etc.

The folder contains the corresponding FASTQ files (for example, r12.fastq). There are around 300 replicates in total, so running commands like

cat r12.fastq r25.fastq r68.fastq > S1.fastq

by hand for every sample would be really time-consuming and exhausting.

I wonder whether someone has already faced this problem and could share a solution. I understand that this calls for some kind of bash script with a for loop, but I have no idea how to organize it, and the number of replicates is not the same for each sample.

fastq cat bash • 2.4k views

ADD COMMENT • link 5.3 years ago by rimgubaev ▴ 330

score 6 · Accepted Answer · 2019-04-18

5.3 years ago

Asaf 10k

I haven't tested this, but it should work:

# Skip the header row (NR > 1); drop that condition if table.txt has no header
awk 'NR > 1 {print "touch "$1".fastq && cat "$2".fastq >> "$1".fastq"}' table.txt > runscript.sh
source runscript.sh

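This first generates a script of cat commands (look at it to check that it's valid!) and then runs all the cats. Because the commands append with >>, remove any existing sample files before running the script a second time.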
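The generate-then-inspect idea can also be wrapped in a small function. This is only a sketch under some assumptions: the table is whitespace-separated with one header line, and `make_cat_script` is a hypothetical name chosen for illustration. It also emits a truncation line per sample so the generated script can be re-run safely.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one "cat ... >> ..." command per table row.
# Assumes a whitespace-separated table with a single header line.
make_cat_script() {
  local table="$1"
  awk 'NR > 1 && NF >= 2 {
         # Empty each sample file once, before its first append,
         # so that re-running the script does not duplicate reads.
         if (!($1 in seen)) { print ": > " $1 ".fastq"; seen[$1] = 1 }
         print "cat " $2 ".fastq >> " $1 ".fastq"
       }' "$table"
}

# Usage (run in the folder that holds the replicate files):
#   make_cat_script table.txt > runscript.sh
#   less runscript.sh     # inspect before running
#   bash runscript.sh
```

Rows with fewer than two fields are skipped, so a stray trailing line in the table does not produce a broken command.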

ADD COMMENT • link 5.3 years ago by Asaf 10k

1

Elegant :)

ADD REPLY • link 5.3 years ago by ATpoint 84k

1

This is a nice one! I ended up generating the same kind of script, with many rows of cat commands, in R, since I'm not a good bash user.

ADD REPLY • link 5.3 years ago by rimgubaev ▴ 330

score 4 · Accepted Answer · 2019-04-18

5.3 years ago

Pierre Lindenbaum 163k

using nextflow

usage:

nextflow run --input config.tsv --basedir ${PWD} biostar375624.nf

ADD COMMENT • link 5.3 years ago by Pierre Lindenbaum 163k

score 3 · Accepted Answer · 2019-04-18

5.3 years ago

ATpoint 84k

Assuming the table is saved as foo.txt, you can use:

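# Unique sample names from column 1 (NR > 1 skips a header row; drop it if foo.txt has none)
awk 'NR > 1 {print $1}' foo.txt | \
  sort -u | \
  while read -r p; do
    # Match column 1 exactly so that S1 does not also select S10, S11, ...
    awk -v s="${p}" '$1 == s {print $2".fastq"}' foo.txt | \
    xargs cat > "${p}.fastq"
  done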

It first extracts the unique sample names, then, for each sample, collects the file names of the replicates that belong to it and passes them to cat via xargs to concatenate them.
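With around 300 replicates, a missing or misnamed file is easy to overlook. A possible variant, sketched below, reads the table row by row instead of grouping per sample, and warns on stderr when a replicate file is absent. The function name `merge_by_table` and its optional directory argument are hypothetical, and the table is again assumed to be whitespace-separated with one header line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge replicate FASTQ files per sample.
# $1 = table (header line + "sample replicate" rows), $2 = directory (default: .)
merge_by_table() {
  local table="$1" dir="${2:-.}" sample replicate
  # Start from empty output files so that re-runs do not append twice.
  awk 'NR > 1 && NF >= 2 {print $1}' "$table" | sort -u |
    while read -r sample; do
      : > "${dir}/${sample}.fastq"
    done
  # One pass over the table: append each replicate to its sample file.
  awk 'NR > 1 && NF >= 2 {print $1, $2}' "$table" |
    while read -r sample replicate; do
      if [ -f "${dir}/${replicate}.fastq" ]; then
        cat "${dir}/${replicate}.fastq" >> "${dir}/${sample}.fastq"
      else
        echo "warning: ${replicate}.fastq not found (sample ${sample})" >&2
      fi
    done
}

# Usage: merge_by_table foo.txt /path/to/fastq_folder
```

Replicates are appended in table order, and a sample whose replicates are all missing ends up as an empty file, which makes such cases easy to spot afterwards.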

ADD COMMENT • link 5.3 years ago by ATpoint 84k

1

Nested pipes :0

ADD REPLY • link 5.3 years ago by Asaf 10k

2

:-D

ADD REPLY • link 5.3 years ago by ATpoint 84k