Merging .fq files per sample using a big loop - BASH?
3.2 years ago

Hi everyone!

After demultiplexing my samples, I have 3 .fq files per sample (720 samples in total) containing the raw reads. I managed to merge the files for one sample into a single file using this command:

sudo cat PA001*.fq > PA001m.fq

where the .fq files for a sample are PA001.fq, PA001_1.fq, PA001_2.fq, etc., and PA001m.fq is the merged file.

Now I have 720 samples in total. What would a loop look like to apply this command line to all of the samples?

Cheers!

fastq bash genomics • 1.1k views
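cat PA001*.fq > PA001m.fq

This is dangerous: the glob PA001*.fq also matches the output file PA001m.fq once it exists, so on a re-run the output becomes one of its own inputs.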

3.2 years ago

Don't use sudo for ordinary commands such as cat PA001*.fq > PA001m.fq. The following may work. With --dry-run, parallel only prints the commands it would run; remove --dry-run to execute them. Make sure that the system has enough resources.

$ parallel --dry-run 'cat {=s/_2//=} {=s/_2/_1/=} {} > {=s/_2.fq//=}_m.fq' ::: *_2.fq

bash loop:

$ for i in *_2.fq; do echo "cat ${i/_2*/.fq} ${i/_2/_1} $i > ${i/_2/_m}"; done

Remove echo and the surrounding double quotes after checking the dry-run output.
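
For readers new to bash, the substitutions in the loop above can be inspected on a single name. This is only a sketch: the file name PA001_2.fq is a hypothetical example taken from the question, the variable names are made up for illustration, and nothing is read or written.

```shell
#!/usr/bin/env bash
# Illustration only: show what each substitution expands to for one name.
i="PA001_2.fq"          # hypothetical file name, as in the question

base="${i/_2*/.fq}"     # replace "_2" and everything after it with ".fq"
first="${i/_2/_1}"      # replace the first "_2" with "_1"
merged="${i/_2/_m}"     # replace the first "_2" with "_m"

# Print the command the loop would build for this file.
echo "cat $base $first $i > $merged"
```

The loop only ever sees the *_2.fq names and derives the other names from them, so it assumes every sample has exactly the files <sample>.fq, <sample>_1.fq and <sample>_2.fq.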

without bash loop:

$ find * -type f -name '*.fq' ! -name '*_*.fq' -exec bash -c 'cat "$0" "${0/./_1.}" "${0/./_2.}" > "${0/./_m.}"' {} \;

Please take a backup of files before you proceed or try with example files in a test directory.


You could simplify it a bit with

parallel --dry-run 'cat {=s/_.*(?=\.fq$)/*/=} > {=s/_2.fq/_m\.fastq/=}' ::: *_2.fq

The dry-run output is cat 'PA001*' > PA001_m.fq, which might be risky because the _m.fq output resides in the same directory and is matched by the same pattern.


I modified the code to avoid that problem.


This would lead to a different problem with parallel version 20201122 (Mint 20.1, bash): the dry-run output is cat 'PA001*.fq' > PA001_m.fastq, and cat cannot find 'PA001*.fq' because the quoted pattern is not expanded by the shell.


Or simply make a new directory with symbolic links to the files; then you are safe even when things go wrong, as you can always go back to the original files without taking an extra backup.
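
The symbolic-link idea could look roughly like the sketch below. The directory names raw and links, the scratch directory, and the toy reads are all assumptions made up for illustration; on real data the raw directory would be wherever the demultiplexed files live.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical layout: a scratch directory with toy reads stands in for the real data.
work="$(mktemp -d)"
mkdir "$work/raw" "$work/links"
printf '@r1\nACGT\n+\nIIII\n' > "$work/raw/PA001.fq"
printf '@r2\nTTTT\n+\nIIII\n' > "$work/raw/PA001_1.fq"

# Link every raw file into the working directory instead of copying it.
for f in "$work"/raw/*.fq; do
    ln -s "$f" "$work/links/$(basename "$f")"
done

# Merge inside the link directory. The output is a new regular file there,
# and the originals in raw/ are only read.
cd "$work/links"
cat PA001.fq PA001_1.fq > PA001_m.fq
```

Deleting or renaming a link leaves the original untouched. Redirecting output onto a link name (for example > PA001.fq) would still follow the link and overwrite the original, so the merged files need names that do not exist yet.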


Hi, thanks for your help.

As I am a real beginner here, I struggle to understand the logic of the code. Also, each of my samples has 3, 4, or 5 files; the number varies throughout the batch. Would the code still apply? I am running it on a copy of my samples.

Thanks
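
For a varying number of files per sample, a loop along the following lines might be a starting point. It is a minimal sketch, not a tested pipeline. It assumes that every sample has a base file <sample>.fq, that the extra files are named <sample>_<number>.fq, and that sample names themselves contain no underscore. The scratch directory, the toy reads and the merged output directory are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail
shopt -s nullglob   # a pattern with no matches expands to nothing

# Toy data in a scratch directory (hypothetical samples with 3 and 4 files).
work="$(mktemp -d)"
cd "$work"
for f in PA001.fq PA001_1.fq PA001_2.fq PA002.fq PA002_1.fq PA002_2.fq PA002_3.fq; do
    printf '@%s\nACGT\n+\nIIII\n' "$f" > "$f"
done

# Write merged files to a separate directory so that they can never be
# matched by the input patterns on a re-run.
mkdir -p merged
for base in *.fq; do
    # Skip the numbered parts; each sample is handled once, via its base file.
    case "$base" in *_*) continue ;; esac
    sample="${base%.fq}"
    # The pattern picks up however many numbered parts exist for this sample.
    parts=( "$sample".fq "$sample"_[0-9]*.fq )
    echo "merging ${#parts[@]} files for $sample"
    cat "${parts[@]}" > "merged/${sample}_m.fq"
done
```

On real data, the block that creates the toy files would be dropped and the loop run from the directory holding the copied .fq files. The echo line reports how many files went into each sample, which makes it easy to spot samples with an unexpected count.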
