How to concatenate multiple fastq files (located in different directories) for each sample
3.1 years ago
salehm ▴ 10

Hi,

I received RNA-seq data for 55 samples run on an Illumina NextSeq 500. Each sample has four FASTQ files, and each file is in a separate directory, so I have a total of 220 directories, each containing a single FASTQ file. I now need to concatenate the four files belonging to each sample into one FASTQ file. I used to use this command:

"for i in $(find ./ -type f -name ".fastq.gz" | while read F; do basename $F | rev | cut -c 22- | rev; done | sort | uniq) do echo "Merging R1" cat "$i"_L00_R1_001.fastq.gz > "$i"_ME_L001_R1_001.fastq.gz done"

However, it requires all the files to be in one directory, and my files are spread across 220 directories. So I am wondering if there is a way to modify this command to look for files in different directories, or if there is a command I could use to move each file from its individual directory into a single directory.

Thank you for your help.

3.1 years ago
GenoMax 141k

See this answer for inspiration: C: Concatenating fastq.gz files across lanes
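
A minimal, untested sketch along the lines of that answer, assuming the standard Illumina naming (e.g. Sample1_S1_L001_R1_001.fastq.gz); the _ME_ output suffix just mirrors the question's convention:

    # derive the unique sample prefixes by stripping the lane/read suffix,
    # then concatenate each sample's lane files in path order
    # (gzip files can be cat-ed together directly)
    for sample in $(find . -type f -name "*_R1_001.fastq.gz" -exec basename {} \; | sed 's/_L00[0-9]_R1_001\.fastq\.gz$//' | sort -u); do
        find . -type f -name "${sample}_L00?_R1_001.fastq.gz" | sort | xargs cat > "${sample}_ME_L001_R1_001.fastq.gz"
    done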

3.1 years ago
rpolicastro

GNU parallel solution that is untested but should work (and will probably summon Ole Tange to provide a better version). Its only requirement is that you run it from a parent directory, since it searches recursively through the directories below it.

    parallel --dry-run -j1 cat {} '>>' '$(basename {} | rev | cut -c 22- | rev)'_ME_L001_R1_001.fastq.gz ::: $(find . -type f -name "*.fastq.gz")

Remove --dry-run if the commands look right. -j1 runs the command for one file at a time; you can increase that for parallelization (or remove it to use all available cores).
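
If you would rather just move everything into a single directory first (the second option raised in the question), here is an untested sketch; the ../merged_fastq destination is only an example name:

    # move every fastq.gz under the current tree into a sibling directory;
    # -n refuses to overwrite if two files happen to share a name
    mkdir -p ../merged_fastq
    find . -type f -name "*.fastq.gz" -exec mv -n {} ../merged_fastq/ \;

The original per-directory loop from the question can then be run inside ../merged_fastq.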

Thank you @rpolicastro
