Trimming multiple paired-end fastq.gz files with fastp using a shell script
2.4 years ago

Dear all, I am trying to trim multiple paired-end fastq files with fastp using a shell script. The two (forward and reverse) fastq.gz files were generated with the SRA Toolkit using fastq-dump:

fastq-dump -I --split-files --gzip

When I try to run fastp on the two files, I am unable to specify the two paired-end files inside my shell script. I tried:

for file1 in /path_to_files/*_1.fastq.gz; do file2=${file1%%_1.fastq.gz }"_2.fastq.gz"; fastp -i ${file1} -I ${file2} -o ${file1}_trimmed.fastq.gz -O ${file2}_trimmed.fastq.gz; done

but it gives an error: "Error to read gzip file ............... the file doesn't exist". I don't know what I am doing wrong. Is there an alternative way to pass my two files pair by pair in a bash for loop over multiple samples?

Any help will be much appreciated. Thanks!

2.4 years ago

Assuming that sorting the paths places each R1 file directly before its R2 mate:

$ find src/test/resources/ -name "*.fq.gz" | sort | paste - - | while read A B ; do echo " fastq1 is  $A fastq2 is $B" ; done

 fastq1 is  src/test/resources/S1.R1.fq.gz fastq2 is src/test/resources/S1.R2.fq.gz
 fastq1 is  src/test/resources/S2.R1.fq.gz fastq2 is src/test/resources/S2.R2.fq.gz
 fastq1 is  src/test/resources/S3.R1.fq.gz fastq2 is src/test/resources/S3.R2.fq.gz
 fastq1 is  src/test/resources/S4.R1.fq.gz fastq2 is src/test/resources/S4.R2.fq.gz
 fastq1 is  src/test/resources/S5.R1.fq.gz fastq2 is src/test/resources/S5.R2.fq.gz
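
As for the loop in the question: the "file doesn't exist" error is most likely caused by the space before the closing brace in ${file1%%_1.fastq.gz }. With that space the pattern never matches, so nothing is stripped and file2 ends up as something like sample_1.fastq.gz_2.fastq.gz. Below is a minimal sketch of a corrected loop, assuming the files follow the *_1.fastq.gz / *_2.fastq.gz naming from the question. The INDIR variable and the helper names mate_of and trimmed_name are only illustrative and not part of fastp.

```shell
#!/usr/bin/env bash
# Sketch: pair *_1.fastq.gz with *_2.fastq.gz and run fastp on each pair.
# INDIR, mate_of and trimmed_name are hypothetical names chosen for this example.

indir="${INDIR:-/path_to_files}"

# Derive the mate (R2) path from an R1 path by swapping the suffix.
# There must be no space before the closing brace, or the pattern never matches.
mate_of() {
    printf '%s\n' "${1%_1.fastq.gz}_2.fastq.gz"
}

# Build the output name by inserting "_trimmed" before the .fastq.gz extension.
trimmed_name() {
    printf '%s\n' "${1%.fastq.gz}_trimmed.fastq.gz"
}

for r1 in "$indir"/*_1.fastq.gz; do
    [ -e "$r1" ] || continue    # the glob matched nothing
    r2=$(mate_of "$r1")
    if [ ! -e "$r2" ]; then
        echo "skipping $r1: mate $r2 not found" >&2
        continue
    fi
    # -i/-I are the paired inputs, -o/-O the paired outputs
    fastp -i "$r1" -I "$r2" -o "$(trimmed_name "$r1")" -O "$(trimmed_name "$r2")"
done
```

Quoting the variables keeps paths with spaces intact, and checking that the mate exists before calling fastp turns a naming mismatch into a readable message instead of a gzip read error.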

