BWA-MEM with an array of files
1
0
Entering edit mode
17 hours ago
garcesj ▴ 50

Hi,

I'm trying to run BWA-MEM with an array of files rather than manually inputting each single one... but it seems BWA is not getting the files correctly. Does anyone have some experience with this, please?

This is my command,

bamlist=$(find $PWD -name "${id}*.fastq.gz" | sort)
bwa mem ${reference} "${bamlist[@]}"

I don't really understand it because echoing it seems to be right,

bwa mem genome/gr37.fasta Sample_23/23_E_2_S214_L006_R1_001.fastq.gz Sample_23/23_E_2_S214_L006_R2_001.fastq.gz Sample_23/23_E_2_S214_L007_R1_001.fastq.gz Sample_23/23_E_2_S214_L007_R2_001.fastq.gz Sample_23/23_E_2_S214_L008_R1_001.fastq.gz Sample_23/23_E_2_S214_L008_R2_001.fastq.gz
alignment array BWA-MEM • 182 views
ADD COMMENT
0
Entering edit mode

what is the (error) message you get, if any?

ADD REPLY
0
Entering edit mode

None in particular (that's why I didn't indicate it). It just gives back the BWA-MEM help.

ADD REPLY
1
Entering edit mode

aha, but bwa mem only takes at best 1 pair of fastq files as input, not a whole list.

So to achieve this efficiently do what Pierre Lindenbaum suggests below.

ADD REPLY
1
Entering edit mode
17 hours ago

solution 1: merge each R1 and R2 files into one. Beware, the order of the fastq files must be the same

find Sample_23 -type f -name "*.fastq.gz" | grep _R1_ | sort | xargs cat > Sample_23/concat.R1.fq.gz
find Sample_23 -type f -name "*.fastq.gz" | grep _R2_ | sort | xargs cat > Sample_23/concat.R2.fq.gz

and then

bwa mem genome/gr37.fasta Sample_23/concat.R1.fq.gz Sample_23/concat.R2.fq.gz

solution 2:

use a workflow manager to align each pair of fastq in parallel and later, merge the bams using samtools merge

solution 3:

don't reinvent the wheel and use an existing workflow like sarek https://github.com/nf-core/sarek

ADD COMMENT
0
Entering edit mode

For solution 1, because bwa streams the data, you can save some space by using a named pipe:

mkfifo Sample_23/concat.R1.fq.gz
find Sample_23 -type f -name "*.fastq.gz" | grep _R1_ | sort | xargs cat > Sample_23/concat.R1.fq.gz &
...
ADD REPLY

Login before adding your answer.

Traffic: 5031 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6