Question: using GNU parallel for bwa mem and samtools
0
gravatar for little_more
13 months ago by
little_more30
little_more30 wrote:

Hi all, My (paired-end) sequencing data is comprised of several samples:

fq/S01_R1_fq.gz
fq/S01_R2_fq.gz
fq/S02_R1_fq.gz
fq/S02_R2_fq.gz

I need to align the reads to reference genome and to convert the output to BAM format. I'd like to use parallel for that but I don't know how to do it. That's what I got for now:

parallel "bwa mem \
-t 10 \
-M hg1kv37.fa {1} {2} \
-v 1 -R "@RG\tID:{1}\tSM:{1}\tPL:ILLUMINA\tLB:{1}" \
| samtools view -Sb - > {=1 s:fq/:aln/:;s:R1_fq\.gz:alignment.bam:; =}" ::: fq/*R1.fq.gz ::: fq/*R2.fq.gz

I tried without quotes:

parallel: Warning: Input is read from the terminal. You either know what you
parallel: Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
parallel: Warning: ::: or :::: or -a or to pipe data into parallel. If so
parallel: Warning: consider going through the tutorial: man parallel_tutorial
parallel: Warning: Press CTRL-D to exit.

So I added quotes but now I keep getting the following error:

/usr/bin/sh: aln/S01_alignment.bam: No such file or directory
[E::bwa_set_rg] no ID within the read group line

I guess samtools is trying to find the output file for some reason. Any suggestions?

bwa samtools parallel • 641 views
ADD COMMENTlink modified 13 months ago by ole.tange3.9k • written 13 months ago by little_more30
1

Another option would be to use snakemake. This can handle all the parallel things for.

An introduction that might be useful for you, is this tutorial I wrote some time ago:

ADD REPLYlink written 13 months ago by finswimmer13k

see BWA mem on multiple samples

ADD REPLYlink written 13 months ago by Pierre Lindenbaum130k

With quotes, you have nested double quotes, so your shell is seeing this as one block:

"bwa mem \
-t 10 \
-M hg1kv37.fa {1} {2} \
-v 1 -R "

And this as another block:

" \
| samtools view -Sb - > {=1 s:fq/:aln/:;s:R1_fq\.gz:alignment.bam:; =}"

I would create a bash script which takes R1 and R2 as arguments, performs the file name manipulations, and then maps with bwa. You can then use this script with GNU Parallel.

ADD REPLYlink modified 13 months ago • written 13 months ago by h.mon31k

thank you all for your answers! @h.mon, I tried various combinations of quotes, including combinations of different types of quotes but it didn't help. @finswimmer, thanks for the link, I'll check it :)

ADD REPLYlink written 13 months ago by little_more30
3
gravatar for ole.tange
13 months ago by
ole.tange3.9k
Denmark
ole.tange3.9k wrote:

It makes it easier to read and to test if you create a function and call that.

You can then test your function on a single file set before handing it to GNU Parallel:

bwatosam() {
  id="$1"

  bwa mem -t 10 -M -R "@RG\tID:{}\tSM:{}\tPL:ILLUMINA\tLB:{}" hg1kv37.fa "$id"_R1.fq.gz "$id"_R2.fq.gz |
    samtools view -o "$id".bam
}
export -f bwatosam

# --plus enables {%...} to remove a substring    
ls *_R1.fq.gz |
  parallel --plus bwatosam {%_R1.fq.gz}

(Are you coming for the anniversary party or hosting your own? https://www.gnu.org/software/parallel/10-years-anniversary.html )

ADD COMMENTlink modified 13 months ago • written 13 months ago by ole.tange3.9k

Thank you so much for your solution! it seems to be exactly what I need but I get an error saying:

/usr/bin/sh: line 1: mem: command not found

I think it might be caused by the fact that I use a variable for calling bwa:

bwa=/work/gr-fe/lawless/tool/bwa-0.7.17/bwa

and then

$bwa mem ...

Do you know how can I handle it?

I also tried adding alias to bwa to my ~/.bashrc file so now I can call bwa from the terminal. But if I try to call bwa from within the script I get the error:

/usr/bin/sh: line 1: bwa: command not found

Any help appreciated!

ADD REPLYlink modified 13 months ago • written 13 months ago by little_more30

I seem to overcome the problem with bwa: I just added a path to ~/.bashrc file.

Now I have another issue:

[E::bwa_idx_load_from_disk] fail to locate the index files

although the dir contains index files. What should I do?

ADD REPLYlink written 13 months ago by little_more30

Just put the path to your index file within the function

bwa mem -t 10 -M -R "@RG\tID:{}\tSM:{}\tPL:ILLUMINA\tLB:{}" path/to/hg19/
ADD REPLYlink written 8 months ago by brunobsouzaa260
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 560 users visited in the last hour