Question: bwa mem: Passing a variable to read group
2
gravatar for rmf
16 months ago by
rmf550
rmf550 wrote:

I would like to add read group info (-R) during the mapping/alignment stage as part of my variant calling gatk pipeline.

I am doing something like this:

bwa mem \
-M \
-t 8 \
-v 3 \
-R <(sh a-illumina-read-group.sh $1) \
"$path_dr_bwaindex_genome" \
$1 $2

where the read group string is generated automatically from the fastq file by the shell script a-illumina-read-group.sh. It produces a string like:

'@RG\tID:ST-E00215_230_HJ3FMALXX_2\tSM:ST-E00215_230_HJ3FMALXX_2_ATCACG\tLB:ATCACG\tPL:ILLUMINA'

but, when I run bwa, it fails with this error: [E::bwa_set_rg] the read group line is not started with @RG

I have tried excluding the single quotes ('') around read group info, but that didn't change anything. I also tried variations in how the variable is passed.

-R=$(sh a-illumina-read-group.sh $1) \

-R=$(echo $(sh a-illumina-read-group.sh $1)) \

Just as a test, I tried:

rg='@RG\tID:ST-E00215_230_HJ3FMALXX_2\tSM:ST-E00215_230_HJ3FMALXX_2_ATCACG\tLB:ATCACG\tPL:ILLUMINA'

-R <(echo $rg) \

But, they all produce the same error. I would appreciate any solutions to this issue.

I understand that I can add read groups later on using PicardTools AddOrReplaceReadGroup, but I thought it might be convenient doing it here in one step.

Thanks.

ADD COMMENTlink modified 11 months ago • written 16 months ago by rmf550

rmf, I had a similar idea and met with the same result. Did you ever find a solution?

ADD REPLYlink written 11 months ago by gregory.peterson10
1

I have added an answer.

ADD REPLYlink written 11 months ago by rmf550
2
gravatar for rmf
11 months ago by
rmf550
rmf550 wrote:

What worked for me was to read the read group information from the fastq file during the mapping run. My read name in the fastq file looks like this: @ST-E00274:188:H3JWNCCXY:4:1101:5142:1221 1:N:0:NTTGTA.

The bash file looks like this:

#!/bin/bash

header=$(zcat $1 | head -n 1)
id=$(echo $header | head -n 1 | cut -f 1-4 -d":" | sed 's/@//' | sed 's/:/_/g')
sm=$(echo $header | head -n 1 | grep -Eo "[ATGCN]+$")
echo "Read Group @RG\tID:$id\tSM:$id"_"$sm\tLB:$id"_"$sm\tPL:ILLUMINA"

bwa mem \
-M \
-t 8 \
-v 3 \
-R $(echo "@RG\tID:$id\tSM:$id"_"$sm\tLB:$id"_"$sm\tPL:ILLUMINA") \
"$path_bwaindex_genome" \
$1 $2 | samblaster -M | samtools fixmate - - | samtools sort -O bam -o "mapped-bwa.bam"

And the bash file is run as

bwa-mapper.sh read_1.fq.gz read_2.fq.gz

You can remove this part (| samblaster -M | samtools fixmate - - | samtools sort -O bam -o) and replace with > if you don't need it. samblaster marks duplicates like Picard. I don't remember what fixmate does. The sort part sorts the SAM file and generates an output BAM.

ADD COMMENTlink modified 11 months ago • written 11 months ago by rmf550
1

Thanks, rmf ! It worked for me.

ADD REPLYlink written 11 months ago by gregory.peterson10

Hi there, thanks for providing your bash script. However, I tried to use the batch script and it printed this as below:

gzip: SRR34567_1.fastq: not in gzip format Read Group @RG\tID:\tSM:_\tLB:_\tPL:ILLUMINA [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::process] read 1600000 sequences (240000000 bp)... [M::process] read 1600000 sequences (240000000 bp)...

Nothing printed to the RG when I checked the outputted SAM file using samtools view -H sample.bam | grep '@RG'

So I wasn't sure what went wrong. Thank you.

ADD REPLYlink written 5 months ago by w330

Well.. Your error message says your input fastq file is not in gzip format. In that case you have to modify the script to accept non-gzipped fastq files. Change zcat to cat. That's as far as I can see.

ADD REPLYlink written 5 months ago by rmf550

Thank you so much for your kind reply. I have modified the zcat to cat to accept non-zipped fastq file and it did print out the read group; however, it will not run the bwa mem.

./test.sh SRR5038390_1.fastq SRR5038390_2.fastq

Read Group @RG\tID:SRR5038390.1 1 length=150\tSM:SRR5038390.1 1 length=150_\tLB:SRR5038390.1 1 length=150_\tPL:ILLUMINA

Usage: bwa mem [options] <idxbase> <in1.fq> [in2.fq]

So I wasn't sure what went wrong. Thank you so much.

My script as below test.sh), basically I just changed the last part:

bwa mem \

-M \

-t 8 \

-R $(echo "@RG\tID:$id\tSM:$id"_"$sm\tLB:$id"_"$sm\tPL:ILLUMINA") \

/home/index/hg19/hg19bwaidx \

$1 $2 |samtools sort -O bam -o "mapped-bwa.bam"

ADD REPLYlink modified 5 months ago • written 5 months ago by w330

My header looks different:

> zcat OZBenth2_R1.fq.gz | head -n 1
@HWI-ST945_0069:2:1101:1483:1976#GNNNNN/1
> echo $(echo $header | head -n 1 | cut -f 1-4 -d":" | sed 's/@//' | sed 's/:/_/g')
HWI-ST945_0069_2_1101_1483
> echo $(echo $header | head -n 1 | grep -Eo "[ATGCN]+$")

Is there a way to make it compatible for all different Illumina headers. Did anyone use https://github.com/ekg/bamaddrg?

ADD REPLYlink written 4 weeks ago by Ric190

Once you figure out what the different parts of your header mean, you can split them apart using regular expressions.

ADD REPLYlink written 4 weeks ago by rmf550
1
gravatar for Pierre Lindenbaum
16 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum118k wrote:
<(sh a-illumina-read-group.sh $1)

returns a temporary filename. just try echo <(sh a-illumina-read-group.sh $1) and see...

you want (anti-quote)

`sh a-illumina-read-group.sh $1`
ADD COMMENTlink modified 16 months ago • written 16 months ago by Pierre Lindenbaum118k

-R echo <(sh a-illumina-read-group.sh $1) didn't work. Got same error. With the anti-quote, I am not sure exactly what you meant. But, I tried -R echo <(sh a-illumina-read-group.sh $1) and -R <(sh a-illumina-read-group.sh $1). Both don't work. I get the same error as before plus this new error: /var/spool/slurmd/job11433254/slurm_script: line 52: @RG\tID:ST-E00215_230_HJ3FMALXX_2\tSM:ST-E00215_230_HJ3FMALXX_2_ATCACG\tLB:ATCACG\tPL:ILLUMINA: command not found. Sorry about the weird formatting. The backticks inside backticks don't seem to work.

ADD REPLYlink modified 16 months ago • written 16 months ago by rmf550

i just wanted to show you that

 echo <(sh a-illumina-read-group.sh $1)

will print a tmp filename !

ADD REPLYlink written 16 months ago by Pierre Lindenbaum118k

Ahan. Yea. sure. That works. The variable prints the correct content as expected. It's just that bwa doesn't accept it.

ADD REPLYlink written 16 months ago by rmf550

Hmmm.. Actually not.

This works

echo $(sh a-illumina-read-group.sh $1)

@RG\tID:ST-E00215_230_HJ3FMALXX_2\tSM:ST-E00215_230_HJ3FMALXX_2_ATCACG\tLB:ATCACG\tPL:ILLUMINA

This doesn't

echo <(sh a-illumina-read-group.sh $1)

/dev/fd/62

ADD REPLYlink modified 16 months ago • written 16 months ago by rmf550
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1610 users visited in the last hour