Question: bwa mem: Passing a variable to read group
3
gravatar for rmf
23 months ago by
rmf730
rmf730 wrote:

I would like to add read group info (-R) during the mapping/alignment stage as part of my variant calling gatk pipeline.

I am doing something like this:

bwa mem \
-M \
-t 8 \
-v 3 \
-R <(sh a-illumina-read-group.sh $1) \
"$path_dr_bwaindex_genome" \
$1 $2

where the read group string is generated automatically from the fastq file by the shell script a-illumina-read-group.sh. It produces a string like:

'@RG\tID:ST-E00215_230_HJ3FMALXX_2\tSM:ST-E00215_230_HJ3FMALXX_2_ATCACG\tLB:ATCACG\tPL:ILLUMINA'

but, when I run bwa, it fails with this error: [E::bwa_set_rg] the read group line is not started with @RG

I have tried excluding the single quotes ('') around read group info, but that didn't change anything. I also tried variations in how the variable is passed.

-R=$(sh a-illumina-read-group.sh $1) \

-R=$(echo $(sh a-illumina-read-group.sh $1)) \

Just as a test, I tried:

rg='@RG\tID:ST-E00215_230_HJ3FMALXX_2\tSM:ST-E00215_230_HJ3FMALXX_2_ATCACG\tLB:ATCACG\tPL:ILLUMINA'

-R <(echo $rg) \

But, they all produce the same error. I would appreciate any solutions to this issue.

I understand that I can add read groups later on using PicardTools AddOrReplaceReadGroup, but I thought it might be convenient doing it here in one step.

Thanks.

ADD COMMENTlink modified 18 months ago • written 23 months ago by rmf730

rmf, I had a similar idea and met with the same result. Did you ever find a solution?

ADD REPLYlink written 18 months ago by gregory.peterson10
1

I have added an answer.

ADD REPLYlink written 18 months ago by rmf730
3
gravatar for rmf
18 months ago by
rmf730
rmf730 wrote:

What worked for me was to read the read group information from the fastq file during the mapping run. My read name in the fastq file looks like this: @ST-E00274:188:H3JWNCCXY:4:1101:5142:1221 1:N:0:NTTGTA.

The bash file looks like this:

#!/bin/bash

header=$(zcat $1 | head -n 1)
id=$(echo $header | head -n 1 | cut -f 1-4 -d":" | sed 's/@//' | sed 's/:/_/g')
sm=$(echo $header | head -n 1 | grep -Eo "[ATGCN]+$")
echo "Read Group @RG\tID:$id\tSM:$id"_"$sm\tLB:$id"_"$sm\tPL:ILLUMINA"

bwa mem \
-M \
-t 8 \
-v 3 \
-R $(echo "@RG\tID:$id\tSM:$id"_"$sm\tLB:$id"_"$sm\tPL:ILLUMINA") \
"$path_bwaindex_genome" \
$1 $2 | samblaster -M | samtools fixmate - - | samtools sort -O bam -o "mapped-bwa.bam"

And the bash file is run as

bwa-mapper.sh read_1.fq.gz read_2.fq.gz

You can remove this part (| samblaster -M | samtools fixmate - - | samtools sort -O bam -o) and replace with > if you don't need it. samblaster marks duplicates like Picard. I don't remember what fixmate does. The sort part sorts the SAM file and generates an output BAM.

ADD COMMENTlink modified 18 months ago • written 18 months ago by rmf730
1

Thanks, rmf ! It worked for me.

ADD REPLYlink written 18 months ago by gregory.peterson10

Hi there, thanks for providing your bash script. However, I tried to use the batch script and it printed this as below:

gzip: SRR34567_1.fastq: not in gzip format Read Group @RG\tID:\tSM:_\tLB:_\tPL:ILLUMINA [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::process] read 1600000 sequences (240000000 bp)... [M::process] read 1600000 sequences (240000000 bp)...

Nothing printed to the RG when I checked the outputted SAM file using samtools view -H sample.bam | grep '@RG'

So I wasn't sure what went wrong. Thank you.

ADD REPLYlink written 12 months ago by w330

Well.. Your error message says your input fastq file is not in gzip format. In that case you have to modify the script to accept non-gzipped fastq files. Change zcat to cat. That's as far as I can see.

ADD REPLYlink written 12 months ago by rmf730

Thank you so much for your kind reply. I have modified the zcat to cat to accept non-zipped fastq file and it did print out the read group; however, it will not run the bwa mem.

./test.sh SRR5038390_1.fastq SRR5038390_2.fastq

Read Group @RG\tID:SRR5038390.1 1 length=150\tSM:SRR5038390.1 1 length=150_\tLB:SRR5038390.1 1 length=150_\tPL:ILLUMINA

Usage: bwa mem [options] <idxbase> <in1.fq> [in2.fq]

So I wasn't sure what went wrong. Thank you so much.

My script as below test.sh), basically I just changed the last part:

bwa mem \

-M \

-t 8 \

-R $(echo "@RG\tID:$id\tSM:$id"_"$sm\tLB:$id"_"$sm\tPL:ILLUMINA") \

/home/index/hg19/hg19bwaidx \

$1 $2 |samtools sort -O bam -o "mapped-bwa.bam"

ADD REPLYlink modified 12 months ago • written 12 months ago by w330

My header looks different:

> zcat OZBenth2_R1.fq.gz | head -n 1
@HWI-ST945_0069:2:1101:1483:1976#GNNNNN/1
> echo $(echo $header | head -n 1 | cut -f 1-4 -d":" | sed 's/@//' | sed 's/:/_/g')
HWI-ST945_0069_2_1101_1483
> echo $(echo $header | head -n 1 | grep -Eo "[ATGCN]+$")

Is there a way to make it compatible for all different Illumina headers. Did anyone use https://github.com/ekg/bamaddrg?

ADD REPLYlink written 8 months ago by Ric280

Once you figure out what the different parts of your header mean, you can split them apart using regular expressions.

ADD REPLYlink written 8 months ago by rmf730
1
gravatar for Pierre Lindenbaum
23 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum123k wrote:
<(sh a-illumina-read-group.sh $1)

returns a temporary filename. just try echo <(sh a-illumina-read-group.sh $1) and see...

you want (anti-quote)

`sh a-illumina-read-group.sh $1`
ADD COMMENTlink modified 23 months ago • written 23 months ago by Pierre Lindenbaum123k

-R echo <(sh a-illumina-read-group.sh $1) didn't work. Got same error. With the anti-quote, I am not sure exactly what you meant. But, I tried -R echo <(sh a-illumina-read-group.sh $1) and -R <(sh a-illumina-read-group.sh $1). Both don't work. I get the same error as before plus this new error: /var/spool/slurmd/job11433254/slurm_script: line 52: @RG\tID:ST-E00215_230_HJ3FMALXX_2\tSM:ST-E00215_230_HJ3FMALXX_2_ATCACG\tLB:ATCACG\tPL:ILLUMINA: command not found. Sorry about the weird formatting. The backticks inside backticks don't seem to work.

ADD REPLYlink modified 23 months ago • written 23 months ago by rmf730

i just wanted to show you that

 echo <(sh a-illumina-read-group.sh $1)

will print a tmp filename !

ADD REPLYlink written 23 months ago by Pierre Lindenbaum123k

Ahan. Yea. sure. That works. The variable prints the correct content as expected. It's just that bwa doesn't accept it.

ADD REPLYlink written 23 months ago by rmf730

Hmmm.. Actually not.

This works

echo $(sh a-illumina-read-group.sh $1)

@RG\tID:ST-E00215_230_HJ3FMALXX_2\tSM:ST-E00215_230_HJ3FMALXX_2_ATCACG\tLB:ATCACG\tPL:ILLUMINA

This doesn't

echo <(sh a-illumina-read-group.sh $1)

/dev/fd/62

ADD REPLYlink modified 23 months ago • written 23 months ago by rmf730
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1109 users visited in the last hour