Question: bwa mem: Passing a variable to read group
4
gravatar for rmf
3.2 years ago by
rmf1.0k
rmf1.0k wrote:

I would like to add read group info (-R) during the mapping/alignment stage as part of my variant calling gatk pipeline.

I am doing something like this:

bwa mem \
-M \
-t 8 \
-v 3 \
-R <(sh a-illumina-read-group.sh $1) \
"$path_dr_bwaindex_genome" \
$1 $2

where the read group string is generated automatically from the fastq file by the shell script a-illumina-read-group.sh. It produces a string like:

'@RG\tID:ST-E00215_230_HJ3FMALXX_2\tSM:ST-E00215_230_HJ3FMALXX_2_ATCACG\tLB:ATCACG\tPL:ILLUMINA'

but, when I run bwa, it fails with this error: [E::bwa_set_rg] the read group line is not started with @RG

I have tried excluding the single quotes ('') around read group info, but that didn't change anything. I also tried variations in how the variable is passed.

-R=$(sh a-illumina-read-group.sh $1) \

-R=$(echo $(sh a-illumina-read-group.sh $1)) \

Just as a test, I tried:

rg='@RG\tID:ST-E00215_230_HJ3FMALXX_2\tSM:ST-E00215_230_HJ3FMALXX_2_ATCACG\tLB:ATCACG\tPL:ILLUMINA'

-R <(echo $rg) \

But, they all produce the same error. I would appreciate any solutions to this issue.

I understand that I can add read groups later on using PicardTools AddOrReplaceReadGroup, but I thought it might be convenient doing it here in one step.

Thanks.

ADD COMMENTlink modified 2.8 years ago • written 3.2 years ago by rmf1.0k

rmf, I had a similar idea and met with the same result. Did you ever find a solution?

ADD REPLYlink written 2.8 years ago by gregory.peterson10
1

I have added an answer.

ADD REPLYlink written 2.8 years ago by rmf1.0k
5
gravatar for rmf
2.8 years ago by
rmf1.0k
rmf1.0k wrote:

What worked for me was to read the read group information from the fastq file during the mapping run. My read name in the fastq file looks like this: @ST-E00274:188:H3JWNCCXY:4:1101:5142:1221 1:N:0:NTTGTA.

The bash file looks like this:

#!/bin/bash

header=$(zcat $1 | head -n 1)
id=$(echo $header | head -n 1 | cut -f 1-4 -d":" | sed 's/@//' | sed 's/:/_/g')
sm=$(echo $header | head -n 1 | grep -Eo "[ATGCN]+$")
echo "Read Group @RG\tID:$id\tSM:$id"_"$sm\tLB:$id"_"$sm\tPL:ILLUMINA"

bwa mem \
-M \
-t 8 \
-v 3 \
-R $(echo "@RG\tID:$id\tSM:$id"_"$sm\tLB:$id"_"$sm\tPL:ILLUMINA") \
"$path_bwaindex_genome" \
$1 $2 | samblaster -M | samtools fixmate - - | samtools sort -O bam -o "mapped-bwa.bam"

And the bash file is run as

bwa-mapper.sh read_1.fq.gz read_2.fq.gz

You can remove this part (| samblaster -M | samtools fixmate - - | samtools sort -O bam -o) and replace with > if you don't need it. samblaster marks duplicates like Picard. I don't remember what fixmate does. The sort part sorts the SAM file and generates an output BAM.

ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by rmf1.0k
1

Thanks, rmf ! It worked for me.

ADD REPLYlink written 2.8 years ago by gregory.peterson10

Hi there, thanks for providing your bash script. However, I tried to use the batch script and it printed this as below:

gzip: SRR34567_1.fastq: not in gzip format Read Group @RG\tID:\tSM:_\tLB:_\tPL:ILLUMINA [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::process] read 1600000 sequences (240000000 bp)... [M::process] read 1600000 sequences (240000000 bp)...

Nothing printed to the RG when I checked the outputted SAM file using samtools view -H sample.bam | grep '@RG'

So I wasn't sure what went wrong. Thank you.

ADD REPLYlink written 2.3 years ago by w330

Well.. Your error message says your input fastq file is not in gzip format. In that case you have to modify the script to accept non-gzipped fastq files. Change zcat to cat. That's as far as I can see.

ADD REPLYlink written 2.3 years ago by rmf1.0k

Thank you so much for your kind reply. I have modified the zcat to cat to accept non-zipped fastq file and it did print out the read group; however, it will not run the bwa mem.

./test.sh SRR5038390_1.fastq SRR5038390_2.fastq

Read Group @RG\tID:SRR5038390.1 1 length=150\tSM:SRR5038390.1 1 length=150_\tLB:SRR5038390.1 1 length=150_\tPL:ILLUMINA

Usage: bwa mem [options] <idxbase> <in1.fq> [in2.fq]

So I wasn't sure what went wrong. Thank you so much.

My script as below test.sh), basically I just changed the last part:

bwa mem \

-M \

-t 8 \

-R $(echo "@RG\tID:$id\tSM:$id"_"$sm\tLB:$id"_"$sm\tPL:ILLUMINA") \

/home/index/hg19/hg19bwaidx \

$1 $2 |samtools sort -O bam -o "mapped-bwa.bam"

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by w330

My header looks different:

> zcat OZBenth2_R1.fq.gz | head -n 1
@HWI-ST945_0069:2:1101:1483:1976#GNNNNN/1
> echo $(echo $header | head -n 1 | cut -f 1-4 -d":" | sed 's/@//' | sed 's/:/_/g')
HWI-ST945_0069_2_1101_1483
> echo $(echo $header | head -n 1 | grep -Eo "[ATGCN]+$")

Is there a way to make it compatible for all different Illumina headers. Did anyone use https://github.com/ekg/bamaddrg?

ADD REPLYlink written 23 months ago by Ric330

Once you figure out what the different parts of your header mean, you can split them apart using regular expressions.

ADD REPLYlink written 23 months ago by rmf1.0k
1
gravatar for Pierre Lindenbaum
3.2 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum133k wrote:
<(sh a-illumina-read-group.sh $1)

returns a temporary filename. just try echo <(sh a-illumina-read-group.sh $1) and see...

you want (anti-quote)

`sh a-illumina-read-group.sh $1`
ADD COMMENTlink modified 3.2 years ago • written 3.2 years ago by Pierre Lindenbaum133k

-R echo <(sh a-illumina-read-group.sh $1) didn't work. Got same error. With the anti-quote, I am not sure exactly what you meant. But, I tried -R echo <(sh a-illumina-read-group.sh $1) and -R <(sh a-illumina-read-group.sh $1). Both don't work. I get the same error as before plus this new error: /var/spool/slurmd/job11433254/slurm_script: line 52: @RG\tID:ST-E00215_230_HJ3FMALXX_2\tSM:ST-E00215_230_HJ3FMALXX_2_ATCACG\tLB:ATCACG\tPL:ILLUMINA: command not found. Sorry about the weird formatting. The backticks inside backticks don't seem to work.

ADD REPLYlink modified 3.2 years ago • written 3.2 years ago by rmf1.0k

i just wanted to show you that

 echo <(sh a-illumina-read-group.sh $1)

will print a tmp filename !

ADD REPLYlink written 3.2 years ago by Pierre Lindenbaum133k

Ahan. Yea. sure. That works. The variable prints the correct content as expected. It's just that bwa doesn't accept it.

ADD REPLYlink written 3.2 years ago by rmf1.0k

Hmmm.. Actually not.

This works

echo $(sh a-illumina-read-group.sh $1)

@RG\tID:ST-E00215_230_HJ3FMALXX_2\tSM:ST-E00215_230_HJ3FMALXX_2_ATCACG\tLB:ATCACG\tPL:ILLUMINA

This doesn't

echo <(sh a-illumina-read-group.sh $1)

/dev/fd/62

ADD REPLYlink modified 3.2 years ago • written 3.2 years ago by rmf1.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1128 users visited in the last hour
_