[vg giraffe] Unable to add read group to BAM file
1
0
Entering edit mode
20 days ago
saruman ▴ 10

Hi everyone,

I am aligning short read sequences (HG002) against the Human pangenome reference graph using VG Giraffe. Specifically, I am using the .gbz and .hapl index files to generate the BAM file. The alignment works without issues; however, I am unable to add the read group (I can do it using Picard, though I would like to spare this additional step).

singularity exec \
    -B /fs/scratch/PAS2525/trimmed:/input \
    -B /fs/scratch/PAS2525/output:/output \
    -B /fs/scratch/PAS2525/References/hprc:/index \
    -B /fs/scratch/PAS2525/tmp:/tmp \
    /fs/scratch/PAS2525/Singularity/vg_v1.68.0.sif \
    vg giraffe \
    --progress --rescue-algorithm none --output-format bam --threads 44 \
    --read-group 'ID:1\tSM:HG002\tLB:lib1\tPL:illumina\tPU:unit1' \
    --gbz-name /index/hprc.gbz \
    --haplotype-name /index/hprc.hapl \
    --kff-name /output/HG002.kff \
    --fastq-in /input/HG002.NovaSeq.pcr-free.35x.trimmed_R1.fastq.gz \
    --fastq-in /input/HG002.NovaSeq.pcr-free.35x.trimmed_R2.fastq.gz \
    > /fs/scratch/PAS2525/HG002.bam

Does anyone know why I am unable to add the read group? I have also tried the following string without success:

--read-group '@RG\tID:1\tSM:HG002\tLB:lib1\tPL:illumina\tPU:unit1'

Thank you in advance for any help.

vg • 7.2k views
ADD COMMENT
0
Entering edit mode

Share the error message.

ADD REPLY
0
Entering edit mode

The read alignment does not produce any error. The BAM file does not include the read group. I need to run picard AddOrReplaceReadGroups to add the read group.

ADD REPLY
2
Entering edit mode
7 days ago
saruman ▴ 10

It seems that using the following format fixes the problem:

--read-group "1 SM:HG002 LB:lib1 PL:illumina PU:unit1"

As shown below:

@HD VN:1.5  SO:coordinate
@SQ SN:chr10    LN:134758134    M5:fbf3087282695f275d460157fc7864a5
@SQ SN:chr11    LN:135127769    M5:7f144a0b4800c341a4b4f1518838cc59
@SQ SN:chr12    LN:133324548    M5:5086cf368e83b9cefd7bb19ef0f40d3c
@SQ SN:chr13    LN:113566686    M5:6f9949b26d5ece38c6cdefd48a516907
@SQ SN:chr14    LN:101161492    M5:7ad36d43a1ab059e190ab918f27cb5bf
@SQ SN:chr15    LN:99753195 M5:914e0733351aaa9a4354ca8efb527401
@SQ SN:chr16    LN:96330374 M5:fcc04e14d89b12c757d56c177261e248
@SQ SN:chr17    LN:84276897 M5:8627043bd24255e9f356e0c99ece251f
@SQ SN:chr18    LN:80542538 M5:b6a85aabc31b2d57f707aa679a8612c5
@SQ SN:chr19    LN:61707364 M5:c2c74cb0832b34b9e8eebf4b744a88c1
@SQ SN:chr1 LN:248387328    M5:e469247288ceb332aee524caec92bb22
@SQ SN:chr20    LN:66210255 M5:f1a31f30d3da50893470ff47bf0b5a38
@SQ SN:chr21    LN:45090682 M5:760794bbbe965d4d1e3884bf09d1f306
@SQ SN:chr22    LN:51324926 M5:0068189b6b7c95e860cce0815662293f
@SQ SN:chr2 LN:242696752    M5:e5ff97665b6025191d4d7dfa2cec4cf0
@SQ SN:chr3 LN:201105948    M5:0012f8a55c07f0b61e199736105e7257
@SQ SN:chr4 LN:193574945    M5:b7e17a4f043aab5bc45cdfd5cccf9deb
@SQ SN:chr5 LN:182045439    M5:e069fd69dce480b348767fa9827671b4
@SQ SN:chr6 LN:172126628    M5:ae714983adbd673c7c84b0b1daa91f7e
@SQ SN:chr7 LN:160567428    M5:3ff4435e65cf8cdd69a3150d09dd649f
@SQ SN:chr8 LN:146259331    M5:29aeec58cb54d9132e2283839f9fcc9c
@SQ SN:chr9 LN:150617247    M5:419dad1fb58ae389ad8828536b5d78cd
@SQ SN:chrM LN:16569    M5:c68f52674c9fb33aef52dcf399755519
@SQ SN:chrX LN:154259566    M5:1f2616c4cfa30104517e44dd3c426453
@SQ SN:chrY LN:62460029 M5:dd7264df17e7e4a4dac5b0f1f19dcfe0
@RG ID:1 LB:lib1 SM:HG002 PL:illumina PU:unit1
@PG ID:0    PN:vg
@PG ID:samtools PN:samtools PP:0    VN:1.22.1   CL:samtools reheader --command sed -E "s/[A-Za-z0-9_]+#0+#//g" -
ADD COMMENT
1
Entering edit mode

I didn't find any documentation related to the formatting, but it seems vg only supports strings in the following format. The BWA style you used earlier may have been considered invalid because of the \t character.

ID:1 LB:lib1 SM:sample01 PL:illumina PU:unit1
ADD REPLY
0
Entering edit mode

Yes, it seems so. Thank you.

ADD REPLY
0
Entering edit mode

I think this is still not quite right because I think your tags need to be tab-separated in the output to really be parseable, and here they are space-separated. I think this will read as a read group with ID 1 LB:lib1 SM:HG002 PL:illumina PU:unit1, because only the ID field is required in an @RG tag and spaces are allowed in values per the spec.

A lot of times when we use Giraffe we also are using command lines that inject garbled read group info, where it looks like we meant to be setting tags we don't really set, so there are non-functional examples coming from the vg authors. I think Giraffe supports setting the ID of the read group but not really the other relevant tags, and it won't interpret \t escape sequences itself to help you inject other tags.

You could try using printf to include literal tab characters in the input. I think there's a Giraffe bug report/feature request that needs to be opened on this topic, but I'm not sure what the behavior people want really is.

ADD REPLY

Login before adding your answer.

Traffic: 3419 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6