Question

Adding read group (@RG) information when there are multiple IDs.

2


3.3 years ago

halo22 ▴ 300

Hello All,

I am trying to reanalyze a WGS dataset that was generated a few years ago. I have access to the old BAM files and was able to create paired FASTQ files for each sample. Since I'll be using Picard to mark duplicates, I would like to add the read group information when aligning my FASTQ files with BWA-MEM. From the old BAM file, I was able to extract the '@RG' header lines, but there seem to be multiple read group IDs in the BAM file. According to the BWA documentation, the correct way to add read group information is bwa mem -R '@RG\tID:foo\tSM:bar\tLB:library1'. I believe both foo and bar here are unique to each sample. In my case, how should I pass multiple read groups to BWA? I hope the question is clear; I have very limited experience with WGS. I appreciate all your help and comments.

    samtools view -H sampleA.bam | grep '^@RG'
    @RG     ID:AVKMG.3      SM:sampleA      LB:0993462810_Illumina  PL:ILLUMINA     PU:AVKMGDSXX191015.3.GTCCACAG-CGCGAATA  CN:BI   DT:2016-10-15T04:00:00+0000     DS:KS-9108
    @RG     ID:AJJMK.4      SM:sampleA      LB:0993462810_Illumina  PL:ILLUMINA     PU:AJJMKDSXX191014.4.GTCCACAG-CGCGAATA  CN:BI   DT:2016-10-14T04:00:00+0000     DS:KS-9108
    @RG     ID:AKKMD.4      SM:sampleA      LB:0993462810_Illumina  PL:ILLUMINA     PU:AKKMDDSXX191014.4.GTCCACAG-CGCGAATA  CN:BI   DT:2016-10-14T04:00:00+0000     DS:KS-9108
    @RG     ID:UGGMD.4      SM:sampleA      LB:0993462810_Illumina  PL:ILLUMINA     PU:UGGMDDSXX191014.4.GTCCACAG-CGCGAATA  CN:BI   DT:2016-10-14T04:00:00+0000     DS:KS-9108

next-gen alignment • 1.4k views

updated 3.3 years ago by WouterDeCoster 47k • written 3.3 years ago by halo22 ▴ 300

score 2 · Accepted Answer · 2020-12-29

2


3.3 years ago

WouterDeCoster 47k

Most likely the alignments were done separately, one per read group, and the BAM files were merged afterwards.
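If that is the case, one way to mirror it is to align each read group's reads with its own -R string and merge the results. The following is only a minimal sketch, not necessarily how the original BAM was produced. It assumes the FASTQ files have already been split per read group, and the file names (ref.fa, <ID>_R1.fastq.gz, <ID>_R2.fastq.gz, sampleA.merged.bam) are hypothetical. The headers are abbreviated to a few fields, and the commands are printed rather than executed so they can be checked first.

```shell
#!/bin/sh
# Sketch: build bwa mem -R arguments from @RG header lines, one alignment per read group.

# Convert a tab-separated @RG header line into the form bwa mem -R expects,
# where each tab is written as the two characters backslash + t.
to_bwa_rg() {
    printf '%s\n' "$1" | awk 'BEGIN { FS = "\t"; OFS = "\\t" } { $1 = $1; print }'
}

# Extract the value of the ID field from an @RG header line.
rg_id() {
    printf '%s\n' "$1" | tr '\t' '\n' | sed -n 's/^ID://p'
}

# Print (do not run) the alignment command for one read group.
# Assumption: per-read-group FASTQs are named <ID>_R1.fastq.gz / <ID>_R2.fastq.gz.
plan_alignment() {
    plan_id=$(rg_id "$1")
    printf "bwa mem -R '%s' ref.fa %s_R1.fastq.gz %s_R2.fastq.gz | samtools sort -o %s.bam -\n" \
        "$(to_bwa_rg "$1")" "$plan_id" "$plan_id" "$plan_id"
}

# Two of the read groups from the question, abbreviated to four fields.
hdr1=$(printf '@RG\tID:AVKMG.3\tSM:sampleA\tLB:0993462810_Illumina\tPL:ILLUMINA')
hdr2=$(printf '@RG\tID:AJJMK.4\tSM:sampleA\tLB:0993462810_Illumina\tPL:ILLUMINA')

plan_alignment "$hdr1"
plan_alignment "$hdr2"

# Merging keeps both @RG header lines in the combined BAM.
echo "samtools merge sampleA.merged.bam AVKMG.3.bam AJJMK.4.bam"
```

In practice the header lines would come from the samtools view -H output shown in the question rather than being typed by hand. If only per-sample FASTQ files exist, the reads would first need to be separated by read group, for example by splitting the old BAM before converting it to FASTQ.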

3.3 years ago by WouterDeCoster 47k

0


Thank you very much. I have a follow-up question and would appreciate your answer. I used Picard's AddOrReplaceReadGroups to add a single read group (the first one in the example above) to the aligned BAM file; the library (LB) is the same for all read groups in that example. I was then able to run MarkDuplicates successfully. Do you think this is the correct approach? Does the 'ID' affect the duplicate-marking process?

3.3 years ago by halo22 ▴ 300

0


This seems sufficiently distinct from the original question for you to open a new post.

3.3 years ago by WouterDeCoster 47k