Adding read group (@RG) information when there are multiple IDs.
3.3 years ago
halo22

Hello All,

I am trying to reanalyze a WGS dataset that was generated a few years ago. I have access to the old BAM files, and I was able to create paired FASTQ files for each sample. Since I'll be using Picard to mark duplicates, I would like to add the read group information when aligning the FASTQ files with BWA mem. From the old BAM file, I extracted the '@RG' lines, but there seem to be multiple read group IDs in the BAM file (see below).

From the BWA documentation, the way to add read group information is bwa mem -R '@RG\tID:foo\tSM:bar\tLB:library1'. I believe both foo and bar here are unique for each sample. In my case, how should I pass multiple read groups to BWA? I hope the question is clear. I have very limited experience with WGS, and I appreciate all your help and comments.

    samtools view -H sampleA.bam | grep '^@RG'
    @RG     ID:AVKMG.3      SM:sampleA      LB:0993462810_Illumina  PL:ILLUMINA     PU:AVKMGDSXX191015.3.GTCCACAG-CGCGAATA  CN:BI   DT:2016-10-15T04:00:00+0000     DS:KS-9108
    @RG     ID:AJJMK.4      SM:sampleA      LB:0993462810_Illumina  PL:ILLUMINA     PU:AJJMKDSXX191014.4.GTCCACAG-CGCGAATA  CN:BI   DT:2016-10-14T04:00:00+0000     DS:KS-9108
    @RG     ID:AKKMD.4      SM:sampleA      LB:0993462810_Illumina  PL:ILLUMINA     PU:AKKMDDSXX191014.4.GTCCACAG-CGCGAATA  CN:BI   DT:2016-10-14T04:00:00+0000     DS:KS-9108
    @RG     ID:UGGMD.4      SM:sampleA      LB:0993462810_Illumina  PL:ILLUMINA     PU:UGGMDDSXX191014.4.GTCCACAG-CGCGAATA  CN:BI   DT:2016-10-14T04:00:00+0000     DS:KS-9108
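One way to reuse these header lines with `bwa mem -R` is sketched below, assuming bash and a POSIX awk. Recent bwa versions expect the tabs in the `-R` string to be written as the two-character escape `\t` and reject literal tab characters, so the sketch rewrites each tab-separated `@RG` line printed by `samtools view -H` into that form. The function names `rg_ids` and `rg_to_bwa_arg` are hypothetical helpers, not part of samtools or BWA.

```shell
#!/usr/bin/env bash
# Sketch: turn @RG header lines into strings usable with `bwa mem -R`.
# Assumes tab-separated SAM header text on stdin (as printed by samtools view -H).

# Hypothetical helper: print the ID: value of every @RG line.
rg_ids() {
  awk -F'\t' '$1 == "@RG" {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^ID:/) print substr($i, 4)
  }'
}

# Hypothetical helper: print each @RG line with literal tabs replaced by the
# two characters "\t" (OFS is set to backslash-t; $1 = $1 rebuilds the record).
rg_to_bwa_arg() {
  awk -F'\t' -v OFS='\\t' '$1 == "@RG" { $1 = $1; print }'
}
```

For example, `samtools view -H sampleA.bam | rg_to_bwa_arg` would print one `-R`-ready string per read group; each string should only be attached to the reads that actually came from that read group.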
3.3 years ago

Most likely, the alignments were done separately (one per read group) and the BAM files were merged afterwards.
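If you want to reproduce that layout, a rough bash sketch follows: split the old BAM by read group with `samtools split`, convert each piece back to paired FASTQ, align it with its own `@RG` line passed to `bwa mem -R`, and merge the sorted results. It assumes a reasonably recent samtools and bwa on `PATH`, a reference already indexed with `bwa index`, and reads in the old BAM that carry `RG:Z` tags. The function name `realign_by_read_group`, the `rg_split/` work directory, and the per-read-group file names are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Sketch: realign an old multi-read-group BAM one read group at a time.
# Assumes samtools and bwa on PATH and a bwa-indexed reference.
set -euo pipefail

# Hypothetical helper: realign_by_read_group <old.bam> <ref.fa> <out.bam> [threads]
realign_by_read_group() {
  local in_bam=$1 ref=$2 out_bam=$3 threads=${4:-4}
  local workdir=rg_split
  mkdir -p "$workdir"

  # One BAM per read group, named after the RG ID (%! = ID, %. = extension).
  samtools split -f "$workdir"'/%!.%.' "$in_bam"

  local parts=()
  while IFS= read -r rg_line; do
    # ID: value of this @RG line.
    local id
    id=$(printf '%s\n' "$rg_line" | awk -F'\t' '{for (i = 2; i <= NF; i++) if ($i ~ /^ID:/) print substr($i, 4)}')
    # bwa mem -R wants "\t" escapes instead of literal tabs.
    local rg_arg
    rg_arg=$(printf '%s\n' "$rg_line" | awk -F'\t' -v OFS='\\t' '{$1 = $1; print}')

    # Group mates together, then write paired FASTQ (other reads discarded).
    samtools collate -u -O "$workdir/$id.bam" \
      | samtools fastq -n -1 "$workdir/$id.R1.fq.gz" -2 "$workdir/$id.R2.fq.gz" \
          -0 /dev/null -s /dev/null -

    # Align with the original read group line and coordinate-sort.
    bwa mem -t "$threads" -R "$rg_arg" "$ref" \
        "$workdir/$id.R1.fq.gz" "$workdir/$id.R2.fq.gz" \
      | samtools sort -o "$workdir/$id.sorted.bam" -

    parts+=("$workdir/$id.sorted.bam")
  done < <(samtools view -H "$in_bam" | grep '^@RG')

  # Merge; samtools merge keeps the @RG lines from every input in the header.
  samtools merge -f "$out_bam" "${parts[@]}"
}
```

A call would look like `realign_by_read_group sampleA.bam ref.fa sampleA.realigned.bam 8`. Keeping one alignment per original read group means the merged BAM ends up with the same read-group layout as the old file.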

Thank you very much. I have a follow-up question and would appreciate it if you could answer it. I used Picard AddOrReplaceReadGroups to add a single read group (the first one in the example above) to the aligned BAM file. (All LB values in the example above are the same.) I was then able to run MarkDuplicates successfully. Do you think this is the correct approach? Does the read group 'ID' have an impact on duplicate marking?
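Before collapsing everything into one read group, it may be worth confirming that the read groups really share a single library, since Picard MarkDuplicates reports its duplication metrics per library (`LB`). A minimal bash/awk sketch, with `rg_libraries` as a hypothetical helper name, that prints the distinct `LB` values found among the `@RG` header lines:

```shell
#!/usr/bin/env bash
# Sketch: list the distinct LB (library) values among @RG header lines.
# Reads SAM header text on stdin, e.g. from `samtools view -H sampleA.bam`.
rg_libraries() {
  awk -F'\t' '$1 == "@RG" {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^LB:/ && !seen[$i]++) print substr($i, 4)
  }'
}
```

For the header shown in the question, `samtools view -H sampleA.bam | rg_libraries` should print a single line, `0993462810_Illumina`.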

This seems sufficiently distinct from the original question for you to open a new post.
