RGID mismatch after using MergeBamAlignment
2
0
Entering edit mode
5.7 years ago
John 13k

Hello all :)

I created an umapped bam with all my metadata in it, including an RGID in the format:

@RG ID:HWI-ST552.C5WGPACXX.1.AGTTCC SM:B6.BKS-Leprdb/J  LB:Mm08.WGBS    PL:illumina CN:Essen    DS:Obese

.

I then used bwa-meth to map my trimmed reads, which seems to have inserted its own RGID for every read:

HWI-ST552:160:C2J56ACXX:2:1211:16219:92907  99  chr1    3000373 44  101M    =   3000544 272 TATGTTTTTTAGTTAGTTTGGTTAAGGGTTTATTTATTTTGTTGATTTTTTTAAAGAATTAGTTATTAGTTTGGTTGATTTTTTGAATATTTTTTTTTGTT   @?@DFFFFHHHHFGHGIIJJJEHEHGIJEFHHHIJIIIJJGHH@<FGHHIGICCAA=EEECEHFHHEA@CDFFDA?BDCDDDDD?@CCEDDEEDDDDD<BC   NM:i:1  MD:Z:68T32  AS:i:98 XS:i:93 RG:Z:44_Mm01_WEAd_C2_WGBS_E_1_L002__trimmed YC:Z:CT YD:Z:f

But now, surprisingly, when i use Picard's MergeBamAlignment, even though the only ATTRIBUTES_TO_RETAIN=XS, I end up with a file with the RGID from the unmapped bam in the header, and the RGID of bwa-meth in every read. I also seem to have lost the YC and YD tags that bwa-meth puts in - i'm not sure how important these tags are (or what they are used for).

HWI-ST552:160:C2J56ACXX:2:1211:16219:92907  99  chr1    3000373 44  101M    =   3000544 272 TATGTTTTTTAGTTAGTTTGGTTAAGGGTTTATTTATTTTGTTGATTTTTTTAAAGAATTAGTTATTAGTTTGGTTGATTTTTTGAATATTTTTTTTTGTT   @?@DFFFFHHHHFGHGIIJJJEHEHGIJEFHHHIJIIIJJGHH@<FGHHIGICCAA=EEECEHFHHEA@CDFFDA?BDCDDDDD?@CCEDDEEDDDDD<BC   MC:Z:101M   MD:Z:68T32  PG:Z:bwa-meth   RG:Z:44_Mm01_WEAd_C2_WGBS_E_1_L002__trimmed NM:i:1  MQ:i:60 UQ:i:31 AS:i:98 XS:i:93

I now can't merge reads or mark duplicates, etc, until this is all fixed up.

ERROR: Record 1, Read name HWI-ST552:217:C5WGPACXX:1:1209:10866:87788, RG ID on SAMRecord not found in header: 44_Mm08_WEAd_Db2_WGBS_E_1_L001__trimmed
WARNING: Read name HWI-ST552:217:C5WGPACXX:1:1209:10866:87788, A record is missing a read group

At the moment none of my file have been merged so one option would be to set the RGID of each read to the RGID in the header. What is the easiest way to do this? Also, is this a bug that I should report on Picard's Github, or is this a feature?

bwa-meth picard MergeBamAlignment • 1.7k views
ADD COMMENT
0
Entering edit mode

I still have the original bwa-meth mapped bams, and the uBAM, and the merged BAMs, so whatever is easiest to move forward I can do :)

ADD REPLY
1
Entering edit mode
5.7 years ago
igor 12k

Can you try fixing read groups with Picard AddOrReplaceReadGroups? http://broadinstitute.github.io/picard/command-line-overview.html#AddOrReplaceReadGroups

ADD COMMENT
0
Entering edit mode

Yes, most likely - but im worried that just fixes the part of the problem that i can see - there might be other issues under the surface. Like, maybe I shouldn't be doing MergeBamAlignment in the first place. Maybe I should be keeping those YC and YD tags. Maybe this is something that should happen in the first place and Pierre will tell me I did XYZ wrong and I should do something else :)

ADD REPLY
0
Entering edit mode
5.6 years ago
John 13k

After a month of working on other stuff I came back to this today. Here's my update:

bwa-meth adds an RGID by default, which it seems that MergeBamAlignment was not able to remove for some reason - or at least, only replaced 0.7% of them. It did however remove the @RG in the header and replace it with the correct one from the unmapped BAM, which was why i was getting the errors above (reads had RGID not in the header).

Here are some stats using the new version of SeQC which supports RGID:

From the uBAM:

sqlite> select * from 'd10086649af3ca7a090206ea4340a9e5_RGID_TAGNAMES' order by TAGNAMES;
RGID                          TAGNAMES    counts    
----------------------------  ----------  ----------
HWI-ST552.C5WGPACXX.1.AGTTCC  RG          319411810 
HWI-ST552.C5WGPACXX.1.AGTTCC  RG XT       841004

Out of bwa-meth:

RGID                                     TAGNAMES                    counts    
---------------------------------------  --------------------------  ----------
44_Mm08_WEAd_Db2_WGBS_E_1_L001__trimmed  AS MD NM RG SA XA XS YC YD  91659     
44_Mm08_WEAd_Db2_WGBS_E_1_L001__trimmed  AS MD NM RG SA XS YC YD     544269    
44_Mm08_WEAd_Db2_WGBS_E_1_L001__trimmed  AS MD NM RG XA XS YC YD     20092415  
44_Mm08_WEAd_Db2_WGBS_E_1_L001__trimmed  AS MD NM RG XS YC YD        297242252 
44_Mm08_WEAd_Db2_WGBS_E_1_L001__trimmed  AS RG XS YC                 2600251

(Note the sneaky XA tag i hadn't noticed by eye-balling the SAM before)

After using the incorrect MergeBamAlignment command in OP:

RGID                                     TAGNAMES                       counts    
---------------------------------------  -----------------------------  ----------
44_Mm08_WEAd_Db2_WGBS_E_1_L001__trimmed  AS MC MD MQ NM PG RG SA UQ XS  312097    
44_Mm08_WEAd_Db2_WGBS_E_1_L001__trimmed  AS MC MD MQ NM PG RG SA UQ XS  3644      
44_Mm08_WEAd_Db2_WGBS_E_1_L001__trimmed  AS MC MD MQ NM PG RG UQ XS     316167037 
44_Mm08_WEAd_Db2_WGBS_E_1_L001__trimmed  AS MC MD MQ NM PG RG UQ XS XT  836152    
44_Mm08_WEAd_Db2_WGBS_E_1_L001__trimmed  AS MC MD NM PG RG XS           760       
44_Mm08_WEAd_Db2_WGBS_E_1_L001__trimmed  AS MD NM PG RG SA UQ XS        316519    
44_Mm08_WEAd_Db2_WGBS_E_1_L001__trimmed  AS MD NM PG RG SA UQ XS XT     3668      
44_Mm08_WEAd_Db2_WGBS_E_1_L001__trimmed  AS MD NM PG RG UQ XS           330244    
44_Mm08_WEAd_Db2_WGBS_E_1_L001__trimmed  AS MD NM PG RG UQ XS XT        440       
44_Mm08_WEAd_Db2_WGBS_E_1_L001__trimmed  AS MD NM PG RG XS              34        
HWI-ST552.C5WGPACXX.1.AGTTCC             MC PG RG                       331627    
HWI-ST552.C5WGPACXX.1.AGTTCC             MC PG RG XT                    452       
HWI-ST552.C5WGPACXX.1.AGTTCC             PG RG                          2267868   
HWI-ST552.C5WGPACXX.1.AGTTCC             PG RG XT                       304

So it seems bwa-meth has 4 tags MergeBamAlignment needs to know about - XA, XS, YC and YD - and we need to specify that we really want picard to drop all the RGID tags and replace them. So the new command looks like:

java -Xmx15G -jar picard.jar MergeBamAlignment
    UNMAPPED_BAM=/path/to/unmapped.bam
    ALIGNED_BAM=/path/to/bwa-meth/output.bam
    OUTPUT=/merged.bam
    R=/mm10.fa
    CREATE_INDEX=true
    ADD_MATE_CIGAR=true
    CLIP_ADAPTERS=false
    CLIP_OVERLAPPING_READS=true
    INCLUDE_SECONDARY_ALIGNMENTS=true
    MAX_INSERTIONS_OR_DELETIONS=-1
    PRIMARY_ALIGNMENT_STRATEGY=MostDistant
    IS_BISULFITE_SEQUENCE=true
    ATTRIBUTES_TO_RETAIN=XA
    ATTRIBUTES_TO_RETAIN=XS
    ATTRIBUTES_TO_RETAIN=YC
    ATTRIBUTES_TO_RETAIN=YD
    ATTRIBUTES_TO_REMOVE=RG
ADD COMMENT
0
Entering edit mode

Final output:

RGID                          TAGNAMES                                   counts    
----------------------------  -----------------------------------------  ----------
HWI-ST552.C5WGPACXX.1.AGTTCC  AS MC MD MQ NM PG RG SA UQ XA XS XT YC YD  371       
HWI-ST552.C5WGPACXX.1.AGTTCC  AS MC MD MQ NM PG RG SA UQ XA XS YC YD     45004     
HWI-ST552.C5WGPACXX.1.AGTTCC  AS MC MD MQ NM PG RG SA UQ XS XT YC YD     3273      
HWI-ST552.C5WGPACXX.1.AGTTCC  AS MC MD MQ NM PG RG SA UQ XS YC YD        267093    
HWI-ST552.C5WGPACXX.1.AGTTCC  AS MC MD MQ NM PG RG UQ XA XS XT YC YD     58010     
HWI-ST552.C5WGPACXX.1.AGTTCC  AS MC MD MQ NM PG RG UQ XA XS YC YD        20004810  
HWI-ST552.C5WGPACXX.1.AGTTCC  AS MC MD MQ NM PG RG UQ XS XT YC YD        778142    
HWI-ST552.C5WGPACXX.1.AGTTCC  AS MC MD MQ NM PG RG UQ XS YC YD           296162227 
HWI-ST552.C5WGPACXX.1.AGTTCC  AS MC MD NM PG RG XA XS YC YD              218       
HWI-ST552.C5WGPACXX.1.AGTTCC  AS MC MD NM PG RG XS YC YD                 542       
HWI-ST552.C5WGPACXX.1.AGTTCC  AS MD NM PG RG SA UQ XA XS XT YC YD        404       
HWI-ST552.C5WGPACXX.1.AGTTCC  AS MD NM PG RG SA UQ XA XS YC YD           45880     
HWI-ST552.C5WGPACXX.1.AGTTCC  AS MD NM PG RG SA UQ XS XT YC YD           3264      
HWI-ST552.C5WGPACXX.1.AGTTCC  AS MD NM PG RG SA UQ XS YC YD              270639    
HWI-ST552.C5WGPACXX.1.AGTTCC  AS MD NM PG RG UQ XA XS XT YC YD           32        
HWI-ST552.C5WGPACXX.1.AGTTCC  AS MD NM PG RG UQ XA XS YC YD              29324     
HWI-ST552.C5WGPACXX.1.AGTTCC  AS MD NM PG RG UQ XS XT YC YD              408       
HWI-ST552.C5WGPACXX.1.AGTTCC  AS MD NM PG RG UQ XS YC YD                 300920    
HWI-ST552.C5WGPACXX.1.AGTTCC  AS MD NM PG RG XA XS YC YD                 21        
HWI-ST552.C5WGPACXX.1.AGTTCC  AS MD NM PG RG XS YC YD                    13        
HWI-ST552.C5WGPACXX.1.AGTTCC  MC PG RG                                   331627    
HWI-ST552.C5WGPACXX.1.AGTTCC  MC PG RG XT                                452       
HWI-ST552.C5WGPACXX.1.AGTTCC  PG RG                                      2267868   
HWI-ST552.C5WGPACXX.1.AGTTCC  PG RG XT                                   304
ADD REPLY

Login before adding your answer.

Traffic: 2371 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6