Question: Can You Make A Bam File Picard Compatible? I Forgot -M With Bwa Mem
ajc8120 (University of Iowa) wrote, 5.8 years ago:


I just aligned several exome samples with bwa mem and then processed the output through several pipeline steps. This is the first time I have used bwa mem instead of bwa aln followed by sampe.

I am now ready to run MarkDuplicates in Picard, but after tracking down the source of my error message, I realized that to make the SAM output of bwa mem usable by Picard, I should have passed the -M option to bwa mem.

Is there a way to make an existing BAM file acceptable for marking duplicates in Picard without realigning from the beginning?

Thanks for your help.

Tags: duplicates, picard, bwa
(written 5.8 years ago by ajc8120; modified 3.1 years ago by Matt Shirley)

What's your error message?

(reply written 5.8 years ago by Biomonika (Noolean))
matted (Boston, United States) wrote, 5.8 years ago:

Yes, you can.

As I understand it, the issue is a new flag, 0x800, that bwa mem sets on the extra records of a split alignment, where it previously used the secondary-alignment flag 0x100. Neither Picard nor Samtools understands this flag (yet).
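To make the bit arithmetic concrete, here is a minimal Python sketch that classifies a FLAG value (the function names are hypothetical, not from Picard or Samtools). It also shows why the new bit trips up older tools: a check that only looks at 0x100 reports a 0x800 record as primary, so the read appears to have more than one primary line.

```python
# SAM FLAG bits discussed above.
SECONDARY = 0x100      # older "secondary alignment" bit
SUPPLEMENTARY = 0x800  # new bit for the extra lines of a split alignment


def flag_kind(flag: int) -> str:
    """Classify a SAM FLAG value by its 0x800 and 0x100 bits."""
    if flag & SUPPLEMENTARY:
        return "supplementary"
    if flag & SECONDARY:
        return "secondary"
    return "primary"


def looks_primary_to_0x100_only_check(flag: int) -> bool:
    """Mimic a tool that knows only 0x100: anything without that bit counts as primary."""
    return not flag & SECONDARY


if __name__ == "__main__":
    for flag in (0, 16, 256, 2048, 2064):  # 2064 = 2048 + 16 (reverse strand)
        print(flag, flag_kind(flag), looks_primary_to_0x100_only_check(flag))
```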

From Heng on the samtools-help mailing list:

Those on the samtools mailing list may know I have proposed to a new SAM flag 0x800 to better describe chimeric alignment. I have also proposed to standardize the XP tag as SA. The format of SA follows: "(chr,pos,strand,CIGAR,mapQ,NM;)+". Note that SA separates position and strand, slightly different from XP. Other samtools developers, including the Picard group, have seconded the changes. I will write this to the SAM spec.
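As a sketch of the SA format quoted above, the hypothetical parser below splits an SA tag value (the part after "SA:Z:") into its six fields per entry. It assumes reference names contain no commas or semicolons, and it keeps the position exactly as written in the tag.

```python
from typing import List, NamedTuple


class SupplementaryHit(NamedTuple):
    chrom: str
    pos: int      # position as written in the tag
    strand: str   # '+' or '-', stored separately from pos (unlike XP)
    cigar: str
    mapq: int
    nm: int


def parse_sa(value: str) -> List[SupplementaryHit]:
    """Parse an SA value of the form (chr,pos,strand,CIGAR,mapQ,NM;)+."""
    hits = []
    for entry in value.split(";"):
        if not entry:  # the value ends with ';', which leaves an empty tail
            continue
        chrom, pos, strand, cigar, mapq, nm = entry.split(",")
        hits.append(SupplementaryHit(chrom, int(pos), strand, cigar, int(mapq), int(nm)))
    return hits
```

For example, parse_sa("chr2,1000,+,50S50M,60,0;chr3,500,-,60M40S,30,1;") returns two hits, the second on the minus strand with NM 1.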

The latest bwa-mem at github implements these changes. In the new output (without option -a), a read may appear in two or more SAM lines as before. But in this case, one and only one line is NOT flagged with 0x800. This line is called the "primary line" and always uses soft clipping. The rest of lines are flagged with 0x800. These lines are called "supplementary lines" and always use hard clipping. Having one primary line helps operations such as MarkDuplicates, SamToFastq and FixMateInformation.
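The primary/supplementary rule in that paragraph can be checked with a small sketch like the one below. The helper names and the (flag, CIGAR) tuple representation are assumptions for illustration; the sketch also excludes 0x100 lines, which can still appear when -a is used.

```python
import re
from typing import List, Tuple

SECONDARY = 0x100
SUPPLEMENTARY = 0x800
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipping(cigar: str) -> str:
    """Return 'hard' if the CIGAR has any H operation, else 'soft' if it has S, else 'none'."""
    ops = {op for _, op in CIGAR_OP.findall(cigar)}
    if "H" in ops:
        return "hard"
    if "S" in ops:
        return "soft"
    return "none"


def primary_line(records: List[Tuple[int, str]]) -> Tuple[int, str]:
    """Return the one (flag, cigar) record of a read flagged with neither 0x800 nor 0x100."""
    primaries = [r for r in records if not r[0] & (SUPPLEMENTARY | SECONDARY)]
    if len(primaries) != 1:
        raise ValueError(f"expected exactly one primary line, found {len(primaries)}")
    return primaries[0]
```

For a read split as [(0, "60M40S"), (2048, "60H40M")], primary_line picks the first record, which uses soft clipping, while the 0x800 record uses hard clipping.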

Samtools ignores the new flag. Picard may not work with the new bwa-mem output, but it is going to. Before Picard supports the new 0x800 flag, you may still use flag "-M" as before. The only effect of "-M" is to change 0x800 to 0x100. You may also change 0x800 to 0x100 with a script if you need the compatibility with older Picard but forget to use "-M" when invoking bwa-mem.

So I think the easiest fix is to write a short script that goes through the file in SAM format and changes the 0x800 flag to 0x100 wherever it is set. Or, even simpler (though lossy), you could drop the records that have the 0x800 flag set, which produces a file that's compatible with current Picard tools.
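A minimal sketch of such a script, assuming plain-text SAM input that includes the header (for example from samtools view -h). The script name fix_0x800.py and its --drop option are hypothetical. By default it clears 0x800 and sets 0x100, mirroring what Heng describes -M doing; with --drop it discards the 0x800 records instead. It rewrites only the FLAG column, so the SA tag and the hard clipping on those records are left as they were.

```python
import sys
from typing import Iterable, Iterator

SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def fix_sam_lines(lines: Iterable[str], drop: bool = False) -> Iterator[str]:
    """Yield SAM lines with 0x800 rewritten as 0x100, or with 0x800 records removed if drop=True."""
    for line in lines:
        if line.startswith("@"):  # header lines pass through unchanged
            yield line
            continue
        if not line.strip():      # skip blank lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])     # FLAG is the second SAM column
        if flag & SUPPLEMENTARY:
            if drop:
                continue          # lossy option: discard the supplementary record
            fields[1] = str((flag & ~SUPPLEMENTARY) | SECONDARY)
        yield "\t".join(fields) + "\n"


if __name__ == "__main__":
    args = [a for a in sys.argv[1:] if a != "--drop"]
    if args:
        # "-" reads SAM from stdin; otherwise args[0] is a SAM file path
        with (sys.stdin if args[0] == "-" else open(args[0])) as src:
            sys.stdout.writelines(fix_sam_lines(src, drop="--drop" in sys.argv[1:]))
    else:
        print("usage: fix_0x800.py [--drop] IN.sam|-  (writes SAM to stdout)", file=sys.stderr)
```

Example: samtools view -h in.bam | python fix_0x800.py - > fixed.sam, then convert fixed.sam back to BAM before running MarkDuplicates.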

Or I guess (for completeness) you could patch a local version of Picard to check for 0x800 as well as 0x100...

Matt Shirley (Cambridge, MA) wrote, 3.1 years ago:

I had to do this today, and wrote a small bit of python to convert my existing alignments:
