Difference Between Picard's MergeBamAlignment and MergeSamFiles
11.3 years ago
Johan ▴ 890

Could someone please help me understand the difference between Picard's MergeBamAlignment and MergeSamFiles? Their respective documentation states:


USAGE: MergeBamAlignment [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#MergeBamAlignment

Merges alignment data from a SAM or BAM file with additional data stored in an unmapped BAM file and produces a third SAM or BAM file of aligned and unaligned reads. NOTE that this program expects to find a sequence dictionary in the same directory as REFERENCE_SEQUENCE and expects it to have the same base name as the reference fasta except with the extension '.dict'



USAGE: MergeSamFiles [options]

Merges multiple SAM/BAM files into one file.

I found this post here on BioStars: "Merging Bam Files", but it does not describe the difference between the two. Apart from MergeBamAlignment adding in additional unmapped data, I just don't get the difference. And suppose I have a use case where I would like to merge multiple BAM files with alignments of the same sample (but different reads): which of the two should I use?


picard • 9.8k views
11.3 years ago
Ying W ★ 4.2k

The thing to keep in mind here is that some people have SAM/BAM files with no alignment information (so they are kind of like a FASTQ in this sense, except that they have a header). When merging two SAM/BAM files with alignment information (say, two replicates that you aligned separately and want to merge), you should probably use MergeSamFiles, but after running this command make sure you check the header for duplications. The MergeBamAlignment command looks to me like a specialty tool made to add information from an unaligned BAM to an already aligned BAM file.
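For the use case in the question (several aligned BAMs from the same sample), a minimal sketch of the MergeSamFiles route might look like the following. It assumes a Picard release where all tools live in a single `picard.jar` (older releases shipped one jar per tool), the file names are hypothetical, and `duplicate_header_ids` is an illustrative helper, not part of Picard. The code only builds the command line and scans header text; it does not run Picard.

```python
from collections import Counter


def merge_sam_files_cmd(inputs, output, picard_jar="picard.jar"):
    """Build a Picard MergeSamFiles command line (not executed here)."""
    cmd = ["java", "-jar", picard_jar, "MergeSamFiles"]
    cmd += [f"I={path}" for path in inputs]  # one I= argument per input file
    cmd.append(f"O={output}")
    return cmd


def duplicate_header_ids(header_text, record_type="@RG"):
    """Return ID values that occur more than once among header lines of one type."""
    ids = []
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] != record_type:
            continue
        for field in fields[1:]:
            if field.startswith("ID:"):
                ids.append(field[3:])
    return sorted(i for i, n in Counter(ids).items() if n > 1)


# Hypothetical replicate files from the same sample.
print(" ".join(merge_sam_files_cmd(["rep1.bam", "rep2.bam"], "merged.bam")))

# Hypothetical header text, e.g. as printed by `samtools view -H merged.bam`.
header = (
    "@HD\tVN:1.4\tSO:coordinate\n"
    "@RG\tID:lane1\tSM:s1\n"
    "@RG\tID:lane1\tSM:s1\n"
    "@RG\tID:lane2\tSM:s1"
)
print(duplicate_header_ids(header))
```

Passing `"@PG"` as `record_type` runs the same check on program group lines.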

7.1 years ago
Carlos Borroto ★ 2.1k

Long, detailed explanation from the GATK forum:

3C. Restore altered data and apply & adjust meta information using MergeBamAlignment

MergeBamAlignment is a beast of a tool, so its introduction is longer. It does more than is implied by its name. Explaining these features requires I fill you in on some background.

Broadly, the tool merges defined information from the unmapped BAM (uBAM, step 1) with that of the aligned BAM (step 3) to conserve read data, e.g. original read information and base quality scores. The tool also generates additional meta information based on the information generated by the aligner, which may alter aligner-generated designations, e.g. mate information and secondary alignment flags. The tool then makes adjustments so that all meta information is congruent, e.g. read and mate strand information based on proper mate designations. We refer to the resulting BAM as clean.

Specifically, the aligned BAM generated in step 3 lacks read group information and certain tags--the UQ (Phred likelihood of the segment), MC (CIGAR string for mate) and MQ (mapping quality of mate) tags. It has hard-clipped sequences from split reads and altered base qualities. The reads also have what some call mapping artifacts but what are really just features we should not expect from our aligner. For example, the meta information so far does not consider whether pairs are optimally mapped and whether a mate is unmapped (in reality or for accounting purposes). Depending on these assignments, MergeBamAlignment adjusts the read and read mate strand orientations for reads in a proper pair. Finally, the alignment records are sorted by query name. We would like to fix all of these issues before taking our data to a variant discovery workflow.
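The designations discussed above (proper pair, mate unmapped, strand, secondary alignment) are stored as bits in the SAM FLAG field. As a rough illustration, here is a small decoder for those bits; the bit values come from the SAM specification, while the function and the label names are my own and not part of Picard or GATK.

```python
# FLAG bits as defined in the SAM specification; the label strings are illustrative.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


# 99 = 0x1 + 0x2 + 0x20 + 0x40
print(decode_flag(99))
# 329 = 0x1 + 0x8 + 0x40 + 0x100
print(decode_flag(329))
```

Comparing the decoded flags of the same read before and after MergeBamAlignment is one way to see which designations the tool changed.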

Enter MergeBamAlignment. As the tool name implies, MergeBamAlignment applies read group information from the uBAM and retains the program group information from the aligned BAM. In restoring original sequences, the tool adjusts CIGAR strings from hard-clipped to soft-clipped. If the alignment file is missing reads present in the unaligned file, then these are retained as unmapped records. Additionally, MergeBamAlignment evaluates primary alignment designations according to a user-specified strategy, e.g. for optimal mate pair mapping, and changes secondary alignment and mate unmapped flags based on its calculations. It makes additional adjustments for desired congruency. I will soon explain these and additional changes in more detail and show a read record to illustrate.
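To make the inputs concrete, here is a hedged sketch of how a MergeBamAlignment call could be assembled. It again assumes a single `picard.jar`, and all file names are hypothetical. `expected_dict_path` is my own helper that encodes the rule from the tool's documentation: the sequence dictionary sits next to the reference FASTA, with the same base name and a `.dict` extension. The optional strategy argument maps to Picard's `PRIMARY_ALIGNMENT_STRATEGY` option; the value used in the example is an illustration, not a recommendation.

```python
from pathlib import Path


def expected_dict_path(reference_fasta):
    """Path where the sequence dictionary is expected: same directory and base name, '.dict' extension."""
    return Path(reference_fasta).with_suffix(".dict")


def merge_bam_alignment_cmd(aligned_bam, unmapped_bam, reference_fasta, output,
                            primary_alignment_strategy=None,
                            picard_jar="picard.jar"):
    """Build a Picard MergeBamAlignment command line (not executed here)."""
    cmd = [
        "java", "-jar", picard_jar, "MergeBamAlignment",
        f"ALIGNED_BAM={aligned_bam}",    # aligner output
        f"UNMAPPED_BAM={unmapped_bam}",  # uBAM with original reads, qualities, read groups
        f"REFERENCE_SEQUENCE={reference_fasta}",
        f"OUTPUT={output}",
    ]
    if primary_alignment_strategy is not None:
        cmd.append(f"PRIMARY_ALIGNMENT_STRATEGY={primary_alignment_strategy}")
    return cmd


# Hypothetical inputs: one aligned BAM plus the uBAM it was derived from.
reference = "refs/genome.fasta"
print(expected_dict_path(reference))
print(" ".join(merge_bam_alignment_cmd(
    "aligned.bam", "unmapped.bam", reference, "clean.bam",
    primary_alignment_strategy="MostDistant",
)))
```

Note the contrast with MergeSamFiles: this tool takes exactly one aligned file and its matching unmapped file, rather than an arbitrary list of aligned inputs.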

Updating old post in case you land here from Google like I just did.

