I've got a few thousand small bam files produced against the exact same reference, and I want to merge them into one single big bam file. What is the best way to do that?
Should I do this iteratively or can I pass a long list of bam files to samtools/picard/etc in one go?
Edited, since this is now partially solved. In my terminal, the methods below work for up to 4092 files. More than that raises an error:
samtools merge all.bam *.bam
samtools merge all.bam `find /basedir/ -name "*myfiles*.bam"`
samtools merge all.bam /basedir/*/???/*myfiles*.bam
That should get you around the 4092-file problem (which is probably a command-line length limit in your shell, if I understand things correctly).
Perhaps more like this:
But make sure to use backticks around the find statement. They got scrubbed from the comment for some reason.
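An alternative to command substitution that keeps the command line short no matter how many files there are: write the paths to a text file and pass it to samtools merge with the -b option, which reads input names one per line. The sketch below is only an illustration; the helper name make_bam_list and the file name bams.txt are made up, and the paths are placeholders. Note also that 4092 is suspiciously close to 4096, a common per-process open-file limit, so it may be worth checking that limit as well as the argument-length limit.

```shell
# Hypothetical helper: write every file under a base directory whose name
# matches a pattern to a list file, one path per line, in sorted order.
make_bam_list() {
    basedir=$1   # directory to search recursively
    pattern=$2   # name pattern, quoted so the shell does not expand it
    list=$3      # output text file
    find "$basedir" -type f -name "$pattern" | sort > "$list"
}

# Usage (paths are placeholders):
#   make_bam_list /basedir '*myfiles*.bam' bams.txt
#   samtools merge -b bams.txt all.bam
#
# Two limits worth checking when a merge fails at a fixed file count:
#   getconf ARG_MAX   # maximum bytes of arguments accepted per command
#   ulimit -n         # maximum number of open files per process
```

If ulimit -n turns out to be the bottleneck, a list file alone will not help, because samtools still has to open every input at once.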
Thanks very much. I don't have them in the same dir, but this works: ~/samtools merge all.bam ~/mydirs/??/??/??/mybam.*.bam
@DocRoberson: it seems like the find is actually not needed, because the shell is already expanding the glob pattern. It works for up to 4092 files in my terminal.
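For more files than a single merge will accept, the iterative route from the original question could look roughly like the sketch below: merge in batches, then merge the intermediate files. This is an untested-on-real-data illustration, not a recommended recipe. The function name merge_in_batches, the SAMTOOLS override variable, and the default batch size of 1000 are all made up here, and it assumes every input is coordinate-sorted against the same reference.

```shell
# Hypothetical helper: merge the BAMs named in a list file in batches, then
# merge the per-batch results into one output file.
merge_in_batches() {
    list=$1            # text file, one BAM path per line
    out=$2             # final merged BAM
    batch=${3:-1000}   # inputs per intermediate merge (assumed safe value)
    samtools_bin=${SAMTOOLS:-samtools}

    tmpdir=$(mktemp -d) || return 1
    mkdir "$tmpdir/lists" "$tmpdir/bams" || return 1

    # Split the master list into chunks of at most $batch lines each.
    split -l "$batch" "$list" "$tmpdir/lists/chunk." || return 1

    # First pass: one intermediate BAM per chunk.
    for chunk in "$tmpdir"/lists/chunk.*; do
        "$samtools_bin" merge -b "$chunk" \
            "$tmpdir/bams/$(basename "$chunk").bam" || return 1
    done

    # Second pass: merge the intermediates into the final file.
    printf '%s\n' "$tmpdir"/bams/*.bam > "$tmpdir/final.txt"
    "$samtools_bin" merge -b "$tmpdir/final.txt" "$out" || return 1

    rm -r "$tmpdir"
}

# Usage (file names are placeholders):
#   merge_in_batches bams.txt all.bam 1000
```

The batch size only needs to stay under whatever limit the single merge hits; the intermediates cost extra disk space and one more pass over the data.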