I'm trying to merge different bam files with the same name (and, theoretically, the same sequence). They are the same set of libraries sequenced at two different times, and I want to merge the matching pairs in order to simulate greater sequencing depth.
I can do this just fine with two bams using picard MergeSamFiles or samtools merge, but the issue is that I have 96 bams in each folder. I'd like to do this programmatically, not manually, and be able to reproduce the process with different datasets in the future.
My hunch says that the simplest way to do this would be to use the bam filenames: loop over my two folders, find the bams that share a filename, and merge each pair into a single output file, but my shell chops are still rough and I am hitting a wall.
I started by getting a list of the bams of interest (I've also tried dumping this list to a file with -fprint0):
find_bams() {
    find "$run_folder" -type f -name "*.bam"
}
Then, what I think I should do is loop over the list:
for i in $(find_bams)
do
    s=$(basename "$i" .bam)
    picard MergeSamFiles I="$i" O="$s".bam
done
This is where I get stuck. First, there need to be two input files to merge, and second, those two inputs must have matching filenames.
This is more of a shell scripting problem than a bioinformatics software problem, but I imagine I'm not the only one who has had to do this. Any help would be greatly appreciated.
Edit: Solved with the help of h.mon and finswimmer; see comments
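For anyone landing here later, below is a minimal sketch of the filename-matching loop described above. It is not necessarily what the commenters suggested. It assumes the bams sit directly inside each run folder (no subfolders) and that the merged files go to a separate output folder. The names merge_pairs, run1_dir, run2_dir, out_dir and MERGE_TOOL are placeholders I made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: merge bams that share a filename across two run folders.
# merge_pairs and MERGE_TOOL are hypothetical names, not part of any tool.

merge_pairs() {
    local run1_dir="$1"
    local run2_dir="$2"
    local out_dir="$3"
    local bam name mate

    mkdir -p "$out_dir"

    # Loop over a glob instead of $(find ...), so paths with spaces survive.
    for bam in "$run1_dir"/*.bam; do
        [ -e "$bam" ] || continue        # the glob matched nothing
        name=$(basename "$bam")
        mate="$run2_dir/$name"           # same filename in the second folder

        if [ ! -f "$mate" ]; then
            echo "no match for $name in $run2_dir, skipping" >&2
            continue
        fi

        # samtools merge takes the output file first, then the inputs.
        "${MERGE_TOOL:-samtools}" merge "$out_dir/$name" "$bam" "$mate"
    done
}

# Example call (folder names are placeholders):
# merge_pairs run1 run2 merged
```

To use picard instead, the merge line could be swapped for picard MergeSamFiles I="$bam" I="$mate" O="$out_dir/$name". Bams that exist in only one folder are reported on stderr and skipped rather than copied, which seemed the safer default to me.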
People probably don't want to write this for you...why don't you share what you have already, and why it's not working?
Definitely not my intention to have people write it for me. I updated the post with what I've got. Thanks for following up.