Question: samtools merge iterate through multiple chromosomes and generates pairs
gravatar for s1661753
8 months ago by
s16617530 wrote:

I have .bam files from a human whole genome sequence that I've had to separate into individual chromosomes (1-22, X, Y) to make another program run. I am now wanting to merge chromosomes so that I have every paired possible combination e.g. chr1_chr2, chr1_chr3... chr1_chrY and finally chrX_chrY. At the moment, each individual chromosome and its index are in the same directory.

to make 1 pair, I know that the command is as follows (taken from samtools manual):

samtools merge output.bam input1.bam input2.bam


samtools merge chr1_chr2.bam chr1.bam chr2.bam

But how can I iterate through every possible combination of pairs, rather than writing each line of code (~ 300 lines of samtools merge ...) I don't need pairs to repeat, for example chr1 can pair with chr2:22,X,Y but chr2 doesn't need to then merge again with chr1 (or can this not be avoided/ delete later?)

I've got as far as

for file in *_chr*.bam; do samtools merge merge_${file}.bam $file $file; done

I understand that this will find all of my chromosome .bam files, then merge but this isn't going to iterate through every combination is it?

ADD COMMENTlink modified 8 months ago by RamRS20k • written 8 months ago by s16617530

use two loops instead of one....

ADD REPLYlink written 8 months ago by Pierre Lindenbaum116k

thanks! Very new to unix and other computational languages... any tips on how to go about that... ? ...?

ADD REPLYlink written 8 months ago by s16617530

ADD REPLYlink written 8 months ago by Pierre Lindenbaum116k

Why do you need pairs?

Do you realise you can use

samtools merge output.bam *.bam

And get all files merged?

ADD REPLYlink written 8 months ago by WouterDeCoster36k

Because a package I'm using downstream is eventually going to iterate through each pair. Alternatively I could keep the data as one large .bam file but its too large andexits before completing. This later package has supposedly been written specifically for whole genome sequencing analysis but doesn't write anything to file, rather it does everything in memory, reaches capacity then exits.

ADD REPLYlink modified 8 months ago • written 8 months ago by s16617530
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 550 users visited in the last hour