Question: samtools merge iterate through multiple chromosomes and generates pairs
0
gravatar for s1661753
10 days ago by
s16617530
s16617530 wrote:

I have .bam files from a human whole genome sequence that I've had to separate into individual chromosomes (1-22, X, Y) to make another program run. I am now wanting to merge chromosomes so that I have every paired possible combination e.g. chr1_chr2, chr1_chr3... chr1_chrY and finally chrX_chrY. At the moment, each individual chromosome and its index are in the same directory.

to make 1 pair, I know that the command is as follows (taken from samtools manual):

samtools merge output.bam input1.bam input2.bam

e.g.

samtools merge chr1_chr2.bam chr1.bam chr2.bam

But how can I iterate through every possible combination of pairs, rather than writing each line of code (~ 300 lines of samtools merge ...) I don't need pairs to repeat, for example chr1 can pair with chr2:22,X,Y but chr2 doesn't need to then merge again with chr1 (or can this not be avoided/ delete later?)

I've got as far as

for file in *_chr*.bam; do samtools merge merge_${file}.bam $file $file; done

I understand that this will find all of my chromosome .bam files, then merge but this isn't going to iterate through every combination is it?

ADD COMMENTlink modified 9 days ago by Ram15k • written 10 days ago by s16617530

use two loops instead of one....

ADD REPLYlink written 10 days ago by Pierre Lindenbaum107k

thanks! Very new to unix and other computational languages... any tips on how to go about that... ? ...?

ADD REPLYlink written 9 days ago by s16617530
2

https://www.google.com/search?q=bash+loop+tutorial

ADD REPLYlink written 9 days ago by Pierre Lindenbaum107k

Why do you need pairs?

Do you realise you can use

samtools merge output.bam *.bam

And get all files merged?

ADD REPLYlink written 10 days ago by WouterDeCoster28k

Because a package I'm using downstream is eventually going to iterate through each pair. Alternatively I could keep the data as one large .bam file but its too large andexits before completing. This later package has supposedly been written specifically for whole genome sequencing analysis but doesn't write anything to file, rather it does everything in memory, reaches capacity then exits.

ADD REPLYlink modified 9 days ago • written 9 days ago by s16617530
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 665 users visited in the last hour