Question: Merge .bam files by groups of lanes
4 months ago, elb (Torino) wrote:

Hi guys, I have a folder with around 300 .bam files. Each .bam file is one lane of a sample, and four lanes make up a sample. I would like to merge the .bam files of the four lanes into a single file per sample, grouping by the _S*_ part of the name, where S is followed by the sample number (e.g. my_experimet_xxx__L001_S1_stimulated_Aligned.bam). Suppose I have 75 samples, i.e. S{1..75}.

Can anyone help me please?

The line I use to merge normally is the following:

samtools merge S1_merged.bam *bam

Thank you in advance

rna-seq samtools bam

This is not a job posting. The "job" tag is meant for career job postings. Most posts here are about a computational "job", so that is implicit. Just use tags related to the kind of computational task you want help with.

written 4 months ago by drkennetz

drkennetz is correct. Plus, the type says "Job Ad", not "job". Please be more mindful in the future, elb.

written 4 months ago by RamRS

*bam will select all files, including your output bam. Maybe use a different glob pattern or the -b option?

written 4 months ago by RamRS
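
To illustrate the point about the glob: a bare *bam mixes every sample together and, on a second run, would also pick up a merged output sitting in the same folder. A narrower pattern selects only the four lane files of one sample, and that list can then be passed to samtools merge through -b. This is only a sketch that assumes the file naming from the question; lane_files and the list file name are made up for illustration, and the exact samtools merge usage should be checked against the help text of your version.

```shell
#!/usr/bin/env bash
# Print the lane files (L001..L004) that belong to one sample number.
# Assumes names like my_experimet_xxx__L001_S1_stimulated_Aligned.bam;
# lane_files is a hypothetical helper, not part of samtools.
lane_files() {
    local sample="$1"
    local f
    # "_S<n>_" keeps S1 from also matching S10..S19;
    # "_L00[1-4]_" skips merged outputs that carry no lane field.
    for f in *_L00[1-4]_S"${sample}"_*.bam; do
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}

# Usage (run inside the folder with the BAM files):
#   lane_files 1 > S1_lanes.txt
#   samtools merge -b S1_lanes.txt S1_merged.bam
```

Writing the list to a file first also makes it easy to inspect which inputs will go into each merge before anything is run.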

Try it on a few files first (edit the seq range accordingly):

 $ parallel --dry-run 'samtools merge  my_experimet_xxx___S{}_stimulated_Aligned.bam  my_experimet_xxx__L00{1..4}_S{}_stimulated_Aligned.bam' ::: $(seq 1 75)

output format: my_experimet_xxx___S{1..75}_stimulated_Aligned.bam

input format: my_experimet_xxx__L00{1..4}_S{1..75}_stimulated_Aligned.bam

First check that brace expansion works in your shell (it is bash, not samtools, that expands L00{1..4} into the four lane names), with something like:

samtools merge my_experimet_xxx___S75_stimulated_Aligned.bam my_experimet_xxx__L00{1..4}_S75_stimulated_Aligned.bam

modified 4 months ago • written 4 months ago by cpad0112
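
If GNU parallel is not installed, or it is unclear whether the shell in use expands braces, the same dry run can be sketched with plain loops that spell out the four lanes explicitly. The prefix and suffix below are assumptions taken from the example file name in the question, and print_merge_cmds is a hypothetical helper; it only prints the commands and runs nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) one samtools merge command per sample.
# prefix/suffix follow the example file name in the question; adjust as needed.
print_merge_cmds() {
    local first="$1" last="$2"
    local prefix="my_experimet_xxx__"
    local suffix="_stimulated_Aligned.bam"
    local s="$first" lane inputs
    while [ "$s" -le "$last" ]; do
        inputs=""
        # Build the four lane file names by hand instead of relying on L00{1..4}.
        for lane in 1 2 3 4; do
            inputs="$inputs ${prefix}L00${lane}_S${s}${suffix}"
        done
        # Output name first, then the inputs, as samtools merge expects.
        echo "samtools merge ${prefix}_S${s}${suffix}${inputs}"
        s=$((s + 1))
    done
}

# Dry run for all samples:
#   print_merge_cmds 1 75
# Run for real once the printed commands look right:
#   print_merge_cmds 1 75 | bash
```

Because the lane names are built explicitly, the printed commands are the same whichever shell runs the function.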
4 months ago, drkennetz wrote:

I think this should work for your issue:

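Create a file named samtools_merge.sh:

#!/usr/bin/env bash
# Merge the four lane BAMs of every sample into ./merged/
mkdir -p merged

for L1 in *_L001_*.bam
do
    echo "$L1"
    # Derive the other lane names and the output name from the L001 name
    L2=$(echo "$L1" | sed 's/_L001_/_L002_/')
    L3=$(echo "$L1" | sed 's/_L001_/_L003_/')
    L4=$(echo "$L1" | sed 's/_L001_/_L004_/')
    merged=$(echo "$L1" | sed 's/_L001_/_merged_/')
    samtools merge "./merged/${merged}" "${L1}" "${L2}" "${L3}" "${L4}"
done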

This iterates over every file with _L001_ in its name, i.e. once per sample. For each one it derives the names of the other three lanes by replacing L001 with L002, L003 and L004, then runs samtools merge on all four lanes before moving on to the next sample. The output file keeps the sample's name, with the lane field replaced by "merged".

Run it in the directory that holds all the BAMs (bash samtools_merge.sh) and the merged BAMs will be written to the "merged" directory.
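
One thing the loop does not do is check that all four lane files actually exist before merging. A possible guard, as a sketch only: lanes_for is a hypothetical helper that takes an L001 file name, builds the other lane names with the same sed substitution, and fails without printing any names if one of them is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given an L001 file name, print all four lane names
# (one per line), or return 1 and print nothing to stdout if a lane is missing.
lanes_for() {
    local l1="$1" lane f
    # First pass: make sure every lane file is present.
    for lane in 1 2 3 4; do
        f=$(printf '%s\n' "$l1" | sed "s/_L001_/_L00${lane}_/")
        if [ ! -e "$f" ]; then
            echo "missing lane file: $f" >&2
            return 1
        fi
    done
    # Second pass: print the four names.
    for lane in 1 2 3 4; do
        printf '%s\n' "$l1" | sed "s/_L001_/_L00${lane}_/"
    done
}

# Usage sketch (commented out so nothing is merged by accident):
#   mkdir -p merged
#   for L1 in *_L001_*.bam; do
#       if lanes_for "$L1" > lanes.txt; then
#           samtools merge -b lanes.txt "merged/$(echo "$L1" | sed 's/_L001_/_merged_/')"
#       fi
#   done
```

Samples with an incomplete set of lanes are then reported on stderr and skipped rather than passed to samtools merge.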


It is fantastic! It works perfectly. Thank you very very much!

written 4 months ago by elb

I am glad to hear that! It was untested so you never know.

written 4 months ago by drkennetz

Should have mentioned the "untested" part in the post, drkennetz.

written 4 months ago by RamRS