Question: Merge .bam files by groups of lanes
elb (Torino) wrote 3 months ago:

Hi guys, I have a folder with around 300 .bam files. Each .bam file is one lane of a sample, and four lanes make up a sample. I would like to merge the .bam files of the four lanes of each sample into a single file, grouping by the _S*_ field, where S is followed by the sample number (e.g. my_experimet_xxx__L001_S1_stimulated_Aligned.bam). Suppose I have 75 samples, i.e. S{1..75}.

Can anyone help me please?

The line I use to merge normally is the following:

samtools merge S1_merged.bam *bam

Thank you in advance

Tags: rna-seq, samtools, bam

This is not a job posting. The "job" tag is meant for career jobs. Most posts are about a computational "job", so that is implicit. Just use tags related to the type of computational job you want help with.

written 3 months ago by drkennetz

drkennetz is correct. Plus, the type says "Job Ad", not "job". Please be more mindful in the future, elb.

written 3 months ago by RamRS

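*bam matches every BAM file in the folder, not just the lanes of one sample, and on a rerun it also matches your output BAM. Maybe use a narrower glob pattern, or the -b option to pass a list of input files?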

written 3 months ago by RamRS
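
To illustrate the glob suggestion above, here is a minimal sketch, assuming the file naming from the question (..._L00N_S<number>_...). The helper name lanes_of_sample is made up for illustration; it only prints what the narrower pattern matches, so the list can be checked before it is handed to samtools merge.

```shell
# List the lane BAMs of one sample using a narrower glob than *bam.
# Assumes names like my_experimet_xxx__L001_S1_stimulated_Aligned.bam
# (no spaces in file names). lanes_of_sample is a hypothetical helper.
lanes_of_sample() {
    # The underscore after S$1 keeps S1 from also matching S10, S11, ...
    for f in *_L00[1-4]_S"$1"_*.bam; do
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Check the expansion first:
lanes_of_sample 1
# Then merge, either with the names on the command line ...
#   samtools merge S1_merged.bam $(lanes_of_sample 1)
# ... or through a list file and the -b option:
#   lanes_of_sample 1 > S1.list && samtools merge -b S1.list S1_merged.bam
```

An output name such as S1_merged.bam does not match this pattern, so a rerun does not pick up the previous result as an input.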

Try it on a few files first (edit the seq range accordingly); --dry-run only prints the commands without running them:

 $ parallel --dry-run 'samtools merge  my_experimet_xxx___S{}_stimulated_Aligned.bam  my_experimet_xxx__L00{1..4}_S{}_stimulated_Aligned.bam' ::: $(seq 1 75)

output format: my_experimet_xxx___S{1..75}_stimulated_Aligned.bam

input format: my_experimet_xxx__L00{1..4}_S{1..75}_stimulated_Aligned.bam

First check that bash brace expansion works in your shell (the shell expands the braces, not samtools), with something like: samtools merge my_experimet_xxx___S75_stimulated_Aligned.bam my_experimet_xxx__L00{1..4}_S75_stimulated_Aligned.bam

written 3 months ago by cpad0112
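
The brace expansion that the command above relies on can be checked without touching samtools, since the shell performs it before the program starts. A small sketch, assuming bash is installed; the sample number 75 is just the one used in the comment above. Whether the same expansion happens inside the quoted parallel command depends on the shell that parallel uses to run each job, so this only checks bash itself.

```shell
# Brace expansion is a bash feature: L00{1..4} becomes L001 L002 L003 L004.
# bash -c is used so the check behaves the same even from another shell.
s=75
expanded=$(bash -c "echo my_experimet_xxx__L00{1..4}_S${s}_stimulated_Aligned.bam")

# Print one name per line; four lines mean the expansion worked.
printf '%s\n' $expanded
```

If this prints a single line still containing {1..4}, the braces were not expanded and the four lane files have to be listed explicitly.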
drkennetz wrote 3 months ago:

I think this should work for your issue:

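Create a file named samtools_merge.sh:

# Output directory for the merged BAMs
mkdir -p merged

# One iteration per sample: every sample has exactly one *_L001_* file
for L1 in *_L001_*.bam
do
    echo "$L1"
    # Derive the other three lane names from the L001 file name
    L2=$(echo "$L1" | sed 's/_L001_/_L002_/')
    L3=$(echo "$L1" | sed 's/_L001_/_L003_/')
    L4=$(echo "$L1" | sed 's/_L001_/_L004_/')
    # Output name: same as the input, with the lane field replaced
    merged=$(echo "$L1" | sed 's/_L001_/_merged_/')
    samtools merge "./merged/${merged}" "${L1}" "${L2}" "${L3}" "${L4}"
done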

This iterates over every file with _L001_ in its name (one per sample) and derives the names of the other three lanes by replacing L001 with L002, L003 and L004. It then runs samtools merge on the four lanes and moves on to the next sample until all samples are done. The output filename is the same as the sample's filename, with the lane information replaced by "merged".

Just run this in your directory with all the bams and you should have merged bams in the dir "merged".
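
As a variation on the loop described above, the lane names can also be built with plain shell parameter expansion instead of sed, with a check that all four lane files exist before merging. This is a sketch and not the answer's script: the names merge_lanes and DRY_RUN are made up for illustration, and it assumes each file name contains the lane field exactly as _L001_ to _L004_.

```shell
# Variation on the loop above: skips samples with a missing lane file.
# merge_lanes and DRY_RUN are hypothetical names, not part of samtools.
# The function body runs in a subshell, so its variables do not leak.
merge_lanes() (
    outdir=${1:-merged}
    mkdir -p "$outdir"
    for l1 in *_L001_*.bam; do
        [ -e "$l1" ] || continue              # glob matched nothing
        prefix=${l1%%_L001_*}                 # text before the lane field
        suffix=${l1#*_L001_}                  # text after the lane field
        set --                                # collect the inputs in "$@"
        for lane in L001 L002 L003 L004; do
            f=${prefix}_${lane}_${suffix}
            if [ -e "$f" ]; then
                set -- "$@" "$f"
            fi
        done
        if [ "$#" -ne 4 ]; then
            echo "skipping $l1: found $# of 4 lane files" >&2
            continue
        fi
        # With a non-empty DRY_RUN the command is printed, not executed.
        ${DRY_RUN:+echo} samtools merge "$outdir/${prefix}_merged_${suffix}" "$@"
    done
)

# Usage, from the directory holding the BAMs:
#   DRY_RUN=1 merge_lanes merged   # print the commands first
#   merge_lanes merged             # then run them
```

Printing the commands first makes it easy to spot a sample whose lanes are named differently before any output is written.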


It is fantastic! It works perfectly. Thank you very very much!

written 3 months ago by elb

I am glad to hear that! It was untested so you never know.

written 3 months ago by drkennetz

Should have mentioned the "untested" part in the post, drkennetz.

written 3 months ago by RamRS