Question: Merge .bam files by groups of lanes
elb110 wrote:

Hi guys, I have a folder with around 300 .bam files. Each .bam file is one lane of a sample, and four lanes make up a sample. I would like to merge the .bam files of the four lanes of each sample into a single file, grouping by _*S*_, where S is followed by the sample number (e.g. my_experimet_xxx__L001_S1_stimulated_Aligned.bam). Suppose I have 75 samples, i.e. S1 through S75 ({1..75}).

Can anyone help me please?

The command I normally use to merge is the following:

samtools merge S1_merged.bam *bam

Thank you in advance

rna-seq samtools bam • 163 views
ADD COMMENTlink modified 4 weeks ago by WouterDeCoster31k • written 4 weeks ago by elb110

This is not a job posting. The "job" tag should be used for jobs in the career sense. Most posts are about a computational "job", so I think that is implicit. Just use tags related to the type of computational job you want help with.

ADD REPLYlink written 4 weeks ago by drkennetz340

drkennetz is correct. Plus, the type says "Job Ad", not "job". Please be more mindful in the future, elb.

ADD REPLYlink written 4 weeks ago by Ram17k

*bam will select all files, including your output bam. Maybe use a different glob pattern or the -b option?
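One way to narrow the glob, as a minimal dry-run sketch: it assumes file names of the form <prefix>_L00<lane>_S<n>_<suffix>.bam, as in the question, and file names without whitespace. The helper name list_lanes is hypothetical, and nothing is merged; the command is only printed.

```shell
#!/bin/sh
# Sketch: list the lane files of one sample with a glob that cannot match
# the merged output (S1_merged.bam has no _L00<lane>_ part in its name).

# list_lanes N: print the lane files of sample N, one per line.
# (list_lanes is a hypothetical helper name, not part of samtools.)
list_lanes() {
    # The "_" right after the sample number keeps S1 from matching S10, S11, ...
    for f in *_L00[1-4]_S"$1"_*.bam; do
        # An unmatched glob stays literal, so keep only names that exist.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Dry run for sample 1: show the inputs, then the command that would be run.
# The unquoted $(...) is split on whitespace, hence the no-spaces assumption.
list_lanes 1
echo "samtools merge S1_merged.bam" $(list_lanes 1)
```

Redirecting the list to a file (for example list_lanes 1 > S1_inputs.txt, a made-up file name) gives one input per line, which is the layout the -b option of samtools merge reads.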

ADD REPLYlink written 4 weeks ago by Ram17k

Try on a few files (edit seq accordingly):

 $ parallel --dry-run 'samtools merge  my_experimet_xxx___S{}_stimulated_Aligned.bam  my_experimet_xxx__L00{1..4}_S{}_stimulated_Aligned.bam' ::: $(seq 1 75)

output format: my_experimet_xxx___S{1..75}_stimulated_Aligned.bam

input format: my_experimet_xxx__L00{1..4}_S{1..75}_stimulated_Aligned.bam

First check that brace expansion works on your machine (the expansion is done by the shell, e.g. bash, before samtools sees the arguments) with something like: samtools merge my_experimet_xxx___S75_stimulated_Aligned.bam my_experimet_xxx__L00{1..4}_S75_stimulated_Aligned.bam
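If brace expansion turns out not to be available (plain sh, for instance), the same file list can be built with an explicit loop. This is only a dry-run sketch: the function name merge_cmd is made up for illustration, the file name pattern is copied from the example above, and the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: build the merge command for one sample without brace expansion.

# merge_cmd N: print the samtools merge command for sample N.
# (merge_cmd is a hypothetical helper name.)
merge_cmd() {
    sample=$1
    inputs=""
    # Append the four lane files of this sample, L001 to L004.
    for lane in 1 2 3 4; do
        inputs="$inputs my_experimet_xxx__L00${lane}_S${sample}_stimulated_Aligned.bam"
    done
    echo "samtools merge my_experimet_xxx___S${sample}_stimulated_Aligned.bam$inputs"
}

# Print the commands for the first three samples; change 3 to 75 for all of them.
for s in $(seq 1 3); do
    merge_cmd "$s"
done
```

Once the printed commands look right, the output can be piped to sh to run them.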

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by cpad01128.3k
drkennetz340 wrote:

I think this should work for your issue:

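Create a file named samtools_merge.sh:

mkdir -p merged

# Loop over lane 1 of every sample; the other lane names are derived from it
for L1 in *_L001_*.bam
do
    echo "$L1"
    L2=$(echo "$L1" | sed 's/_L001_/_L002_/')
    L3=$(echo "$L1" | sed 's/_L001_/_L003_/')
    L4=$(echo "$L1" | sed 's/_L001_/_L004_/')
    merged=$(echo "$L1" | sed 's/_L001_/_merged_/')
    # Write the merged file into ./merged so it never matches the input glob
    samtools merge "./merged/${merged}" "${L1}" "${L2}" "${L3}" "${L4}"
done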

This iterates over every file with _L001_ in its name (one per sample) and derives the names of the other three lanes by replacing L001 with L002, L003, and L004. It then runs samtools merge on all 4 lanes and moves on to the next sample until it has gone through all the samples. The output file name is the same as the lane 1 file name, with the lane tag (L001) replaced by "merged".

Just run this in the directory with all the bams and you should end up with the merged bams in the dir "merged".
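A possible safeguard, sketched here as a dry run: check that all four lane files exist before merging a sample, and skip the sample with a warning otherwise. The helper names lane_name and merge_sample are made up for illustration, the lane tags are assumed to be _L001_ to _L004_ as above, and file names are assumed to contain no whitespace. The commands are only printed.

```shell
#!/bin/sh
# Sketch: print one merge command per sample, skipping incomplete samples.

# lane_name FILE LANE: print FILE with its _L001_ tag swapped for _L00<LANE>_.
lane_name() {
    printf '%s\n' "$1" | sed "s/_L001_/_L00${2}_/"
}

# merge_sample L1FILE: print the merge command for the sample whose lane 1
# file is L1FILE, or warn on stderr and return 1 if any lane file is missing.
merge_sample() {
    l1=$1
    inputs=""
    for lane in 1 2 3 4; do
        f=$(lane_name "$l1" "$lane")
        if [ ! -e "$f" ]; then
            echo "skipping $l1: missing $f" >&2
            return 1
        fi
        inputs="$inputs $f"
    done
    out=$(printf '%s\n' "$l1" | sed 's/_L001_/_merged_/')
    echo "samtools merge ./merged/$out$inputs"
}

for L1 in *_L001_*.bam; do
    [ -e "$L1" ] || continue    # the glob matched nothing
    merge_sample "$L1"
done
```

With around 300 files, a check like this makes a missing or misnamed lane visible before samtools is ever called.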

ADD COMMENTlink modified 4 weeks ago • written 4 weeks ago by drkennetz340

It is fantastic! It works perfectly. Thank you very very much!

ADD REPLYlink written 4 weeks ago by elb110

I am glad to hear that! It was untested so you never know.

ADD REPLYlink written 4 weeks ago by drkennetz340

Should have mentioned the "untested" part in the post, drkennetz.

ADD REPLYlink written 4 weeks ago by Ram17k