Question: Split BAM into 1 BAM per contig
gravatar for Marvin
5.6 years ago by
Marvin170 wrote:


I have mapped reads against a reference with bwa and got a big bam file now.

I would like to retrieve a separate bam file for every contig.

I do not mean "I want 1 bam for every chromosome". A chromosome is not a contig since my coverage is 0 in many parts of the genome (aDNA reads).

In this situation e.g. (I hope he doesn't mess up when I post this):


  ---   ---                     ---   ---

    ----                          -----

I would like to get 2 bam files.

How do I do that?

sam contig • 2.3k views
ADD COMMENTlink modified 5.6 years ago by Pierre Lindenbaum133k • written 5.6 years ago by Marvin170
gravatar for Pierre Lindenbaum
5.6 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum133k wrote:

use genomecov ( ) get a bed with >0 coverage. Then, loop over the bed to extract the reads.


awk '{printf("%s:%s-%s\n",$1,$2,$3);}' cov.bed | while read F; do samtools view -o "${F}.bam" -b your.bam "${F}"; done
ADD COMMENTlink written 5.6 years ago by Pierre Lindenbaum133k

Thank you! Don't know how to use that bedtools genomecov but I wil find out.

ADD REPLYlink modified 5.6 years ago • written 5.6 years ago by Marvin170

I'm struggling, how do I achieve this ... !?

ADD REPLYlink written 5.6 years ago by Marvin170

I just can't figure it out. The problem is that he opens a new bed line as soon as the coverage within a contig changes. But what I need is 1 line per contig. How is that possible with a single genomecov command? I know how to do it with a pipe but from your answer it sounds like it could be done immediately?

ADD REPLYlink written 5.6 years ago by Marvin170

Oh my god I finally made it.

Thank you for helping me alot by pointing me towards the right direction (bedtools is the right choice [didn't know that program before], then awk with samtools as posted by you). However that genomecov suggestion gave me struggles because it didn't get me what I wanted ... all I need is:

bedtools merge -i aln_sorted.bam

This gives me 1 line per contig. Afterwards I can apply the awk-samtools-pipe that you have posted :)

Thanks again!

ADD REPLYlink written 5.6 years ago by Marvin170

Can you share the code you used and the files you used as input? I would also like to obtain individual bam files per contig. I tried this process posted here but I am getting zero output files after running the awk command (without genomecov).

I instead used bedtools merge -i aln_sorted.bam > mycontigs.txt then used the awk command and instead of cov.bed I used mycontigs.txt

Do I need a bed file at all for this?

ADD REPLYlink written 15 months ago by DNAngel90
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1539 users visited in the last hour