How to extract specific region from multiple bam files and merge the outputs into single bam?
0
0
Entering edit mode
5 months ago
Vasu ▴ 770

I have more than 50 bam files and wanted to extract the reads from a region from all those bam files and merge the outputs into a single bam.

I'm using samtools for this:

samtools view -b sample1.bam "Chr10:19000-46500" > sample1_intregion.bam

I tried doing like below for all bam files:

#!/bin/bash
for f in *.bam
do
    samtools view -b $f "Chr10:19000-46500" > $f_intregion.bam
done

it is not working and doesn't have the output named above. And how can I include the command into above loop for also merging all the output bam files into single bam?

bam rna-seq • 1.2k views
ADD COMMENT
0
Entering edit mode

It is not working and doesn't have the output named above

What is not working? Do you get an error?

Instead of

samtools view -b $f "Chr10:19000-46500" > $f_intregion.bam

use

samtools view -b $f "Chr10:19000-46500" -o $f_extract.bam

You could merge the extracted BAM files after all extractions are complete using samtools merge.

Doing $f_intregion.bam also will make the names odd like file.bam_integration.bam.

ADD REPLY
0
Entering edit mode

yes it says [main_samview] random alignment retrieval only works for indexed BAM or CRAM files.

ADD REPLY
1
Entering edit mode

NOTE: I had assumed that the pipe in block 1 below would work (without testing it at the time I wrote this up). But it does not work. samtools view extraction requires an index file and that can't be done on fly by the first step writing to STDOUT. So this operation will need to be done in two steps as far as I can tell (if someone has a better answer please post).

So index the files first (in addition to sorting them) using samtools sort --write-index.

   # this for loop not to be used
        for f in *.bam
        do
        name=$(basename ${f} .bam)
        samtools sort --write-index ${f} | samtools view -bh $f -o ${name}_extract.bam "Chr10:19000-46500" 
        done

Following should work. Ideally the sorting should be done once outside this loop.

for f in *.bam
    do
    name=$(basename ${f} .bam)
    samtools sort --write-index ${f} -o ${name}_sorted.bam 
    samtools view -bh ${name}_sorted.bam -o ${name}_extract.bam "Chr10:19000-46500" 
done

In the end samtools merge all the extracted files.

ADD REPLY
0
Entering edit mode

I already have index files. I did with samtools index

ADD REPLY
0
Entering edit mode

I tried this but have sort: unrecognized option '--write-index'

ADD REPLY
1
Entering edit mode

You must be using an old version of samtools. What do you get when you do samtools --version?

ADD REPLY
0
Entering edit mode

1.6 is the version I'm using

ADD REPLY
1
Entering edit mode

Please upgrade. v.1.6 is ancient and was released in 2017. We are currently on 1.18. (this numbering goes like 1.8,1.9,1.10,1.11 etc),

ADD REPLY
0
Entering edit mode

I got the new version and extracted the tar.gz file and added the path to bashrc and did source it but still it says like below:

bash: samtools: command not found...
Install package 'samtools' to provide command 'samtools'? [N/y]
ADD REPLY
0
Entering edit mode

Did you compile the program? You will need to add the directory which contains the new samtools to your $PATH by doing export $PATH=/new_samtools_dir:$PATH.

Be sure to confirm the samtools --version shows the new version (i.e. it is one that is seen first before your default 1.6).

ADD REPLY
0
Entering edit mode

Okay I installed the new version and it is working now. Thanq !!

But if I use the | in the loop it is not wokring. So, I removed it

ADD REPLY
0
Entering edit mode

That's not how "thank you" is spelled in a professional setting. Please write professionally.

ADD REPLY
0
Entering edit mode

the --write-index is a quite new command, and due to the various library dependencies, people don't get the latest samtools version when installing with conda

so the recommendation would be best to avoid that flag for a few years, maybe,

especially since one can index separately with ease samtools index align.bam that will always work

also for the OP, the best would be to sort their existing BAM files once and for all because sooner or later, they need the sorted BAM file, so resorting on the fly each time is not worth it.

ADD REPLY

Login before adding your answer.

Traffic: 3798 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6