from .BAM to .BAI using samtools
7
15
Entering edit mode
10.0 years ago
Kizuna ▴ 870

Hi !

I have many .bam that I want to get their .bai using samtools in the terminal.

I tried the following command :

samtools index *.bam

However, I did not get any .bai file.

Regards

next-gen-sequencing samtools • 141k views
ADD COMMENT
21
Entering edit mode
10.0 years ago

UPDATE 2024:

samtools index supports the syntax

samtools index -M [-bc] [-m INT] <in1.bam> <in2.bam>...

using GNU parallel:

parallel  samtools index ::: *.bam
ADD COMMENT
0
Entering edit mode

thanks this is useful. Do you know what is the optimal way of setting -j ? Without setting it does it just default to max cores? Or should I set it to something total core minus 1. That is if I have 8 then I will just set it to 7? thanks.

ADD REPLY
0
Entering edit mode

From the manual:

0 means as many as possible. Default is 100% which will run one job per CPU on each machine.

So you shouldn't normally have to worry about it unless, for example, you want to constrain the number of jobs to be less than your number of CPU cores, or you want two jobs per CPU core.

ADD REPLY
7
Entering edit mode
10.0 years ago
donfreed ★ 1.6k

Samtools index only accepts a single input file, so using a shell metacharacter to specify multiple files will not work. I usually use a shell wrapper to run samtools index on a single file at a time.

#!/usr/bin/env bash

# index_all.sh

for INFILE in "$@"
do
   samtools index $INFILE
done

Then it is simple to run:

./index_all.sh /directory/*.bam
ADD COMMENT
8
Entering edit mode
10.0 years ago

You need to point the results to a file to create this:

So for one file it would be

samtools index file.bam  file.bam.bai

See this link for a great description.

In your case with many bam files I would do it in a shell script as follows:

#!/bin/bash
for I in *.bam
do
  echo "Indexing: "$i        
  samtools index $i $i".bai"
done
ADD COMMENT
3
Entering edit mode

The samtools index foo.bam foo.bam.bai syntax won't work with the two most recent versions of samtools. The way to do this is simply samtools index foo.bam. Enabling people to specify alternate index filenames is low on the priority list.

ADD REPLY
0
Entering edit mode

@Devon Ryan: Yes indeed, with my samtools version I need to write samtools index foo.bam

So if I understood well , You are saying that the *.bam won't work with my samtools version?

ADD REPLY
4
Entering edit mode

*.bam won't work with any samtools version. I should note that understanding why this is the case will be helpful for you in general (this will be the case for many tools), so I'll explain that.

Assume you're in a directory with three BAM files: A.bam, B.bam and C.bam. When you type samtools index *.bam, your shell sees *.bam and expands it. Consequently, what samtools sees you as running is samtools index A.bam B.bam C.bam. That'd be fine if samtools index could accept more than one input file at a time, but it can't. Further, it may either then see you as using the alternate syntax that Jonathan Crowther mentioned or simply die due to not knowing what to do (I'd have to check the source code, though I expect the former would happen.

So, what you want to do is simply iterate over the files with a for loop:

for f in *.bam
do
    samtools index $f
done
ADD REPLY
0
Entering edit mode

Very helpful Devon Ryan Tx!

ADD REPLY
0
Entering edit mode

@Jonathan Crowther : Tx! I will give it a try

ADD REPLY
4
Entering edit mode
5.3 years ago
eidriangm ▴ 60

bash one-liner loop:

# Considering that you are inside the folder where the bams are: for bam in $(ls *.bam); do samtools index $bam; done

ADD COMMENT
1
Entering edit mode

This was the only solution working for me on WSL Ubuntu working on files that are stored in Windows folder! Donfreeds solution gave me "samtools-command not found error" and Pierre Lindenbaum's solution didn't end in my case... doing nothing. Thanks for posting!

ADD REPLY
0
Entering edit mode

this works fine given that parallel installed " parallel samtools index ::: *.bam" as @Pierre Lindenbaum answered

ADD REPLY
3
Entering edit mode
15 months ago

Starting in samtools 1.16, you can use

samtools index -M *.bam
ADD COMMENT
1
Entering edit mode
4.7 years ago

Convert and sort sam to bam

samtools view -bS file.sam | samtools sort -o file.bam

Create an index for the sorted bam

samtools index file.bam file.bai

Count mapped reads

samtools view -bS file.sam | cut -f1 | sort | uniq | wc -l
2
Entering edit mode

Hi Waldeyr Mendes Cordeiro da Silva,

thanks for contributing. Please be sure though that you do not repost answers.

This answer has already been given 5 times in this thread including the accepted one. Also note that current samtools versions do not require a view command prior to sort. sort will happily read SAM and output BAM, please check the manual for details (-O BAM option and automatic format detection based on output file suffix with -o).

Also the read counting is not correct. First the view will output a BAM file so a binary format that is not compatible with cut etc. commands. Also even if it was a SAM file it would count the header (if you print it via samtools view -h) but in any case it counts all reads (= also unmapped ones) so the result is not reliable. Use samtools flagstat instead which is specialized code for exactly what you want to do. This works both on SAM/BAM/CRAM format.

ADD REPLY
0
Entering edit mode
15 months ago
wcychan • 0

use find and xargs

find . -name "*.bam" | xargs -n1 samtools index
ADD COMMENT

Login before adding your answer.

Traffic: 1751 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6