Multiple Bam To Multiple Sam
3
4
Entering edit mode
9.7 years ago
venks ▴ 730

Hi,

I have 100's of BAM files in a directory. Is there a way with which we can convert all the BAM files in a directory into Sam files. I know that we can use

samtools view -h <path/.bam> > <path/.sam>

to convert individual files.

Thanks

bam samtools bioinformatics sequencing • 5.7k views
ADD COMMENT
8
Entering edit mode
9.7 years ago
matted 7.6k

This is more of a [shell] scripting question. There are a lot of ways to do this, of varying elegance. Here's one try:

ls *bam | xargs -I {} -n 1 samtools view -h {} -o {}.sam

This will call this command on every bam file in the directory:

samtools view -h in.bam -o in.bam.sam

Out of curiosity I have to ask why you want to do this... I can't imagine any good reasons to leave the binary format behind, unless you're trying to fill up your drives.

ADD COMMENT
1
Entering edit mode

I am planning to use htseq-count to generate reads per gene. Do U think there is another efficient way of doing this.

Thanks in Advance

ADD REPLY
1
Entering edit mode

There are a lot of tools that do this; you can search this forum for others. Extracting Read Count For Each Gene/Exon From Rna-Seq Bam Files discussion a few options, including HTseq. It also discusses bedtools multicov, which might be better for you since it works on BAMs natively and can handle multiple at once. If you want to use HTseq, since it looks like it doesn't support BAMs directly, I would pipe things in instead of writing to disk:

samtools view -h in.bam | htseq-count /dev/stdin features.gff
ADD REPLY
1
Entering edit mode

Sounds Good and Thank You very much for the suggestion

ADD REPLY
0
Entering edit mode

Hi matted,

An extension to the original question, how would one pipe the data if we need to sort the bam file first then use htseq-count. For example I tried

samtools view -h in.bam | samtools sort /dev/stdin /dev/stdout | htseq-count -m intersection-strict -s no /dev/stdin /eatures.gtf > test.count

which does not seem to work..and gives the error [sort_blocks] fail to create file /dev/stdout.bam.

Any ideas would be much appreciated! many thanks

ADD REPLY
0
Entering edit mode

you need to teel "view" to output BAM (option -bu ) , not SAM. "sort" needs a file suffix , not a stream/device ( /dev/stdout), replace /dev/stdin by '-'.

ADD REPLY
0
Entering edit mode

Exactly. Also note that if a downstream tool doesn't support reading from BAM and really needs SAM you can always just use samtools view for piping.

ADD REPLY
2
Entering edit mode

I am planning to use htseq-count to generate reads per gene. Do U think there is another efficient way of doing this.

Thanks in Advance

ADD REPLY
6
Entering edit mode
9.7 years ago

Using make with option -j (num-parallel-tasks)

SAM_SOMEWHERE=$(shell find dir1/dir2 -name "*.sam")
%.bam:%.sam
    samtools view -S -b -o $@ $<     

.PHONY: all

all:$(SAM_SOMEWHERE:%.sam=%.bam) another1.bam another2.bam
    echo "Done: $^"
ADD COMMENT
0
Entering edit mode

Oops; I just realized that you want BAM->SAM. I wrote SAM->BAM. Change the pattern according to your needs :-)

ADD REPLY
5
Entering edit mode
9.7 years ago

Using GNU Parallel, if you have free processors

# replace the number is `j` with the number of free processors
parallel -j 5 'samtools view -h {} -o {}.sam' ::: *bam

This will do the conversion parallely on different processors assigned.

ADD COMMENT
1
Entering edit mode

I don't have clusters. I am trying to run it in just one machine.

ADD REPLY
2
Entering edit mode

Parallel can assign jobs to different cores/cpus on one machine.

ADD REPLY
1
Entering edit mode

With GNU parallel 20130222 this should use all CPUs on the local machine that are not busy:

parallel --load 100% samtools view -h {} -o {}.sam ::: *bam
ADD REPLY

Login before adding your answer.

Traffic: 2568 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6