Question: Multiple Bam To Multiple Sam
4
6.6 years ago by
venkateshr89690
United States
venkateshr89690 wrote:

Hi,

I have hundreds of BAM files in a directory. Is there a way to convert all the BAM files in a directory into SAM files? I know that we can use

samtools view -h <path/.bam> > <path/.sam>

to convert individual files.

Thanks

ADD COMMENTlink modified 6.6 years ago by matted7.2k • written 6.6 years ago by venkateshr89690
7
6.6 years ago by
matted7.2k
Boston, United States
matted7.2k wrote:

This is more of a [shell] scripting question. There are a lot of ways to do this, of varying elegance. Here's one try:

ls *.bam | xargs -I {} samtools view -h {} -o {}.sam

This runs the following command once for every BAM file in the directory (shown here for in.bam):

samtools view -h in.bam -o in.bam.sam

Out of curiosity I have to ask why you want to do this... I can't imagine any good reasons to leave the binary format behind, unless you're trying to fill up your drives.

ADD COMMENTlink written 6.6 years ago by matted7.2k
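A plain loop is another way to do what the xargs one-liner does. The sketch below is only an illustration: the function name bam_to_sam_all is made up, and it assumes samtools is on the PATH. It names each output in.sam instead of in.bam.sam and copes with spaces in file names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert every *.bam in a directory to SAM.
# Assumes samtools is installed and on the PATH.
bam_to_sam_all() {
    local dir="${1:-.}"   # directory to scan, defaults to the current one
    local bam
    for bam in "$dir"/*.bam; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$bam" ] || continue
        # ${bam%.bam} strips the extension, so in.bam becomes in.sam.
        samtools view -h -o "${bam%.bam}.sam" "$bam"
    done
}
```

Usage would be something like bam_to_sam_all /path/to/bams. Quoting "$bam" everywhere is what keeps odd file names from being split.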
1

I am planning to use htseq-count to generate reads per gene. Do you think there is a more efficient way of doing this?

Thanks in Advance

ADD REPLYlink written 6.6 years ago by venkateshr89690
1

There are a lot of tools that do this; you can search this forum for others. The thread "Extracting Read Count For Each Gene/Exon From Rna-Seq Bam Files" discusses a few options, including HTSeq. It also discusses bedtools multicov, which might be better for you since it works on BAMs natively and can handle several at once. If you want to use HTSeq, since it looks like it doesn't support BAMs directly, I would pipe things in instead of writing to disk:

samtools view -h in.bam | htseq-count /dev/stdin features.gff
ADD REPLYlink written 6.6 years ago by matted7.2k
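With hundreds of BAM files, the same pipe can be wrapped in a loop so that no SAM file ever hits the disk. This is a rough sketch, not a tested pipeline: the function name count_all and the .counts.txt output suffix are invented for the example, and it assumes samtools and htseq-count are both on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run htseq-count on each BAM by streaming SAM through a pipe.
# Usage: count_all features.gff sample1.bam sample2.bam ...
count_all() {
    local gff="$1"   # annotation file, first argument
    shift            # the remaining arguments are BAM files
    local bam
    for bam in "$@"; do
        # One counts file per BAM; the suffix is an arbitrary choice.
        samtools view -h "$bam" \
            | htseq-count /dev/stdin "$gff" > "${bam%.bam}.counts.txt"
    done
}
```

For example: count_all features.gff *.bam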
1

Sounds good, and thank you very much for the suggestion.

ADD REPLYlink written 6.6 years ago by venkateshr89690

Hi matted,

An extension to the original question: how would one pipe the data if the BAM file needs to be sorted first and then passed to htseq-count? For example, I tried

samtools view -h in.bam | samtools sort /dev/stdin /dev/stdout | htseq-count -m intersection-strict -s no /dev/stdin features.gtf > test.count

which does not seem to work and gives the error [sort_blocks] fail to create file /dev/stdout.bam.

Any ideas would be much appreciated! Many thanks.

ADD REPLYlink modified 6.6 years ago • written 6.6 years ago by Zack0

You need to tell "view" to output BAM (option -bu), not SAM. "sort" expects an output file prefix (it appends .bam to it), not a stream/device such as /dev/stdout. Also replace /dev/stdin with '-'.

ADD REPLYlink written 6.6 years ago by Pierre Lindenbaum124k

Exactly. Also note that if a downstream tool doesn't support reading from BAM and really needs SAM you can always just use samtools view for piping.

ADD REPLYlink written 6.6 years ago by Andreas2.4k
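For what it's worth, here is one way the sort-then-count pipe could look. This sketch assumes a samtools 1.x release, where sort writes to standard output by default and -O sam selects SAM output, so it does not apply to the older sort syntax used in the question. It also assumes name sorting (-n) is what is wanted. The function name sort_and_count and the temporary-file prefix are made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: name-sort a BAM and stream the result into htseq-count.
# Assumes samtools 1.x and htseq-count are on the PATH.
sort_and_count() {
    local bam="$1"   # input BAM
    local gtf="$2"   # annotation file
    # -n sorts by read name, -O sam emits SAM text on stdout,
    # -T sets the prefix for temporary files written during sorting.
    samtools sort -n -O sam -T "${bam%.bam}.tmpsort" "$bam" \
        | htseq-count -m intersection-strict -s no - "$gtf"
}
```

Usage would be: sort_and_count in.bam features.gtf > test.count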
6
6.6 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum124k wrote:

Using make with option -j (num-parallel-tasks)

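# Collect every SAM file under dir1/dir2
SAM_SOMEWHERE=$(shell find dir1/dir2 -name "*.sam")

# Pattern rule: build X.bam from X.sam (recipe lines must start with a tab)
%.bam: %.sam
	samtools view -S -b -o $@ $<

.PHONY: all

all: $(SAM_SOMEWHERE:%.sam=%.bam) another1.bam another2.bam
	echo "Done: $^"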
ADD COMMENTlink modified 6.6 years ago • written 6.6 years ago by Pierre Lindenbaum124k

Oops; I just realized that you want BAM->SAM. I wrote SAM->BAM. Change the pattern according to your needs :-)

ADD REPLYlink modified 6 weeks ago by RamRS24k • written 6.6 years ago by Pierre Lindenbaum124k
5
6.6 years ago by
Sukhdeep Singh9.9k
Netherlands
Sukhdeep Singh9.9k wrote:

Using GNU Parallel, if you have free processors

# replace the number after `-j` with the number of free processors
parallel -j 5 'samtools view -h {} -o {}.sam' ::: *bam

This runs the conversions in parallel, spread across the assigned processors.

ADD COMMENTlink modified 6.6 years ago • written 6.6 years ago by Sukhdeep Singh9.9k
1

I don't have a cluster. I am trying to run it on just one machine.

ADD REPLYlink written 6.6 years ago by venkateshr89690
1

Parallel can assign jobs to different cores/cpus on one machine.

ADD REPLYlink written 6.6 years ago by Zhen Sun50
1

With GNU parallel 20130222 this should use all CPUs on the local machine that are not busy:

parallel --load 100% samtools view -h {} -o {}.sam ::: *bam
ADD REPLYlink modified 6 weeks ago by RamRS24k • written 6.6 years ago by ole.tange3.6k
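If GNU parallel is not installed, xargs with -P can serve as a rough stand-in on a single machine. The sketch below swaps in xargs for parallel, so it is not the commands from the answers above. The function name bam_to_sam_parallel is invented, and it assumes an xargs that supports -0 and -P (GNU and BSD versions do) and samtools on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert BAM files to SAM with several jobs at once.
# Usage: bam_to_sam_parallel <jobs> file1.bam file2.bam ...
bam_to_sam_parallel() {
    local jobs="$1"   # number of simultaneous conversions
    shift
    # Nothing to do when no files were given.
    [ "$#" -gt 0 ] || return 0
    # NUL-delimited names survive spaces; -n 1 hands one file to each sh call,
    # where it arrives as $1 and the output name drops the .bam extension.
    printf '%s\0' "$@" \
        | xargs -0 -P "$jobs" -n 1 \
            sh -c 'samtools view -h -o "${1%.bam}.sam" "$1"' _
}
```

For example, bam_to_sam_parallel 5 *.bam would run up to five conversions at a time. Disk throughput may become the bottleneck before the CPUs do, since SAM output is much larger than BAM.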