Hi,
I have 100's of BAM files in a directory. Is there a way with which we can convert all the BAM files in a directory into sam files. I know that we can use
samtools view -h <path/.bam> > <path/.sam>
to convert individual files.
Thanks
Hi,
I have 100's of BAM files in a directory. Is there a way with which we can convert all the BAM files in a directory into sam files. I know that we can use
samtools view -h <path/.bam> > <path/.sam>
to convert individual files.
Thanks
This is more of a [shell] scripting question. There are a lot of ways to do this, of varying elegance. Here's one try:
ls *bam | xargs -I {} -n 1 samtools view -h {} -o {}.sam
This will call this command on every bam
file in the directory:
samtools view -h in.bam -o in.bam.sam
Out of curiosity I have to ask why you want to do this... I can't imagine any good reasons to leave the binary format behind, unless you're trying to fill up your drives.
Using make with option -j (num-parallel-tasks)
SAM_SOMEWHERE=$(shell find dir1/dir2 -name "*.sam")
%.bam:%.sam
samtools view -S -b -o $@ $<
.PHONY: all
all:$(SAM_SOMEWHERE:%.sam=%.bam) another1.bam another2.bam
echo "Done: $^"
Using GNU Parallel, if you have free processors
# replace the number is `j` with the number of free processors
parallel -j 5 'samtools view -h {} -o {}.sam' ::: *bam
This will do the conversion parallely on different processors assigned.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
I am planning to use htseq-count to generate reads per gene. Do U think there is another efficient way of doing this.
Thanks in Advance
There are a lot of tools that do this; you can search this forum for others. Extracting Read Count For Each Gene/Exon From Rna-Seq Bam Files discussion a few options, including HTseq. It also discusses
bedtools multicov
, which might be better for you since it works on BAMs natively and can handle multiple at once. If you want to use HTseq, since it looks like it doesn't support BAMs directly, I would pipe things in instead of writing to disk:Sounds Good and Thank You very much for the suggestion
Hi matted,
An extension to the original question, how would one pipe the data if we need to sort the bam file first then use htseq-count. For example I tried
samtools view -h in.bam | samtools sort /dev/stdin /dev/stdout | htseq-count -m intersection-strict -s no /dev/stdin /eatures.gtf > test.count
which does not seem to work..and gives the error [sort_blocks] fail to create file /dev/stdout.bam.
Any ideas would be much appreciated! many thanks
you need to teel "view" to output BAM (option -bu ) , not SAM. "sort" needs a file suffix , not a stream/device ( /dev/stdout), replace /dev/stdin by '-'.
Exactly. Also note that if a downstream tool doesn't support reading from BAM and really needs SAM you can always just use samtools view for piping.
I am planning to use htseq-count to generate reads per gene. Do U think there is another efficient way of doing this.
Thanks in Advance