Reduce SAM/BAM File Size
12.1 years ago

Hi,

I have a quick question about SAM and BAM file sizes.

When I run bwa on paired-end reads (~50M reads) against a small reference sequence (~100 kb), I get a BAM file of about 5 GB. Looking at the alignment, only a few reads aligned to this reference (~500 reads at most).

But when I run tophat with the same input and the same reference, the output BAM is only 10 kb, and the number of aligned reads is the same.

So is there a way to reduce the size of my BAM file?

Thanks

bam sam • 6.3k views
12.1 years ago

Just guessing, but your first BAM file probably contains both aligned and unaligned reads, while the Tophat-produced BAM file contains only aligned reads. Both are "correct" BAM files; which is more useful depends on your particular needs. If you decide that you do not need the unaligned reads, you can use samtools view with a flag filter to remove the unmapped reads.
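
To illustrate what such a flag filter does: every SAM/BAM record carries an integer FLAG field, and bit 0x4 marks the read as unmapped. An exclusion filter such as `samtools view -F 4` drops any record that has that bit set. The following is only a minimal Python sketch of that bit test, not how samtools itself is implemented; the function name and the example flag values are made up for illustration.

```python
UNMAPPED = 0x4  # SAM FLAG bit meaning "this read is unmapped"


def passes_exclude_filter(flag, exclude_mask=UNMAPPED):
    """Mimic the idea behind `samtools view -F <mask>`:
    keep a read only if none of the bits in the mask are set in its FLAG."""
    return (flag & exclude_mask) == 0


# Illustrative FLAG values (assumed, not taken from the poster's data):
# 99 and 147 are a typical properly paired, mapped pair;
# 77 and 141 are a typical pair in which both mates are unmapped.
example_flags = [99, 147, 77, 141, 0, 4]

kept = [flag for flag in example_flags if passes_exclude_filter(flag)]
print(kept)  # -> [99, 147, 0]
```

With ~50M input reads and only ~500 alignments, nearly every record would fail this test, which is consistent with the large difference in file size described in the question.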

Is it OK like this: samtools view -F 4 in.bam > out.bam

You'll probably also want to include -b and -h for BAM output and the SAM header, respectively.
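
Putting the suggestions from this thread together, the full command could be assembled as in the sketch below. The helper name `build_mapped_only_command` is hypothetical, `in.bam` and `out.bam` are the placeholder file names from the comment above, and `-o` is used to name the output file instead of shell redirection. The sketch only builds and prints the command; running it assumes samtools is installed and on the PATH.

```python
def build_mapped_only_command(in_bam, out_bam):
    """Build a samtools command that keeps only mapped reads.

    -b    write BAM instead of SAM
    -h    include the header
    -F 4  exclude reads whose FLAG has the 'unmapped' bit (0x4) set
    -o    write to the named output file
    """
    return ["samtools", "view", "-b", "-h", "-F", "4", "-o", out_bam, in_bam]


cmd = build_mapped_only_command("in.bam", "out.bam")
print(" ".join(cmd))  # -> samtools view -b -h -F 4 -o out.bam in.bam

# To actually run it (needs samtools and a real in.bam):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Note that this discards the unmapped reads from the output, so keep the original FASTQ or BAM files if those reads might be needed later.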
