Question: BAM File size increased after extracting unique reads
15 days ago by ilovesuperheroes1993 wrote:

Hi, I used the STAR aligner to map my reads, and the output BAM files were sorted by coordinate. I used the following command to extract unique reads from my BAM files:

samtools view -q 255 input_file.bam > unique_reads.bam

(A MAPQ value of 255 corresponds to uniquely mapped reads in STAR; the -q option filters on MAPQ, not on the SAM flag.)

However, the new BAM files are several times larger than the originals. For example, files that were originally 500 MB to 900 MB are now about 2.5 GB. This has happened for all the samples.

When I count the lines in the BAM files (the original and the one containing only the unique reads), the original file (about 500 MB) has 44 million lines, while the new file (about 2 GB) has 17 million lines. The line counts are as expected.

I have checked the headers of both BAM files, and both are sorted by coordinate.

So, could anyone tell me why the file with fewer lines is so much larger?

14 days ago by michael.ante (Austria/Vienna) wrote:

Hi,

Without the -b option, you'll get an uncompressed SAM file. Adding -b and -h to your command will produce a valid, compressed BAM file.

Best,

Michael
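
For anyone landing here later, the fix described above could look roughly like the sketch below. The file names are the ones from the question, and extract_unique is a hypothetical helper name used only for illustration, not something samtools provides.

```shell
# extract_unique is a hypothetical helper name, not part of samtools.
extract_unique() {
    in_bam=$1
    out_bam=$2
    # Refuse to run if the input is missing, so no empty output file is left behind.
    if [ ! -f "$in_bam" ]; then
        echo "input not found: $in_bam" >&2
        return 1
    fi
    # -b: write BAM (compressed); -h: include the header; -q 255: keep MAPQ >= 255.
    samtools view -b -h -q 255 "$in_bam" > "$out_bam"
}

# Example call with the file names from the question (runs only if the input exists).
if [ -f input_file.bam ]; then
    extract_unique input_file.bam unique_reads.bam
fi
```

The only change from the original command is the added -b and -h; the -q 255 filter is untouched, so the number of reads kept should stay the same.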


Agreed. Still, in the most recent samtools versions you would not even need to set any flags, as samtools recognizes the output format from the file suffix if you use -o instead of redirecting stdout, like samtools view -q 255 -o unique_reads.bam input_file.bam. With your current command you produced a SAM file instead of a BAM file, and without a header, because -h was missing. When using -b, -h is implied.

written 14 days ago by ATpoint
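
A quick way to tell whether a file named .bam is really compressed, sketched under the assumption that head and od are available: BAM files are BGZF-compressed, and BGZF is gzip-compatible, so a real BAM file starts with the bytes 1f 8b, while a SAM file is plain text. The name is_bgzf is hypothetical. Note that any gzip file passes this check, so it only separates compressed output from plain-text SAM.

```shell
# is_bgzf is a hypothetical helper name. It returns 0 if the file starts with
# the gzip magic bytes 1f 8b, which every BGZF-compressed BAM file does.
is_bgzf() {
    # Read the first two bytes and print them as hex without spaces.
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Possible use after running the -o form quoted in the comment above:
#   samtools view -q 255 -o unique_reads.bam input_file.bam
#   is_bgzf unique_reads.bam && echo "compressed" || echo "plain text, probably SAM"
```

Run against the file produced by the original command in the question, this check would be expected to report plain text, which matches the size increase.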