Different file sizes for uBAM files generated from the same FASTQ
5.8 years ago
weichi ▴ 10

I used GATK FastqToSam to generate a SAM file from FASTQ, then used different tools to convert the SAM to BAM. The resulting BAM files have different sizes, but they contain the same number of reads (checked with samtools flagstat).

  • ubam.bam generated directly by gatk FastqToSam: 124MB
  • SAM converted to BAM by gatk SamFormatConverter: 124MB
  • SAM converted to BAM by samtools: 85MB

Has anyone seen this difference before? Why are the file sizes different (a different compression method?), and would this difference affect subsequent analysis?


What are the exact commands you used? I would guess the difference is due to default compression level, but could be something else.

  • 124MB (FastqToSam writes BAM directly)

gatk FastqToSam -F1 r1_fastq.gz -F2 r2_fastq.gz -O ubam.bam -SM test -PL ILLUMINA

  • 124MB (FastqToSam to SAM, then gatk SamFormatConverter)

gatk FastqToSam -F1 r1_fastq.gz -F2 r2_fastq.gz -O ubam.sam -SM test -PL ILLUMINA

gatk SamFormatConverter -I ubam.sam -O ubam.bam

  • 85MB (FastqToSam to SAM, then samtools view)

gatk FastqToSam -F1 r1_fastq.gz -F2 r2_fastq.gz -O ubam.sam -SM test -PL ILLUMINA

samtools view -Sb ubam.sam > ubam.bam

I will check the manuals to see whether the tools use different compression levels. Thanks.

5.8 years ago
h.mon 35k

After digging a bit: samtools uses the default compression level of the system at hand. On my system, for example, it is 6, on a scale from 1 (fast, but low compression) to 9 (slow, but high compression).

GATK uses 5 by default, but some versions have a different default of 1.

Check the command lines you used, and also the versions of the programs.
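The effect of a compression level can be shown with a minimal sketch. It uses plain gzip as a stand-in, which is an assumption made for illustration: BAM is compressed with BGZF, a blocked variant of gzip, so this is not what samtools or GATK run internally, but both rely on the same DEFLATE level trade-off. The file names demo.txt, demo.l1.gz and demo.l9.gz are hypothetical.

```shell
# Illustration only: plain gzip stands in for BGZF, the blocked gzip
# format that BAM files use. Both are DEFLATE-based.
set -e

# Build about 1.3 MB of text as a stand-in for SAM records
seq 1 200000 > demo.txt

# Compress the identical input at the fastest and the strongest level
gzip -1 -c demo.txt > demo.l1.gz
gzip -9 -c demo.txt > demo.l9.gz

# The compressed sizes differ...
echo "level 1: $(wc -c < demo.l1.gz) bytes"
echo "level 9: $(wc -c < demo.l9.gz) bytes"

# ...but both decompress to exactly the original input
# (cmp exits non-zero, and set -e stops the script, if they differ)
gzip -dc demo.l1.gz | cmp - demo.txt
gzip -dc demo.l9.gz | cmp - demo.txt
echo "decompressed contents are identical"
```

If the level is the main difference between the three BAM files, the same reasoning applies: a lower level produces a larger file that is faster to write, while the records inside are unchanged, which fits the identical samtools flagstat counts. Downstream tools decompress transparently. To compare like with like, the level can be set explicitly, for example with the -l option of samtools view or the COMPRESSION_LEVEL argument of the Picard-based GATK tools (check --help for the exact spelling in your version).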


Thanks for your help!

This is the first time I've come across the term 'compression level'. I'll do some research on it.

5.8 years ago

Convert sam to bam by samtools: 85MB

If you're comparing SAM and BAM files, of course the binary, compressed BAM file will be much more compact than the plain-text SAM file.


I mean that the three BAM files generated by different tools have different sizes. I'm not comparing SAM with BAM.

Or do you mean that samtools converts SAM to BAM and also applies further compression, while GATK does not compress the file?

I'm not familiar with data compression. I'll read the website you provided, thanks.
