Question: Size difference between two uBAM files generated from the same FASTQ.
weichi (Taiwan) wrote, 2.0 years ago:

I used GATK FastqToSam to generate a SAM file from FASTQ, then used different tools to convert the SAM to BAM. The resulting BAM files differ in size, although the number of reads in each is the same (checked with samtools flagstat).

  • ubam.bam generated directly by gatk FastqToSam: 124MB
  • SAM converted to BAM by gatk SamFormatConverter: 124MB
  • SAM converted to BAM by samtools: 85MB

Has anyone seen this difference before? Why are the file sizes different (a different compression method?), and would this difference affect subsequent analysis?

samtools bam gatk • 639 views
written 2.0 years ago by weichi

What are the exact commands you used? I would guess the difference is due to the default compression level, but it could be something else.

written 2.0 years ago by h.mon
  • 124MB

gatk FastqToSam -F1 r1_fastq.gz -F2 r2_fastq.gz -O ubam.bam -SM test -PL ILLUMINA

  • 124MB

gatk FastqToSam -F1 r1_fastq.gz -F2 r2_fastq.gz -O ubam.sam -SM test -PL ILLUMINA

gatk SamFormatConverter -I ubam.sam -O ubam.bam

  • 85MB

gatk FastqToSam -F1 r1_fastq.gz -F2 r2_fastq.gz -O ubam.sam -SM test -PL ILLUMINA

samtools view -Sb ubam.sam > ubam.bam

I will check the manuals to see whether the tools use different compression levels, thanks.

written 2.0 years ago by weichi
h.mon (Brazil) wrote, 2.0 years ago:

Digging a bit, samtools uses the default compression level of the compression library (zlib) on the system at hand. On my system, for example, it is 6, on a scale from 1 (fast, but low compression) to 9 (slow, but high compression).

GATK uses 5 by default, but some versions have a different default of 1.

Check the command-lines you used, and also the versions of the programs.

written 2.0 years ago by h.mon
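
To make the 1-to-9 scale concrete, here is a minimal sketch that compresses the same synthetic reads with plain gzip at three levels. gzip is only a stand-in chosen for illustration: BAM stores its data in DEFLATE-compressed blocks, but neither samtools nor GATK is run here, and the reads, the seed, and the file name reads.txt are all made up.

```shell
#!/bin/sh
# Illustration only: plain gzip stands in for the DEFLATE compression used
# inside BAM blocks. The reads below are synthetic, not real data.
tmp=$(mktemp -d)

# Sample 5000 made-up 100 bp reads from a random 2000 bp "genome" so that
# overlapping reads share sequence, as real sequencing data does.
awk 'BEGIN {
    srand(42)
    split("A C G T", base, " ")
    genome = ""
    for (i = 1; i <= 2000; i++) genome = genome base[int(rand() * 4) + 1]
    qual = ""
    for (i = 1; i <= 100; i++) qual = qual "F"
    for (r = 1; r <= 5000; r++) {
        start = int(rand() * 1900) + 1
        printf "@read%d\n%s\n+\n%s\n", r, substr(genome, start, 100), qual
    }
}' > "$tmp/reads.txt"

# Compressed size in bytes at a fast, a middle, and a slow level.
size_l1=$(gzip -1 -c "$tmp/reads.txt" | wc -c | tr -d ' ')
size_l6=$(gzip -6 -c "$tmp/reads.txt" | wc -c | tr -d ' ')
size_l9=$(gzip -9 -c "$tmp/reads.txt" | wc -c | tr -d ' ')

# Checksums of the decompressed data, to compare against the original.
sum_orig=$(cksum < "$tmp/reads.txt")
sum_l1=$(gzip -1 -c "$tmp/reads.txt" | gzip -dc | cksum)
sum_l9=$(gzip -9 -c "$tmp/reads.txt" | gzip -dc | cksum)

echo "level 1: $size_l1 bytes, level 6: $size_l6 bytes, level 9: $size_l9 bytes"
echo "original checksum: $sum_orig"
echo "level 1 checksum:  $sum_l1"
echo "level 9 checksum:  $sum_l9"

rm -rf "$tmp"
```

The compressed size should shrink as the level rises while the checksums stay the same, which is the pattern reported in the thread: the same reads, stored in files of different sizes.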

Thanks for your help!

This is the first time I have seen the term 'compression level'; I'll do some research on it.

written 2.0 years ago by weichi
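
One way to take the tool defaults out of the comparison is to set the level explicitly in both conversions. The sketch below assumes a samtools build that accepts `--output-fmt-option level=N` and a GATK release whose Picard-derived tools accept `--COMPRESSION_LEVEL`. The wrapper function names and the output file names are hypothetical; only ubam.sam comes from the thread.

```shell
#!/bin/sh
# Sketch: write BAM at an explicit compression level instead of relying on
# each tool's default. Function and output file names are hypothetical.

# samtools: the level is passed as an output format option (0-9).
sam_to_bam_samtools() {
    # $1 = input SAM, $2 = output BAM, $3 = compression level
    samtools view -b --output-fmt-option level="$3" -o "$2" "$1"
}

# GATK (Picard-derived tools): the level is passed with --COMPRESSION_LEVEL.
sam_to_bam_gatk() {
    # $1 = input SAM, $2 = output BAM, $3 = compression level
    gatk SamFormatConverter -I "$1" -O "$2" --COMPRESSION_LEVEL "$3"
}

# Only run when the input from the thread and both tools are present.
if [ -f ubam.sam ] && command -v samtools >/dev/null 2>&1 \
        && command -v gatk >/dev/null 2>&1; then
    sam_to_bam_samtools ubam.sam ubam.samtools.l6.bam 6
    sam_to_bam_gatk ubam.sam ubam.gatk.l6.bam 6
    ls -l ubam.samtools.l6.bam ubam.gatk.l6.bam
fi
```

Even at the same nominal level the two tools may use different DEFLATE implementations, so the resulting sizes should be closer but need not be identical.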
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 2.0 years ago:

Convert sam to bam by samtools: 85MB

If you're comparing SAM and BAM files, of course the binary, compressed BAM file will be much more compact than the text SAM file.

written 2.0 years ago by Pierre Lindenbaum

I mean that the three BAM files generated by the different tools differ in size. I am not comparing SAM with BAM.

Or do you mean that samtools converts SAM to BAM and also compresses the data further, whereas gatk does not compress the file?

I'm not familiar with data compression. I'll read the website you provided, thanks.

written 2.0 years ago by weichi
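
Whether the size difference matters for later analysis can be checked by comparing the decoded records rather than the files themselves. The following is a rough sketch under stated assumptions: samtools is on the PATH, the function names are hypothetical, and ubam.gatk.bam and ubam.samtools.bam are placeholder names for the two BAM files from the thread. Headers are left out of the comparison because each tool may record its own @PG line.

```shell
#!/bin/sh
# Sketch: compare two BAM files by their decoded records, not by file size.
# Function names and file names are hypothetical.

records_checksum() {
    # $1 = BAM file; samtools view without -h prints alignment records only
    samtools view "$1" | cksum
}

same_records() {
    # $1, $2 = BAM files to compare; succeeds when the record streams match
    [ "$(records_checksum "$1")" = "$(records_checksum "$2")" ]
}

# Only run when both placeholder files and samtools are present.
if [ -f ubam.gatk.bam ] && [ -f ubam.samtools.bam ] \
        && command -v samtools >/dev/null 2>&1; then
    if same_records ubam.gatk.bam ubam.samtools.bam; then
        echo "records are identical; only the compression differs"
    else
        echo "records differ"
    fi
fi
```

If the checksums match, the files hold the same reads in the same order, and the size difference comes from how they were compressed.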