BAM to FASTQ compression ratio
9.1 years ago
win ▴ 970

Hi all,

I downloaded the Denisova BAM file from here.

Since this file is aligned to the 1000 Genomes reference genome and I would like to realign it to GRCh38, I ran the following utility for BAM-to-FASTQ conversion: http://gsl.hudsonalpha.org/information/software/bam2fastq

After conversion, the FASTQ file is about 180 GB, whereas the BAM file is 80 GB.

My question is: does this seem correct, and is there a way to estimate the size of the FASTQ from a BAM file?

If there is a faster converter, could you please share it? Thanks in advance.

FASTQ BAM • 5.5k views
9.1 years ago

BAM files are compressed, but the FASTQ file that bam2fastq writes isn't, so yes, a FASTQ a bit over twice the size of the BAM is not unreasonable. Given the notice at the top of the bam2fastq page, I wouldn't be surprised if Picard is faster. It might also allow writing compressed files (no clue, I've never used its SamToFastq command).
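For the estimation part of the question: each FASTQ record is just "@" plus the read name, the sequence, a "+" line and the qualities, so the uncompressed size is roughly reads × (name length + 2 × read length + 6) bytes. The sketch below is a hypothetical shell helper (fastq_bytes_from_sam is not part of any tool) that sums this per record from samtools view output. It assumes 4-line records, skips secondary and supplementary alignments, and ignores any /1 or /2 suffixes a converter may append, so treat the result as an estimate.

```shell
# Hypothetical helper: reads SAM text on stdin and prints the size in bytes
# of the uncompressed FASTQ those reads would produce, assuming 4-line
# records ("@name", sequence, "+", qualities) with no /1 or /2 suffixes.
fastq_bytes_from_sam() {
  awk -F '\t' '
    /^@/ { next }                       # skip SAM header lines
    int($2 / 256) % 2 == 1 { next }     # skip secondary alignments (0x100)
    int($2 / 2048) % 2 == 1 { next }    # skip supplementary alignments (0x800)
    {
      # "@" name "\n" + SEQ "\n" + "+\n" + QUAL "\n"
      bytes += (length($1) + 2) + (length($10) + 1) + 2 + (length($11) + 1)
    }
    END { printf "%.0f\n", bytes }      # %.0f avoids exponent notation for big totals
  '
}
```

Usage: samtools view in.bam | fastq_bytes_from_sam. On an 80 GB BAM that is a full pass; piping through head -n 1000000 and scaling by the primary-read count from samtools view -c -F 0x900 in.bam gives a quicker, rougher figure.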


FYI: SamToFastq

(...)
COMPRESSION_LEVEL=Integer     Compression level for all compressed files created (e.g. BAM and GELI).  Default value:5.

You saved me from looking at the documentation :)

9.1 years ago
tszn1984 ▴ 100

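Use a pipe ('|') to avoid writing and then re-reading huge intermediate files. Here grch38_index is a placeholder for your bowtie index basename, which bowtie expects before the reads file (-S only switches on SAM output and takes no argument):

bam2fastx -q in.bam | bowtie -S grch38_index /dev/stdin | samtools view -Sbh - > out.bam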

I edited your post to make it a bit more correct. Note that this won't work if the reads are paired-end (and don't forget orphaned reads...).
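Before relying on the single-stream pipe, it may be worth checking whether the BAM actually holds paired reads. The sketch below is another hypothetical helper (count_paired_from_sam) that tallies primary records and the SAM FLAG bits 0x1 (paired), 0x40 (first in pair) and 0x80 (second in pair) using plain POSIX awk arithmetic. A non-zero paired count means the pipe would treat mates as unrelated single-end reads; on a full pass, unequal read1/read2 counts point to orphans (equal counts do not rule them out).

```shell
# Hypothetical helper: reads SAM text on stdin and reports how many primary
# records there are and how many carry the paired-read FLAG bits.
count_paired_from_sam() {
  awk -F '\t' '
    /^@/ { next }                       # skip SAM header lines
    int($2 / 256) % 2 == 1 { next }     # skip secondary alignments (0x100)
    int($2 / 2048) % 2 == 1 { next }    # skip supplementary alignments (0x800)
    {
      total++
      if ($2 % 2 == 1) paired++                 # 0x1: read is part of a pair
      if (int($2 / 64) % 2 == 1) read1++        # 0x40: first read of the pair
      if (int($2 / 128) % 2 == 1) read2++       # 0x80: second read of the pair
    }
    END { printf "primary=%d paired=%d read1=%d read2=%d\n", total, paired, read1, read2 }
  '
}
```

Usage: samtools view in.bam | count_paired_from_sam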
