Question: From Compressed Bam To Fastq
8.8 years ago by
Zhidkov570 wrote:

Hi all, we ran the samtools calmd -e command, which changed all the bases that are identical to the reference to '='. Does anyone know of software that converts such a compressed BAM back to FASTQ? One-liners won't help in this case. Thanks

samtools • 5.2k views
written 8.8 years ago by Zhidkov570

Thank you all for the answers. The approach we eventually took was to use the pileup file to generate an uncompressed BAM; after that, a BAM2FASTQ script is trivial.


written 8.8 years ago by Zhidkov570
8.8 years ago by
United States
lh332k wrote:

If your BAM is coordinate sorted, I would not recommend using Picard directly, as this will generate a coordinate-sorted file, which may cause mapping problems (certainly for bwa) and other potential biases in analyses.

The recommended way is to sort your BAM by name and then convert it to fastq with Picard or your own scripts -- for a name-sorted BAM, it is very easy to generate fastq.
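
To illustrate why a name-sorted file is easy to convert, here is a minimal sketch in Python that works on SAM text lines (for example the output of `samtools view`). The function names are hypothetical and this is not what Picard does internally. It assumes uncompressed sequences (no '=' bases), skips secondary and supplementary records, and puts unpaired reads into the first output.

```python
# Minimal sketch: turn name-sorted SAM text records into two FASTQ strings.
# Function names are illustrative, not part of samtools or Picard.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_record_to_fastq(line):
    """Return (mate, fastq_record) for one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & 0x10:
        # SAM stores reverse-strand reads reverse-complemented;
        # undo that to recover the read as it was sequenced.
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]
    mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
    suffix = f"/{mate}" if mate else ""
    return mate, f"@{name}{suffix}\n{seq}\n+\n{qual}\n"


def name_sorted_sam_to_fastq(lines):
    """Split records into read-1 and read-2 FASTQ text, preserving order."""
    fq1, fq2 = [], []
    for line in lines:
        if line.startswith("@"):
            continue  # header line
        flag = int(line.split("\t")[1])
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) alignment
        mate, record = sam_record_to_fastq(line)
        (fq2 if mate == 2 else fq1).append(record)
    return "".join(fq1), "".join(fq2)
```

Because mates are adjacent in a name-sorted file, the two outputs stay in the same read order, which is what paired-end mappers expect.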

ADD COMMENTlink written 8.8 years ago by lh332k

Heng, this is useful information that I think is underappreciated. Can you expand on the nature of mapping problems that this will cause?

written 7.1 years ago by nlomioni30

Reads from the same genomic region share similar properties. For bwa, chromosome-ordered reads will interfere with the insert size estimate. For all mappers, such ordered reads will make the alignment jobs run at different speeds: faster if the reads are from a unique region, slower if they are from a repetitive region. htslib now comes with bamshuf, which is much faster than sorting. Running bamshuf before bam2fq is the preferred solution.
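
The idea of shuffling while keeping mates together can be sketched in a few lines of Python. This is only an in-memory illustration with a hypothetical function name; it is not how bamshuf is implemented, and it would not scale to a real BAM. It assumes each record is a (read name, payload) tuple.

```python
import random


def shuffle_by_name(records, seed=0):
    """Randomize read order while keeping records with the same name adjacent.

    records: iterable of (name, payload) tuples in any order,
             e.g. coordinate order. Illustrative only; holds
             everything in memory.
    """
    groups = {}
    for name, payload in records:
        groups.setdefault(name, []).append((name, payload))
    names = list(groups)
    random.Random(seed).shuffle(names)  # seeded so the result is reproducible
    return [record for name in names for record in groups[name]]
```

The output no longer follows chromosome order, so any chunk handed to a mapper contains reads from a mix of regions, while both mates of a pair still sit next to each other for the FASTQ conversion.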

written 7.1 years ago by lh332k

@nlomioni It is certainly underappreciated. Thanks for the info @lh3. I still don't understand the issues with the insert size (IS) calculation and the variability in running times. The purpose of the IS calculation is to find the real IS of the library, so how would reads from the same chromosomal region interfere with it? As for the running time, I understand that repeat regions are more computationally intensive to map, but shouldn't the overall running time of the whole mapping process be the same for the different methods?

written 7.0 years ago by Drio910

Speed matters when you split alignment jobs across different computing nodes. For paired-end data, the insert size distribution will be different in some regions, especially the chr1 centromere, where there are many unpaired/mispaired reads. This can lead to artifacts. Bamshuf is much faster and more lightweight than Picard and does not have these concerns. It is the best solution.

written 7.0 years ago by lh332k
8.8 years ago by
United States
Drio910 wrote:

Take a look at this thread. Basically, there is a Picard command for that.

Based on other comments I think these are valid methods:

PICARD_OPTS="MAX_RECORDS_IN_RAM=5000000 TMP_DIR=/path/to/local/tmp"
htscmd bamshuf -O -u $INPUT_BAM _tmp | \
    java $JAVA_OPTS -jar $PICARD/SamToFastq.jar I=/dev/stdin F=e1.fq F2=e2.fq $PICARD_OPTS

Alternatively we could use picard for the sorting (I haven't tried this method):

PI_OPTS="MAX_RECORDS_IN_RAM=5000000 TMP_DIR=/path/to/local/tmp"
java $JAVA_OPTS -jar $PICARD/SortSam.jar I=$I_BAM O=/dev/stdout SORT_ORDER=queryname $PI_OPTS | \
    java $JAVA_OPTS -jar $PICARD/SamToFastq.jar I=/dev/stdin F=e1.fq F2=e2.fq $PI_OPTS
modified 6 months ago by RamRS26k • written 8.8 years ago by Drio910
8.8 years ago by
Eitan Rubin30 wrote:

I checked out the recommendation to use Picard or Hydra to convert BAM to FASTQ. It works great UNLESS YOU USED THE calmd -e OPTION OF SAMTOOLS. In that case, Picard complains that it sees an illegal base '=', and Hydra happily creates FASTQ sequences made almost entirely of '=' characters :(
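
One way around this is to put the reference bases back before converting. Below is a minimal Python sketch of that step, assuming you already have the read's SEQ, CIGAR and 1-based POS from the SAM record plus the reference sequence of the matching chromosome as a string. The function name `restore_bases` is hypothetical and not part of samtools, Picard or Hydra.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def restore_bases(seq, cigar, pos, ref):
    """Replace '=' in a read with the reference base it stands for.

    seq   : read sequence as stored in the BAM (may contain '=')
    cigar : CIGAR string, e.g. '3M1I3M'
    pos   : 1-based leftmost reference position (SAM POS field)
    ref   : sequence of the chromosome the read is aligned to
    """
    out = list(seq)
    q = 0        # 0-based index into the read
    r = pos - 1  # 0-based index into the reference
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            # Aligned bases: the only place '=' can stand for a reference base.
            for i in range(n):
                if out[q + i] == "=":
                    out[q + i] = ref[r + i]
            q += n
            r += n
        elif op in "IS":
            q += n  # insertion / soft clip: consumes read only
        elif op in "DN":
            r += n  # deletion / skipped region: consumes reference only
        # H and P consume neither read nor reference
    return "".join(out)
```

For example, `restore_bases("==TA===", "3M1I3M", 5, "NNNNACGTACGT")` gives `"ACTATAC"`. The restored sequence is still in reference orientation, so reverse-strand reads (flag 0x10) still need to be reverse-complemented when writing FASTQ.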
