Hi all, We used the samtools calmd -e command, which changed all the bases that are identical to the reference to '='. Does anyone know of software that converts such a compressed BAM back to FASTQ? One-liners won't help in this case. Thanks
If your BAM is coordinate sorted, I would not recommend using Picard directly, as this will generate coordinate-sorted FASTQ output, which may cause mapping problems (certainly for bwa) and other potential biases in analyses.
Based on other comments I think these are valid methods:
INPUT_BAM=input.bam
PICARD="/path/to/picard/jars"
JAVA_OPTS="-Xmx4G"
PICARD_OPTS="MAX_RECORDS_IN_RAM=5000000 TMP_DIR=/path/to/local/tmp"
htscmd bamshuf -O -u $INPUT_BAM _tmp | \
  java $JAVA_OPTS -jar $PICARD/SamToFastq.jar I=/dev/stdin F=e1.fq F2=e2.fq $PICARD_OPTS
Alternatively we could use picard for the sorting (I haven't tried this method):
I_BAM=input.bam
PICARD="/path/to/picard/jars"
JAVA_OPTS="-Xmx4G"
PI_OPTS="MAX_RECORDS_IN_RAM=5000000 TMP_DIR=/path/to/local/tmp"
java $JAVA_OPTS -jar $PICARD/SortSam.jar I=$I_BAM O=/dev/stdout SORT_ORDER=queryname $PI_OPTS | \
  java $JAVA_OPTS -jar $PICARD/SamToFastq.jar I=/dev/stdin F=e1.fq F2=e2.fq $PI_OPTS
I checked out the recommendation to use Picard or Hydra to convert BAM to FASTQ. It works great UNLESS YOU USED THE calmd -e OPTION OF SAMTOOLS. In that case, Picard complains that it sees an illegitimate base '=', and Hydra happily creates FASTQ sequences made almost entirely of '=' characters :(
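A quick way to check whether a BAM is affected before running a converter is to count the records whose SEQ field contains '='. This is only a rough sketch: count_masked_reads is a made-up helper name (not a samtools command), and it assumes plain SAM text on stdin with SEQ in column 10, as `samtools view` prints it.

```shell
# Count SAM records whose SEQ field (column 10) contains '='.
# Reads SAM text on stdin; header lines starting with '@' are skipped.
# count_masked_reads is a hypothetical helper, not part of samtools.
count_masked_reads() {
    awk -F'\t' '!/^@/ && $10 ~ /=/ { n++ } END { print n + 0 }'
}

# Usage (assumes samtools is on PATH and input.bam exists):
#   samtools view input.bam | count_masked_reads
```

A non-zero count means the file was most likely written with calmd -e, so the '=' bases would need to be restored from the reference before Picard or Hydra can produce usable FASTQ.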