I found two posts recommending HTSlib's htscmd for converting BAM to FASTQ, but both produce only a single interleaved file of paired reads. I used this tool to deinterleave the output: http://code.google.com/p/popgentools/source/browse/trunk/misc/split-interleaved-fastq.pl
After deinterleaving the paired.fq from this command (from the answer "Samtofastq: Net.Sf.Picard.Picardexception: Found N Unpaired Mates"):
htscmd bamshuf -Ou input.bam tmp-prefix | htscmd bam2fq -s se.fq.gz - | gzip > pe.fq.gz
I got only 41k pairs.
After deinterleaving the output of this command (from http://gatkforums.broadinstitute.org/discussion/2908/howto-revert-a-bam-file-to-fastq-format):
htscmd bamshuf -uOn 128 aln_reads.bam tmp | htscmd bam2fq -a - | gzip > interleaved_reads.fq.gz
I got 232.7k reads for one end and 214.4k reads for the other end.
Is there a parameter in HTSlib htscmd that tells it to output deinterleaved reads directly? Or is there a more suitable tool for deinterleaving?
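To make clear what I mean by deinterleaving, here is a minimal Python sketch, not a tested tool. It assumes that the two mates of a pair sit next to each other in the interleaved stream and that their names are identical apart from an optional trailing /1 or /2. The function names (read_fastq, base_name, deinterleave) are my own and do not come from htscmd or any other package. Reads without an adjacent mate go to a separate singleton output, so both end files always hold the same number of reads; unequal counts like the ones above might indicate unpaired reads mixed into the interleaved stream.

```python
import io


def read_fastq(handle):
    """Yield (header, seq, plus, qual) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip stray blank lines
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        yield tuple(line.rstrip("\n") for line in (header, seq, plus, qual))


def base_name(header):
    """Read name without '@', without any comment, without a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def deinterleave(interleaved, out1, out2, singles):
    """Split an interleaved FASTQ stream; return (n_pairs, n_singletons).

    Assumption: mates are adjacent. A record whose neighbour has a
    different name is written to `singles` instead of out1/out2.
    """
    n_pairs = n_singles = 0
    pending = None  # first record of a possible pair
    for rec in read_fastq(interleaved):
        if pending is None:
            pending = rec
        elif base_name(pending[0]) == base_name(rec[0]):
            out1.write("\n".join(pending) + "\n")
            out2.write("\n".join(rec) + "\n")
            n_pairs += 1
            pending = None
        else:
            singles.write("\n".join(pending) + "\n")
            n_singles += 1
            pending = rec
    if pending is not None:  # last record had no mate
        singles.write("\n".join(pending) + "\n")
        n_singles += 1
    return n_pairs, n_singles


# Small in-memory demo: pair a, singleton b, pair c.
demo = io.StringIO(
    "@a/1\nACGT\n+\nIIII\n@a/2\nTTGA\n+\nIIII\n"
    "@b/1\nGGCC\n+\nIIII\n"
    "@c/1\nAAAA\n+\nIIII\n@c/2\nCCCC\n+\nIIII\n"
)
r1, r2, se = io.StringIO(), io.StringIO(), io.StringIO()
print(deinterleave(demo, r1, r2, se))  # prints (2, 1)
```

For real data the same function can be given file handles instead, for example gzip.open("pe.fq.gz", "rt") as input and three files opened for writing as outputs.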
My BAM file is from TopHat, and I would like to re-analyze these reads with TopHat after filtering. Is it important to put the singletons back together with the paired reads for re-analysis? It appears from http://www.arrayserver.com/wiki/index.php?title=FPKM_Transcript that singletons are not passed on to Cufflinks by TopHat for FPKM, but since they mapped, I would have thought the TopHat/Cufflinks pipeline would make use of them. Do singletons tend to be splice-junction reads?