SAM tools pileup produces a consensus sequence in vertical format, which is not terribly useful for traditional sequence analysis. Does anyone know of a command-line utility to convert SAM tools pileup format to a simple FASTA sequence?
I am aware that SAM tools provides a tool to convert pileup to FASTQ (samtools.pl pileup2fq), but FASTQ-to-FASTA conversion presents its own challenges:
http://stackoverflow.com/questions/1542306/converting-fastq-to-fasta-with-sed-awk
http://ukpmc.ac.uk/articlerender.cgi?accid=PMC2847217
Of course I could write a simple script for this task, but I'm hoping someone has already solved this problem for the community. Apologies in advance if this functionality is available in SAM tools itself.
Many thanks, Casey
You should check for memory leaks, whether the stream was opened successfully, etc.
There is no memory leak. As for file opening, if the file is not readable, the program segfaults immediately; it never produces false results.
Heng, thanks for this. Confirmed that it works on multi-line fastq as advertised. Would it be possible to embed this as an output option in samtools?
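For anyone who wants to see why multi-line FASTQ trips up simple sed/awk one-liners, here is a minimal Python sketch of the conversion. It is not the tool discussed above; the function names `read_fastq` and `fastq_to_fasta` and the `width` parameter are made up for illustration. The key point is that a quality line may begin with `@`, so the parser counts quality characters against the sequence length instead of looking for the next `@`.

```python
import sys


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples, allowing wrapped records."""
    line = handle.readline()
    while line:
        line = line.rstrip("\n")
        if not line:  # skip blank lines between records
            line = handle.readline()
            continue
        if not line.startswith("@"):
            raise ValueError("expected '@' header, got: %r" % line)
        name = line[1:]

        # Sequence lines run until the '+' separator line.
        seq_parts = []
        line = handle.readline()
        while line and not line.startswith("+"):
            seq_parts.append(line.strip())
            line = handle.readline()
        if not line:
            raise ValueError("truncated record: %s" % name)
        seq = "".join(seq_parts)

        # Quality lines run until we have as many characters as bases.
        # A quality line may start with '@', so length is the only safe cue.
        qual_parts = []
        qual_len = 0
        while qual_len < len(seq):
            line = handle.readline()
            if not line:
                raise ValueError("truncated quality: %s" % name)
            chunk = line.strip()
            qual_parts.append(chunk)
            qual_len += len(chunk)
        if qual_len != len(seq):
            raise ValueError("sequence/quality length mismatch: %s" % name)

        yield name, seq, "".join(qual_parts)
        line = handle.readline()


def fastq_to_fasta(handle, out, width=60):
    """Write each FASTQ record as FASTA, wrapping sequence at `width`."""
    for name, seq, _qual in read_fastq(handle):
        out.write(">%s\n" % name)
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    fastq_to_fasta(sys.stdin, sys.stdout)
```

Saved under a hypothetical name such as `fq2fa.py`, it would read FASTQ on standard input and write FASTA on standard output.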
What if I don't want bases with quality lower than 20 to be converted to lowercase?
I used the command:
However, it still output many bases in lowercase!
Could you kindly tell me how to avoid this?
Thank you
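The exact command is not shown above, so this is only a general sketch of the behaviour being asked about, not a description of what any particular tool does. It assumes Phred+33 quality encoding; the function name `mask_low_quality` and its parameters are hypothetical. Bases whose quality falls below the threshold are lowercased, so a threshold of 0 leaves everything in uppercase.

```python
def mask_low_quality(seq, qual, min_q=20, offset=33):
    """Return seq with bases of quality below min_q in lowercase.

    Assumes `qual` is a Phred string with the given ASCII offset
    (33 here, which is an assumption about the input encoding).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    masked = []
    for base, q_char in zip(seq, qual):
        q = ord(q_char) - offset
        masked.append(base.upper() if q >= min_q else base.lower())
    return "".join(masked)


# Threshold 20: only the last base (quality 0) is lowercased.
print(mask_low_quality("ACGT", "I5I!", min_q=20))
# Threshold 0: nothing is lowercased.
print(mask_low_quality("ACGT", "I5I!", min_q=0))
```

If the FASTA has already been written with lowercase bases, calling `str.upper()` on each sequence line (every line that does not start with `>`) would remove the masking after the fact.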