I started with a 9GB file of a human Xth chromosome with a .bam format, and I am trying to get a specific genic region in .fasta classic format (you know, a one header file, starting with ">", with description on header, and a single line, of continuous tandem single nucleotides).
I have been able to retrieve the reads of the genic region I want in bam format. I share the path I have followed:
samtools sort OriginalFile.bam -o OriginalFile.sort.bam
samtools inex OriginalFile.sort.bam
samtools view -b OriginalFile.sort.bam RegName:XX-XX+1 > GeneName.bam
after this I got the GeneName.bam file which contains all the reads on which I am interested in, but I can't run the phylogenetic tool I am trying to pipe the GeneName.bam sequence to ... in order to follow my workpath, I need to transform this GeneName.bam file, into this classic .fasta format I described lines ago.
I tried a couple of pipelines in order to achieve this, but what I have got as a retrieved .fasta file, is a man readable file, but with all the single fastq reads, with headers and everything, stacked on top of each other. I need to get the consensus of this, with one header and only contiguous nucleotides.
I got GeneName.fasta file running:
samtools bam2fq GeneName.bam > GeneName.fasta
samtools bam2fq GeneName.bam > GeneName.fastq
samtools seqtk seq -A GeneName.fastq > GeneName.fasta # (i installed bowtie2, boost, tophat2, seqtk, and bcf tools, as they seem to complement each other's works at some times)
and I got the same result on both, analogous file.
With this line of thinking, I thought I may have to run mpileup to the GeneName.bam file, before transforming to fasta, but mpileup gives as a result .bcf format... still, this last assumption is just a form of discussing my problem.
Have anyone out there found the solution for this, can a .bam file converted to classic .fasta format, or can help in any way?