I started with a 9GB file of a human Xth chromosome with a .bam format, and i am trying to get a specific genic region in .fasta clasic format (you know, a one header file, starting with ">", with description on header, and a single line, of continuous tandem single nucleotides).
I have been able to retrieve the reads of the genic region i want in bam format. I share the path i have followed:
samtools sort OriginalFile.bam -o OriginalFile.sort.bam samtools inex OriginalFile.sort.bam samtools view -b OriginalFile.sort.bam RegName:XX-XX+1 > GeneName.bam
after this i got the GeneName.bam file which contains all the reads on which i am interested in, but i can´t run the phylogenetic tool i am trying to pipe the GeneName.bam sequence to ... in order to follow my workpath, i need to transform this GeneName.bam file, into this classic .fasta format i described lines ago.
I tried a couple of pipelines in order to achieve this, but what i have got as a retrieved .fasta file, is a man readable file, but with all the single fastq reads, with headers and everything, stacked on top of each other. I need to get the consensus of this, with one header and only contiguous nucleotides.
I got GeneName.fasta file running:
samtools bam2fq GeneName.bam > GeneName.fasta
samtools bam2fq GeneName.bam > GeneName.fastq samtools seqtk seq -A GeneName.fastq > GeneName.fasta # (i installed bowtie2, boost, tophat2, seqtk, and bcf tools, as they seem to complement each other's works at some times)
and i got the same result on both, analogous file.
With this line of thinking, i thought i may have to run mpileup to the GeneName.bam file, before transforming to fasta, but mpileup gives as a result .bcf format... still, this last assumption is just a form of discussing my problem.
Have anyone out there found the solution for this, can a .bam file converted to classic .fasta format, or can help in any way?