getting a consense one header .fasta file from a genic region, from .bam template
1
2
Entering edit mode
5.0 years ago
ricfoz ▴ 80

I started with a 9GB file of a human Xth chromosome with a .bam format, and i am trying to get a specific genic region in .fasta clasic format (you know, a one header file, starting with ">", with description on header, and a single line, of continuous tandem single nucleotides).

I have been able to retrieve the reads of the genic region i want in bam format. I share the path i have followed:

samtools sort OriginalFile.bam -o OriginalFile.sort.bam samtools inex OriginalFile.sort.bam samtools view -b OriginalFile.sort.bam RegName:XX-XX+1 > GeneName.bam

after this i got the GeneName.bam file which contains all the reads on which i am interested in, but i canĀ“t run the phylogenetic tool i am trying to pipe the GeneName.bam sequence to ... in order to follow my workpath, i need to transform this GeneName.bam file, into this classic .fasta format i described lines ago.

I tried a couple of pipelines in order to achieve this, but what i have got as a retrieved .fasta file, is a man readable file, but with all the single fastq reads, with headers and everything, stacked on top of each other. I need to get the consensus of this, with one header and only contiguous nucleotides.

I got GeneName.fasta file running:

samtools bam2fq GeneName.bam > GeneName.fasta

or

samtools bam2fq GeneName.bam > GeneName.fastq samtools seqtk seq -A GeneName.fastq > GeneName.fasta # (i installed bowtie2, boost, tophat2, seqtk, and bcf tools, as they seem to complement each other's works at some times)

and i got the same result on both, analogous file.

With this line of thinking, i thought i may have to run mpileup to the GeneName.bam file, before transforming to fasta, but mpileup gives as a result .bcf format... still, this last assumption is just a form of discussing my problem.

Have anyone out there found the solution for this, can a .bam file converted to classic .fasta format, or can help in any way?

Greetings

.bam .fasta .one header consense .bam > .fasta • 2.4k views
ADD COMMENT
0
Entering edit mode
ADD REPLY
0
Entering edit mode

Yes, that post is mine... i have posted a couple of times on the matter as i have been working through the problem ... now i am at the point at which i can retrieve the genicregion.fasta file, but it gives only single fastq reads piled up upon each other... i want a .fasta format, which is completely different from .fastq single read.

ADD REPLY
0
Entering edit mode

now that you got fastq file, try converting to fasta file using tools such as: https://bioconda.github.io/recipes/ucsc-fastqtofa/README.html. This would give you each read as single fasta record. You can download the binaries from http://hgdownload.cse.ucsc.edu/admin/exe/ and select as per your platform.

ADD REPLY
0
Entering edit mode

That's not what OP needs, rather a de novo assembly. OP wants a "reference" fasta for that region if I understood correctly, or create a consensus sequence of the bam file.

ADD REPLY
3
Entering edit mode
5.0 years ago
ricfoz ▴ 80

Alright, i solved my problem, so i share what i found to the community so you can solve similar problems

after retrieving a .bam file with the desired length, through the command "samtools view -b ChrName:XX-XX+1 > Outputname.bam"

you need to pipe this output into samtools mpileup command, in order to get a .bcf file

"samtools mpileup -g -f refgenome.fa filename.bam > filename.bcf" (check samtools tutorials to change or check/confirm desired flag options)

then you need to transform .bcf to .vcf callig: "bcftools call -m filename.bcf > filename .vcf"

this last .vcf file needs to be compressed and indexed like following:

bgzip -c filename.vcf > filename.vcf.gz

tabix filename.vcf.gz

after having succesfully run these commands, you can get to the final step, using samtools again:

samtools faidx refgenome.fa ChrName:XX-XX+1 | bcftools consensus filename.vcf.gz > filename.fa

I hope these pipeline is useful for you people, and i thank all the community, and people who aided on the go. Greetings!

ADD COMMENT
1
Entering edit mode

Good job and thanks for sharing, I have moved this to an answer because it solves your question, making it also easier to find for others.

ADD REPLY
0
Entering edit mode

No problem, i guess the aim of the community is to get aid on problems with actually solved answers.

Cheers.

ADD REPLY

Login before adding your answer.

Traffic: 1406 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6