Question: getting a consense one header .fasta file from a genic region, from .bam template
0
gravatar for ricfoz
16 months ago by
ricfoz20
National School of Antropology and History, Mexico city, Mexico
ricfoz20 wrote:

I started with a 9GB file of a human Xth chromosome with a .bam format, and i am trying to get a specific genic region in .fasta clasic format (you know, a one header file, starting with ">", with description on header, and a single line, of continuous tandem single nucleotides).

I have been able to retrieve the reads of the genic region i want in bam format. I share the path i have followed:

samtools sort OriginalFile.bam -o OriginalFile.sort.bam samtools inex OriginalFile.sort.bam samtools view -b OriginalFile.sort.bam RegName:XX-XX+1 > GeneName.bam

after this i got the GeneName.bam file which contains all the reads on which i am interested in, but i canĀ“t run the phylogenetic tool i am trying to pipe the GeneName.bam sequence to ... in order to follow my workpath, i need to transform this GeneName.bam file, into this classic .fasta format i described lines ago.

I tried a couple of pipelines in order to achieve this, but what i have got as a retrieved .fasta file, is a man readable file, but with all the single fastq reads, with headers and everything, stacked on top of each other. I need to get the consensus of this, with one header and only contiguous nucleotides.

I got GeneName.fasta file running:

samtools bam2fq GeneName.bam > GeneName.fasta

or

samtools bam2fq GeneName.bam > GeneName.fastq samtools seqtk seq -A GeneName.fastq > GeneName.fasta # (i installed bowtie2, boost, tophat2, seqtk, and bcf tools, as they seem to complement each other's works at some times)

and i got the same result on both, analogous file.

With this line of thinking, i thought i may have to run mpileup to the GeneName.bam file, before transforming to fasta, but mpileup gives as a result .bcf format... still, this last assumption is just a form of discussing my problem.

Have anyone out there found the solution for this, can a .bam file converted to classic .fasta format, or can help in any way?

Greetings

ADD COMMENTlink modified 16 months ago • written 16 months ago by ricfoz20

See also Converting .bam file to .fasta file from a genic region

ADD REPLYlink written 16 months ago by WouterDeCoster37k

Yes, that post is mine... i have posted a couple of times on the matter as i have been working through the problem ... now i am at the point at which i can retrieve the genicregion.fasta file, but it gives only single fastq reads piled up upon each other... i want a .fasta format, which is completely different from .fastq single read.

ADD REPLYlink written 16 months ago by ricfoz20

now that you got fastq file, try converting to fasta file using tools such as: https://bioconda.github.io/recipes/ucsc-fastqtofa/README.html. This would give you each read as single fasta record. You can download the binaries from http://hgdownload.cse.ucsc.edu/admin/exe/ and select as per your platform.

ADD REPLYlink written 16 months ago by cpad011211k

That's not what OP needs, rather a de novo assembly. OP wants a "reference" fasta for that region if I understood correctly, or create a consensus sequence of the bam file.

ADD REPLYlink modified 16 months ago • written 16 months ago by WouterDeCoster37k
1
gravatar for ricfoz
16 months ago by
ricfoz20
National School of Antropology and History, Mexico city, Mexico
ricfoz20 wrote:

Alright, i solved my problem, so i share what i found to the community so you can solve similar problems

after retrieving a .bam file with the desired length, through the command "samtools view -b ChrName:XX-XX+1 > Outputname.bam"

you need to pipe this output into samtools mpileup command, in order to get a .bcf file

"samtools mpileup -g -f refgenome.fa filename.bam > filename.bcf" (check samtools tutorials to change or check/confirm desired flag options)

then you need to transform .bcf to .vcf callig: "bcftools call -m filename.bcf > filename .vcf"

this last .vcf file needs to be compressed and indexed like following:

bgzip -c filename.vcf > filename.vcf.gz

tabix filename.vcf.gz

after having succesfully run these commands, you can get to the final step, using samtools again:

samtools faidx refgenome.fa ChrName:XX-XX+1 | bcftools consensus filename.vcf.gz > filename.fa

I hope these pipeline is useful for you people, and i thank all the community, and people who aided on the go. Greetings!

ADD COMMENTlink written 16 months ago by ricfoz20
1

Good job and thanks for sharing, I have moved this to an answer because it solves your question, making it also easier to find for others.

ADD REPLYlink written 16 months ago by WouterDeCoster37k

No problem, i guess the aim of the community is to get aid on problems with actually solved answers.

Cheers.

ADD REPLYlink written 15 months ago by ricfoz20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2329 users visited in the last hour