Question: Convert Bam File To Fasta File
4
gravatar for Sazzad
8.5 years ago by
Sazzad40
Sazzad40 wrote:

How can I convert bam file to fasta file using Biopython command? please share if some one have any idea.

Thanks in advance

sazzad

fasta bam biopython conversion • 24k views
ADD COMMENTlink modified 19 months ago by WouterDeCoster40k • written 8.5 years ago by Sazzad40

Hi Sazzad. Can you tell us a little more about your problem and why you are choosing to solve it using biopython?

ADD REPLYlink written 8.5 years ago by Casey Bergman18k

It will be easy if you clear what you did so far!

ADD REPLYlink written 8.5 years ago by Thaman3.2k
14
gravatar for Brad Chapman
8.5 years ago by
Brad Chapman9.4k
Boston, MA
Brad Chapman9.4k wrote:

You can do this with a combination of Biopython for writing the Fasta files and pysam for reading the BAM files:

import os
import sys

import pysam
from Bio import SeqIO, Seq, SeqRecord

def main(in_file):
    out_file = "%s.fa" % os.path.splitext(in_file)[0]
    with open(out_file, "w") as out_handle:
        # Write records from the BAM file one at a time to the output file.
        # Works lazily as BAM sequences are read so will handle large files.
        SeqIO.write(bam_to_rec(in_file), out_handle, "fasta")

def bam_to_rec(in_file):
    """Generator to convert BAM files into Biopython SeqRecords.
    """
    bam_file = pysam.Samfile(in_file, "rb")
    for read in bam_file:
        seq = Seq.Seq(read.seq)
        if read.is_reverse:
            seq = seq.reverse_complement()
        rec = SeqRecord.SeqRecord(seq, read.qname, "", "")
        yield rec

if __name__ == "__main__":
    main(*sys.argv[1:])

Run it with:

% python bam_to_fasta.py your_file.bam

and the output will be in your_file.fa

ADD COMMENTlink written 8.5 years ago by Brad Chapman9.4k

This works well, thanks Brad. However I found a couple of reasons to consider Picard SamToFastq suggested by Steve Lianoglou instead.

SamToFastq will properly split paired-end reads into two fastq files.

SamToFastq seems to be much faster (about 1/5th the time in my informal testing).

ADD REPLYlink written 8.4 years ago by Lance30

One more thing to note: SamToFastq actually outputs FastQ with quality scores, while this solution outputs Fasta files (no qualities). I actually modified Brad's solution to output FastQ for my testing, since that is what I needed.

ADD REPLYlink written 8.4 years ago by Lance30

Lance, that makes good sense. Picard's SamToFastq is great; I even wrote a Galaxy wrapper for it in the community tool shed: http://community.g2.bx.psu.edu/tool/browse_tools?sort=name&f-state=approved&id=91f644c6d57806f8&webapp=community&operation=view_tool&f-Category.name=SAM

ADD REPLYlink written 8.4 years ago by Brad Chapman9.4k

How to add sequencing quality and mapped or unmapped reads as parameters?

Thanks

ADD REPLYlink written 2.1 years ago by Dave1230
11
gravatar for Michael Schubert
8.5 years ago by
Cambridge, UK
Michael Schubert6.9k wrote:

Alternative approach using SAMtools (source: SeqAnswers)

samtools view filename.bam | \
awk '{OFS="\t"; print ">"$1"\n"$10}' - > filename.fasta
ADD COMMENTlink written 8.5 years ago by Michael Schubert6.9k
3

The problem with this is that the original strand of the sequence is lost, as SAM always reports sequences on the REF strand. To recreate the original sequence that was aligned, one needs to interrogate the FLAG field and ask if the read was actually aligned to the "-" strand. If so, it needs to be RC'ed. See line 6 of Brad's "bam_to_rec" function above.

ADD REPLYlink written 8.5 years ago by Aaronquinlan11k

As always, the usefulness of the approach depends on your intended use. But thanks for pointing that out, I didn't think of it.

ADD REPLYlink written 8.5 years ago by Michael Schubert6.9k

I created an account to up vote this!

thanks for the help.

ADD REPLYlink written 8.3 years ago by User 33250

In certain cases, re-generating fastq is possible with this small modification: samtools view filename.bam | awk '{OFS="\t"; print "@"$1"\n"$10"\n+\n"$11}' >| filename.fastq I have used this successfully with bam files generated by Tophat (on unstranded data).

ADD REPLYlink written 6.5 years ago by stu.andrews0
4
gravatar for Steve Lianoglou
8.5 years ago by
Steve Lianoglou5.0k
US
Steve Lianoglou5.0k wrote:

If you're willing to consider using something that's not Python-maden, Picard comes with the SamToFastq utility you can use from the command line that works on SAM and BAM files.

ADD COMMENTlink written 8.5 years ago by Steve Lianoglou5.0k
0
gravatar for Ben Lange
8.5 years ago by
Ben Lange190
Minneapolis, MN
Ben Lange190 wrote:

BAM is not in the list but OpenBabel seems like a pretty good converter application. Maybe someone can come up with a plugin if BAM is common enough.

ADD COMMENTlink written 8.5 years ago by Ben Lange190

but it is mostly designed to handle the chemical file formats.

ADD REPLYlink written 8.5 years ago by Pierre Lindenbaum122k

I'm learning so let's see if I understand:

*BAM is an animated graphics file format. http://iesdp.gibberlings3.net/file_formats/ie_formats/bam_v1.htm

*Fasta is a bioinformatics specific format. http://en.wikipedia.org/wiki/FASTA_format

Doesn't it make more sense to go from Fasta to BAM than the other way around? If you had a GIF file it would be pretty tough to reverse that file back to a language that described what was going on in the image. Unless there's quite a bit of metadata built into the BAM file.

ADD REPLYlink written 8.5 years ago by Ben Lange190

I'm learning about file formats. http://en.wikipedia.org/wiki/List_of_file_formats#Biology

BAM - is compressed SAM. SAM – Sequence Alignment/Map format, in which the results of the 1000 Genomes Project will be released. FASTA – The FASTA file format, for sequence data. Sometimes also given as FNA or FAA (Fasta Nucleic Acid or Fasta Amino Acid).

Both seem to be oriented towards sequencing data. Why would FASTA get lumped in with chemical data but not SAM?

ADD REPLYlink written 8.5 years ago by Ben Lange190
0
gravatar for WouterDeCoster
19 months ago by
Belgium
WouterDeCoster40k wrote:

And in more recent versions of samtools, you could just use samtools fasta

ADD COMMENTlink written 19 months ago by WouterDeCoster40k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 773 users visited in the last hour