Question: Is it possible to extract the reference sequence from the BWA index files?
4.7 years ago by
Alex Richter210
San Diego
Alex Richter210 wrote:

I have a set of BWA index files (*.amb, *.ann, *.bwt, etc.), but the original reference FASTA sequences are not available. Assuming it is a complete Burrows-Wheeler index, the full reference should be encoded in those files.

Does anyone know if/how the fasta can be re-extracted from them?

bwa alignment • 2.2k views
modified 4.7 years ago by matted7.3k • written 4.7 years ago by Alex Richter210

see Reference File From Bam

written 4.7 years ago by Pierre Lindenbaum131k
4.7 years ago by
Boston, United States
matted7.3k wrote:

This is an interesting question. It should definitely be possible, so the real question is how to do it with existing tools and the least amount of pain.

I looked into it. Judging from the original bwtsw code (which bwa relies on, particularly for the indexing steps), the "packed" FASTA file (.pac) is the key file to use. The order of the indexing substeps supports this: fa2pac, then pac2bwt, and finally bwt2sa.

So as an informed guess I searched for "pac2fasta" and (surprisingly to me) found an existing utility in the TMAP package. Based on some quick tests, I believe the binary format of their pac file is the same as the one bwa (and bwtsw) uses, aside from some embedded version numbers. The only catch is that TMAP has a binary annotation file ($ref.tmap.anno) that stores the chromosome names and lengths, whereas bwa uses a plaintext annotation file ($ref.ann). So if this is really what you want to do, you have two options: hardcode the names and lengths (that's what I did to test with a small single-chromosome reference), or write some code that builds a TMAP annotation file from a BAM header or a bwa .ann file (by working through tmap_refseq_write_header in tmap_refseq.c).
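
For a bwa index specifically, the decoding step can also be sketched directly in Python, without going through TMAP. This is a minimal, unofficial sketch under assumptions that are not confirmed anywhere in this thread, so check them against your bwa version (bntseq.c) before trusting the output. It assumes that .pac holds the forward strand only, at 2 bits per base (A=0, C=1, G=2, T=3) with the first base in the highest bits of each byte; that .ann begins with a "l_pac n_seqs seed" line followed by two lines per sequence ("gi name [description]" and "offset length n_ambs"); and that .amb lists runs of ambiguous bases as "offset length char" after one header line. All function names are made up for illustration.

```python
BASES = "ACGT"  # assumed 2-bit code: A=0, C=1, G=2, T=3


def decode_pac(pac, l_pac):
    """Unpack l_pac bases, four per byte, first base in the two highest bits."""
    out = []
    for i in range(l_pac):
        shift = (3 - (i & 3)) * 2  # 6, 4, 2, 0 for the four bases of a byte
        out.append(BASES[(pac[i >> 2] >> shift) & 3])
    return "".join(out)


def parse_ann(text):
    """Return (l_pac, [(name, offset, length), ...]) from .ann text (assumed layout)."""
    lines = text.splitlines()
    l_pac, n_seqs = (int(x) for x in lines[0].split()[:2])
    seqs = []
    for k in range(n_seqs):
        fields = lines[1 + 2 * k].split(None, 2)  # gi, name, optional description
        offset, length = (int(x) for x in lines[2 + 2 * k].split()[:2])
        seqs.append((fields[1], offset, length))
    return l_pac, seqs


def parse_amb(text):
    """Return [(offset, length, char), ...] from .amb text, skipping the header line."""
    holes = []
    for line in text.splitlines()[1:]:
        if line.strip():
            offset, length, char = line.split()
            holes.append((int(offset), int(length), char))
    return holes


def pac_to_fasta(pac, ann_text, amb_text="", width=60):
    """Rebuild FASTA text from the raw .pac bytes plus .ann and .amb contents."""
    l_pac, seqs = parse_ann(ann_text)
    genome = list(decode_pac(pac, l_pac))
    # Ambiguous bases cannot be stored in 2 bits, so restore them from the .amb runs.
    for offset, length, char in parse_amb(amb_text):
        genome[offset:offset + length] = char * length
    out = []
    for name, offset, length in seqs:
        seq = "".join(genome[offset:offset + length])
        out.append(">" + name)
        out.extend(seq[i:i + width] for i in range(0, length, width))
    return "\n".join(out) + "\n"
```

To try it on real files, read the .pac file in binary mode and the .ann and .amb files as text, then pass the three to pac_to_fasta. A pure Python loop like this is only practical for small references, and anything a 2-bit encoding cannot hold (lowercase soft-masking, for example) will not come back.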

written 4.7 years ago by matted7.3k

Thanks! Since I hopefully won't need to do this often, I'll do what you suggest and munge in the annotations.

written 4.7 years ago by Alex Richter210

Hi Alex, did you find a way to get this done? I also have the same problem. Thanks.

written 4.5 years ago by xieshaojun0621170

Hi! Did any of you succeed? I would appreciate your help with this as well. Thanks!

written 3.7 years ago by shiranos0