Is it possible to extract the reference sequence from the BWA index files?
1
3
Entering edit mode
7.3 years ago
Alex Richter ▴ 210

I have a series of BWA index files ( *.amb *.ann *.bwt, etc) but the original reference fasta sequences are not available. Assuming that it's a complete burrows-wheeler index, the complete reference should be encoded in those indices.

Does anyone know if/how the fasta can be re-extracted from them?

alignment bwa • 3.2k views
ADD COMMENT
0
Entering edit mode
ADD REPLY
6
Entering edit mode
7.3 years ago
matted 7.7k

This is an interesting question. It definitely should be possible, so I guess the question may be how to do it with existing tools and the least amount of pain.

I looked into it, and it seems that from the original bwtsw code (that bwa relies on, particularly for the indexing steps), the "packed" fasta file (pac) is the key file to use. This is supported by looking at the order of the indexing substeps of fa2pac, pac2bwt, and finally bwt2sa.

So as an informed guess I searched for "pac2fasta", and (surprisingly to me) found an existing utility in the TMAP package. I believe, based on some quick tests, that the binary format for their pac file is the same as bwa uses (and bwtsw), aside from some embedded version numbers. The only catch is that TMAP has a binary annotation file ($ref.tmap.anno) that stores the chromosome names and lengths, whereas bwa uses a plaintext annotation file ($ref.ann). So if this is really what you want to do, it looks like you'll have to hardcode things (that's what I did to test with a small single chromosome reference) or write some code to make a TMAP annotation file from a bam header or bwa .ann file (by working through tmap_refseq_write_header in tmap_refseq.c).

ADD COMMENT
0
Entering edit mode

Thanks! Since I hopefully won't need to do this much, I'll do what you suggest, and munge in the annotations.

ADD REPLY
0
Entering edit mode

Hi Alex, did you find a way to get this done? I also have the same problem. Thanks.

ADD REPLY
0
Entering edit mode

Hi! Did any of you succeed? I would like your help in doing so either. Thanks!

ADD REPLY

Login before adding your answer.

Traffic: 2250 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6